Adi Levin's Blog for programmers

August 30, 2009

Constrained Parallelism

Filed under: Multithreading — Adi Levin @ 9:00 pm
Tags: , ,

This pattern is useful for parallelising complex algorithms, since it enables the programmer to exploit all possible parallelisms in the algorithm.

As input to the pattern, we have N tasks. For each pair of tasks i=0,1,…,N-1 and j=0,1,…,N-1 we are given two pieces of information: Must_Preceed(i,j) and Can_Run_In_Parallel(i,j). When Must_Preceed(i,j)=true, this means that task #j is not allowed to execute before task #i is completed. When Can_Run_In_Parallel(i,j)=true, it means that tasks i and j can run simultaneously in different threads. When it is false, we are not allowed to run them in parallel. The user also defines the number NTHREADS that says how many tasks are allowed to run in parallel at any given time. For scalability, one good policy is to set this number to the number of processors available on the machine.

When the user starts the algorithm, we first run tasks that have no required predecessors, and that are allowed to run in parallel to each-other. Whenever a task ends, we find another task that hasn’t been invoked yet, and that can be invoked given the constraints.

I recommend implementing a generic class that performs this pattern. It forces the programmer to think systematically about his algorithm and analyze its parallel structure.

Implementation using a master thread

The implementation requires NTHREADS worker threads and one extra thread whose job is to allocate tasks to worker threads and to be notified when tasks are finished. At the beginning, we calculate and store, for each task #i, the number NPRED(i) of required predecessors. If NPRED(i)=0, the task is allowed to run, as long as there are no running tasks j for which Can_Run_In_Parallel(i,j)=false.

In the master thread there should be a message loop (see my post on messages and message queues). Whenever a task finishes its work in a worker thread, it sends a message to the master thread using PostThreadMessage. The message says which task has just ended – for example task #i. The master thread recieves the message and decrements NPRED(j) for each j such that Must_Preceed(i,j)=true. After this update stage, it looks for tasks that can be invoked. We simply go over the list of tasks that haven’t been invoked yet, and look for tasks i for which NPRED(i)=0, and that are allowed to run in parallel to the tasks that are already running.

When parallelizing and algorithm using this pattern, it is important not to break the algorithm into too many tasks, because the fact that there is a master thread means that there is management overhead. Notice that the worst-case time complexity is at least O(N^2), in an efficient implementation, because we need to compute Must_Preceed(i,j) and Can_Run_In_Parallel(i,j) for all i and all j. However, if N is at most a few thousands, and the tasks are not trivial, this overhead is negligible.

“Focus-on-task” functionality

“Focus-on-task” is an added functionality that works nicely with this pattern. While the algorithm is running, the user may wish to give highest priority to a specific task #i. In such case, we should invoke only task #i and the tasks that are required for it to run (i.e. the tree of predecessors). After task #i is finished, the algorithm should return to its normal behavior and invoke all tasks until completion.

The implementation of this functionality is a relatively simple extension to the above: When we want to focus on a specific task, we send the master thread a special message (using PostThreadMessage), that instructs it to focus on task #i. Then, the master thread computes all predecessors of task #i, and continues as usual, except that the only tasks that belong to that tree are allowed to be invoked – until task #i is invoked (in which case we return to normal mode).

Implementation without a master thread (only a pool of worker threads is needed)

Suppose we only have the Must_Preceed constraint (i.e, we’re dropping Can_Run_In_Parallel). We can implement this pattern without a master thread. You’ll need a thread pool in which you can spawn tasks dynamically (unlike my thread_team example, in which you can only spawn a predefined collection of tasks).

You start by spawning all tasks that have no predecessors. There are now a number of tasks running in worker threads. Each task needs to be wrapped by a function that performs some the following operation after the task is complete: On the worker thread, when a task is finished, we decrement NPRED(j) for each j such that Must_Preceed(i,j)=true. If NPRED(j) reaches zero, we spawn task #j.

This is a simpler approach, but it doesn’t allow the “Focus-on-task” functionality, and doesn’t accept constraints of type Can_Run_In_Parallel.

Advertisements

August 29, 2009

Multithreading Patterns #4 – pipeline

Filed under: Multithreading — Adi Levin @ 9:01 pm
Tags: ,

The Pipeline pattern is suitable when there is a collection of items, and each item needs to go through the same sequence of filters. Each filter can run in parallel to other filters, but can only process one item at a time.

For example, imagine a military application that decrypts enemy messages and translates them to the language of the “good guys”. The input is a stream of messages. Each message is stored in a file, then decrypted, and the decrypted message is translated. Suppose that due to hardward limitations, the decryption unit can only handle one message at a time (e.g. if the decryption is done by special decryption hardware that has a single input and single output). Even though we can’t fully parallelize the work, we want to take advantage of the fact that saving messages to disk can be done in parallel to decrypting or translating previous messages.

One way to exploit this parallism is to allocate a thread for each of these actions. The first thread will intercept incoming messages and save them to disk. The second thread will be in charge of controlling the decryption hardware. The third thread will be responsible for translating decrypted messages from the enemy’s language.

More generally, a simple implementation of a pipeline goes as follows: There are N threads (N being the number of filters). Each thread runs a function that simply sleeps in alertable state, waiting to invoke incoming APC‘s (see the function pipeline_thread_func below). The filters are implemented as functions that get as input the pointer to the item to which they should apply. In each filter, the last instruction should be to place an APC to the thread responsible for the next filter in the pipeline, passing the item pointer as input (see the functions filter1 and filter2 below).

static void pipeline_thread_func(HANDLE exit_event)

{ // exit_event is used as a way to end this thread. When it is signaled, the thread will end.

while (true) {  if (WaitForSingleObjectEx(exit_event,INFINITE,TRUE)==WAIT_OBJECT_0)  return; }

}

static void filter1(void* item)

{

… do stuff on *item ….

QueueUserAPC(thread_handle2,filter2,item);

}

static void filter2(void* item)

{

… do stuff on *item ….

QueueUserAPC(thread_handle3,filter3,item);

}

For more reading material, I refer you to the documentation of the class pipeline  in Intel TBB (Thread Building Blocks).

August 1, 2009

Multithreading Patterns #3 – Call In Main Thread

Motivation

This pattern is very useful when using the “background thread” pattern. Typically, the main thread is responsible for all UI operations (handling events and drawing controls and graphics).  When a background computation is performed in a different thread, you sometimes need to update controls (such as text controls) or data that is used for drawing (such as geometric data, surfaces and images), when certain stages of the computation are completed. For example, an application that renders a movie scene may want to update the movie and at the same time enable to watch the part of the scene that was created so far.

An elegant way to achieve this is to perform all updates of the document data in the main thread only. The heavy computation is done in the background, and only when a certain stage is reached, we call a function in the main thread that makes the update of the document data.

Implementation

Typically, the main thread has a message loop that spends most of its time waiting for user input, and is associated with the main window of the application. So, in order to run a certain function in the main thread, we need to send a message to the application main window. The parameters of the message LPARAM and WPARAM define what function should invoked, and what parameter it should take as input.

The window procedure that handles messages should recognize these special messages by their message identifier (which can be WM_APP + x), and call the needed function. This should be encapsulated nicely into a class, so that anyone who wants to place a call in the main thread will only need to give the function pointer and the parameter (a pointer as well).

Two types of calls – SendMessage & PostMessage

Since we perform the call-in-main-thread via messages, we need to choose between sending and posting a message. Sending a message from the background thread means that you halt the background computation until the function is called in the main thread and is completed. Posting a message means that you don’t halt the computation. Depending on the circumstances of your application, you should decide which of them to use (see my post on messages and message loop for the issues related to SendMessage and PostMessage).

Calls made by SendMessage are simple to implement, but they can cause a deadlock if you’re not careful (because the background thread waits for the main thread, while the main thread may be waiting for the background thread). Because of that, it is often safer to use PostMessage. However, when you use PostMessage you need to make sure that the parameter you send for the function call remains valid at the time that the call will actually be performed, which can be far in the future (when the main thread is ready), long after the background computation has completed.

One way to deal with this problem is to make a special call at certain places in the code that causes all pending function calls to be either ignored or performed – depending on what your application needs. One way to support this functionality is to keep a table of all posted calls (function pointer and parameter), so that at any moment we have access to the list of function calls that are waiting to be executed.

Multithreading Patterns #2 – background thread

Filed under: Multithreading — Adi Levin @ 9:17 am
Tags: , ,

Typically, the main thread of an application performs all UI operations, such as responding to mouse events, updating windows and drawing graphics. If a long computation is performed in the main thread, the application is non-responsive until the computation is over. The purpose of a background thread is to enable responsiveness. The application will keep responding to user input while a computation is performed in the background.

In addition to running the long computation, a background thread should:

1. Report progress, i.e. report what percentage of the computation is done, or estimate the time left to completion.

2. Enable to abort at any moment.

3. Signal an event or send a message when the computation is completed.

Implementation of a background thread

To start the operation, create a thread using CreateThread. Inside the thread function, you need to constantly check the value of a variable that is used for abort requests. When the value of that variable is non-zero, you should abort the computation. When the computation is done, signal an event, or send a message to the main application window.

It is best to encapsulate all of that in a base class that has a pure-virtual function “run()” that should be overriden by any instance of background thread.

Communicating with the main thread

It is nice to report the progress of the computation to the user by a visual progress bar. A dialog in the main thread should have a timer that constantly update the position of the progress bar according to a variable which is constantly updated by the background thread function. The value of the progress variable should be zero at the beginning, and gradually reach 100 at the end of the computation.

For thread-safety, event handlers must be aware of the fact that the background thread is running. Each event handler should fall in one of these categories:

1. The operation can run safely while the background thread is running.

2. The operation is disabled while the background thread is running.

3. The operation is enabled, but before it begins, we must first abort the background thread.

In case of abort, we politely ask the background thread to abort, by setting the value of a certain flag, and then we wait until the background thread is completed. Do not force a thread to terminate using TerminateThread. When a thread is terminated using TerminateThread, the stack is not freed (typically 1MB), and critical sections that are owned by the thread are not released – which is a cause for deadlocks. For example, heap allocation uses a global critical section, so terminating a thread that does dynamic allocation can cause a deadlock. The worst thing about it is that these deadlocks happen at a very small probability, so it is hard to catch and debug them. In short – the thread should always end by returning from the thread function.

During the background computation, or at the end of it, sometimes we need to update data that should only be accessed by the main thread. For example, the main thread is responsible for drawing, so when we need to update data that is accessed by drawing routines, we have to use explicit synchronization mechanisms (e.g. mutex), or we must update the data on the main thread, which guarantees that the update will not occur while drawing. For details on how to do that, see the pattern “call in main thread“.

Create a free website or blog at WordPress.com.