Bearcat

Exploring HPC Programming: OpenMP

by Bearcat on November 20th, 2008 at 12:55 PM
Today, we'll have a quick look at OpenMP. OpenMP is a set of compiler pragmas, library routines, and environment variables that support multi-platform, shared-memory multiprocessing programming in C/C++ and Fortran. The interesting thing about OpenMP is that it offers a nice, simple way to split loops ("for" in C, "do" in Fortran) across multiple threads. Our program has a number of "for" loops in it, so that is the obvious avenue to explore.

Anyone can get a quick overview of OpenMP from the Wikipedia article on OpenMP, which also lists a number of references for additional information. There is also the OpenMP site, OpenMP.org. Read up, it's an interesting topic.

First off, I'll profess that I'm not an OpenMP expert. I've learned enough by reading, trying, and experimenting that I can use OpenMP in its basic form. To use OpenMP, you need a compiler that supports it. GCC 4.2 and up support OpenMP. Red Hat has also backported OpenMP into the compiler supplied with Red Hat 5.2, and CentOS 5.2 ships the same compiler. To check, look for the omp.h include file and the libgomp library.
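A quick way to confirm that a compiler really has OpenMP switched on is to ask it. The sketch below is my own illustration, not part of the original program; the function names are made up. It relies on the standard _OPENMP macro, which is defined only when OpenMP is enabled, so the same file still builds (and reports one thread) on a compiler without support.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only pulled in when OpenMP is enabled */
#endif

/* Returns the OpenMP version date (yyyymm) the compiler implements,
   or 0 if the program was built without OpenMP enabled. */
long openmp_version(void)
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

/* Returns how many threads a parallel region may use (1 without OpenMP). */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Prints a one-line summary; call this from main(). */
void report_openmp(void)
{
    printf("OpenMP version: %ld, max threads: %d\n",
           openmp_version(), max_threads());
}
```

With GCC, OpenMP is enabled by compiling and linking with the -fopenmp flag; without it the pragmas are ignored and the version reported here is 0.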

The really nice thing about OpenMP is that it is much less intrusive on your program than converting the program to use threads by hand.

So let's get going.

In previous articles, I converted a baseline program to multi-threaded, giving a couple of options for attacking the problem. For the OpenMP example, I don't need the multi-threaded version, so I went back to the baseline example as a starting point. The multi-threaded examples executed in just under 5 minutes, so that will be the target. I don't expect to reach the target, but getting close would be nice, and would convince me that OpenMP is a viable way of doing things.

The first thing I did to the baseline example was replace the rand function (because it's not thread-safe) with the distribution function I created for the multi-threaded example. Now we have a new starting point for OpenMP.
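To show why rand is a problem and what a replacement can look like: rand keeps one hidden global state, so threads calling it at the same time step on each other. A common fix is to give every thread its own generator state. The sketch below is NOT the distribution function from the earlier article (that one is in the attached program); it is a hypothetical stand-in using a simple 64-bit linear congruential generator, and the names rng_t, rng_seed, and rng_uniform are invented for this example.

```c
#include <stdint.h>

/* Hypothetical per-thread generator state (not the author's function). */
typedef struct {
    uint64_t state;
} rng_t;

/* Each thread seeds its own rng_t, for example with its thread number. */
void rng_seed(rng_t *r, uint64_t seed)
{
    r->state = seed;
}

/* 64-bit linear congruential step (Knuth's MMIX constants).
   Returns a double in [0, 1) built from the top 53 bits of the state. */
double rng_uniform(rng_t *r)
{
    r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(r->state >> 11) / 9007199254740992.0;  /* divide by 2^53 */
}
```

Because all of the state lives in the rng_t that the caller passes in, no locking is needed as long as each thread uses its own copy. A generator this simple is fine for illustration, but I would not assume it is good enough for serious Monte Carlo work.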

The first step is to add the include file (who would have guessed), at the beginning of the source file:

Code:
#include <omp.h>
That's really all you need to do for preparation, really simple. I wanted to ensure that OpenMP would start 4 threads on my quad-core machine, so I added the following line in the "main" function of the program.

Code:
omp_set_num_threads(4);
That's it. Now to play with the pragmas.
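If you want to see how many threads OpenMP actually hands you after that call, a small check like the following works. This is a sketch of my own with a made-up function name, not code from the program. Note that the number passed to omp_set_num_threads is a request; the runtime is allowed to give a parallel region fewer threads.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Requests `wanted` threads, opens a parallel region, and returns the
   number of threads that region really ran with (1 without OpenMP). */
int team_size(int wanted)
{
    int used = 1;
#ifdef _OPENMP
    omp_set_num_threads(wanted);
    #pragma omp parallel
    {
        /* Only one thread records the team size, so there is no race. */
        #pragma omp single
        used = omp_get_num_threads();
    }
#endif
    (void)wanted;   /* silences the unused-parameter warning without OpenMP */
    return used;
}
```

The same request can be made without recompiling by setting the OMP_NUM_THREADS environment variable before running the program; a call to omp_set_num_threads in the code takes precedence over it.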

We have a "for" loop in the "blackscholes" function, so we'll try that first. After all, it loops 1,000,000 times for each portfolio item, of which there are 1024. This is the innermost loop, which just performs the calculations. Parallelizing a "for" loop in OpenMP is quite simple: just place a pragma right before the "for" statement.

Code:
   #pragma omp parallel for private(index)
   for (index = 0; index < experiments; index++)
I only put "index" in the private clause because I declared "index" outside the "for" loop. As it turns out, this isn't strictly needed: OpenMP automatically makes the loop variable of a "parallel for" private, so listing it is redundant but harmless. The private clause tells OpenMP which variables each thread gets its own copy of, so that the threads sharing the iterations of the "for" loop don't have to coordinate access to them.

Everyone should read about variable scope in OpenMP, as it makes a big difference. I've also found that trying different combinations helps understanding, and it's really simple: change the pragma line, compile, test, rinse, repeat. Easy, huh?
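Here is a small illustration of why scope matters. It is a sketch under my own assumptions, not the loop from the program: the function and variable names are invented. A scratch variable declared outside the loop has to be made private, and a running total that every thread adds to has to be declared as a reduction, otherwise the threads overwrite each other's updates and the answer comes out wrong.

```c
/* Averages the values 0.5 * index for index in [0, n). */
double parallel_mean(int n)
{
    int index;          /* loop variable: private automatically */
    double payoff;      /* scratch value: each thread needs its own copy */
    double sum = 0.0;   /* shared total: combined safely via reduction */

    #pragma omp parallel for private(payoff) reduction(+:sum)
    for (index = 0; index < n; index++) {
        payoff = 0.5 * index;
        sum += payoff;
    }

    return n > 0 ? sum / n : 0.0;
}
```

With reduction(+:sum), each thread accumulates into its own copy of sum and OpenMP adds the copies together when the loop ends. Try removing the reduction clause and rerunning a few times; the result will usually change from run to run, which is a good hands-on lesson in what "shared" means.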

So with these changes, plus changes to the Makefile to compile with OpenMP and link the correct libraries, how does this fare?

Here are the results of the run:

[leo@bearcat1 OpenMP]$ time ./openmpprice
=== Option Portfolio Calculations (OpenMP Test) ==========
Portfolio size : 1024
Experiments run per item : 1000000
Average Call Price : 36.560187
Average Put Price : 1.669589

real 6m2.281s
user 23m49.077s
sys 0m0.302s
Not bad: 6 minutes and a couple of seconds, compared to the multi-threaded program at just under 5 minutes. Certainly a lot better than the original 19 minutes of the single-threaded program. I'd say a GREAT result for a small effort.

The other big loop is the "for" loop in the "portfolio" function, which runs the calculations once per portfolio item. To test this loop, I removed the pragma from the previous test run, and put a pragma on the "portfolio" loop, like this:

Code:
   #pragma omp parallel for private(i,j,pnum,snum,ynum)
   for (i = 0; i < num_options; i++)
This is the outermost loop, which also includes the initialization and the "blackscholes" function call to perform the calculations. Here's the result:

[leo@bearcat1 OpenMP]$ time ./openmpprice
=== Option Portfolio Calculations (OpenMP Test) ==========
Portfolio size : 1024
Experiments run per item : 1000000
Average Call Price : 36.558490
Average Put Price : 1.669706

real 5m25.763s
user 21m14.538s
sys 0m0.239s
Better, because more of the processing now runs under OpenMP control. So 5 minutes and 25 seconds is not bad, and very close to the performance I got from hand-coding my own threads.
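The shape of that outer-loop approach can be sketched as follows. This is an illustration only: price_item is a hypothetical stand-in for the real per-item work, and none of these names come from the attached program. The point is that each thread takes whole portfolio items, so the inner loop runs serially inside each thread, and that variables declared inside the loop body are private without having to be listed in the pragma.

```c
/* Hypothetical stand-in for the per-item pricing work (serial inner loop). */
static double price_item(int item, int experiments)
{
    double sum = 0.0;
    int k;
    for (k = 0; k < experiments; k++)
        sum += (double)((item + k) % 7);
    return experiments > 0 ? sum / experiments : 0.0;
}

/* Parallelizes the outer loop: one portfolio item per iteration. */
double portfolio_average(int num_options, int experiments)
{
    double total = 0.0;
    int i;

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < num_options; i++) {
        /* Declared inside the loop body, so every thread has its own. */
        double item_price = price_item(i, experiments);
        total += item_price;
    }

    return num_options > 0 ? total / num_options : 0.0;
}
```

Declaring the temporaries inside the loop is an alternative to a long private(...) list. It only works if the variables really are not needed after the loop, which I am assuming here.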

I'm impressed! Are you? A small effort, not intrusive, BIG gains.

Two additional lines, and a pragma placed in the right spot, and OpenMP does a bang-up job of multi-processing my program. I should note that during these two runs, my processors were pegged at 100% the whole time, so the conversion of the program seems very efficient.
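The numbers from "time" back this up. Dividing CPU time (user plus sys) by wall-clock time (real) gives the average number of cores kept busy, and dividing the single-threaded time by the new wall-clock time gives the speedup. The helpers below are just a sketch of that arithmetic with names I made up; the 19-minute baseline is the rounded figure quoted above, so the speedups are approximate.

```c
/* Converts a "time" reading such as 6m2.281s into seconds. */
double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Wall-clock speedup of a parallel run over a serial baseline. */
double speedup(double serial_real, double parallel_real)
{
    return serial_real / parallel_real;
}

/* Average number of cores kept busy: CPU time over wall-clock time. */
double busy_cores(double user, double sys, double real)
{
    return (user + sys) / real;
}
```

Plugging in the two runs, both kept roughly 3.9 of the 4 cores busy on average, which matches the 100% CPU observation. Against the roughly 19-minute baseline, the inner-loop version comes out at about a 3.1x speedup and the outer-loop version at about 3.5x.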

Here's the program:

OpenMP.zip

OpenMP allows multiple pragmas, so if your program has several separate sections of calculations, you can put a pragma on each section to help speed it up. If you have older programs with a structure similar to this one, built around "for" loops, then OpenMP should speed things up. As you learn more about OpenMP, I'm sure you'll find other uses for it in other sections of your program.
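As a small illustration of using more than one pragma in the same program, the sketch below parallelizes two separate loops, one after the other. It is an invented example (sum_of_squares is not from the attached program): the first loop fills an array, the second adds it up.

```c
#include <stdlib.h>

/* Returns 0*0 + 1*1 + ... + (n-1)*(n-1), or -1 if memory runs out. */
long sum_of_squares(int n)
{
    long *squares;
    long total = 0;
    int i;

    if (n <= 0)
        return 0;
    squares = malloc((size_t)n * sizeof *squares);
    if (squares == NULL)
        return -1;

    /* First section: every iteration writes its own array slot. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        squares[i] = (long)i * i;

    /* Second section: a shared total, so it needs a reduction. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += squares[i];

    free(squares);
    return total;
}
```

Each "parallel for" finishes completely before the code after it starts, so the second loop can safely read what the first one wrote.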

Have Fun,

Leo Stutzmann

Updated November 20th, 2008 at 12:57 PM by Bearcat (Added attachment)
