Experimenting with a new technology always takes time. If the technology is plug-and-play (like an Ethernet switch), it is often easy to install and test. If, however, the technology is more hands-on (like a new programming language), then a more detailed evaluation is warranted. Indeed, a new programming paradigm usually involves two evaluation tasks. The first is finding the right hardware/software platform and installing the software. The second is actually sitting down and trying out some new things. Obviously, the first task must be complete before you can "play with the software."
In some cases, trying out a new software tool is as simple as installing an RPM or similar package on your Linux box. In other cases, if the software is designed for multi-core or GP-GPU processing, it is important to make sure you have the right hardware. In still other cases, specific versions of Linux or libraries may be required for the software to work. In this article, we will discuss how to install the nVidia CUDA software suite on standard desktop hardware. For some, this step can be challenging because it requires assembling and installing both hardware and software, so this article will help with your first steps. In terms of cost, the nVidia software is freely available, as are some of the supported versions of Linux. You may have to scrounge a bit to find some hardware, but the CUDA toolkit works on many existing nVidia graphics cards.
Once you get everything working, you will be able to run programs like the mandelbrot example, which lets you compare the performance of the GPU and the CPU. Plus, you get to look at pretty pictures like those in Figure One below. While this level of experimentation does not use a cluster, it is a great (and inexpensive) way to learn about GP-GPU computing, which is becoming popular in many HPC circles.

Hardware Setup
One of the advantages of the nVidia CUDA software is that it works on many older video cards. For testing purposes this is a big advantage, as there is no need to buy a new $250+ video card to try CUDA programming. The nVidia website states that CUDA will work with GeForce 8, 9, 100, and 200-series GPUs that have at least 256MB of local graphics memory. To check that your video card will work, look up your model on the CUDA GPUs web page. For this article, I am using a rather old GeForce 8500 GT, whose specifications are given in Table One below.
| Specification | GeForce 8500 GT |
|---|---|
| CUDA Cores | 16 |
| Core Clock (MHz) | 450 |
| Shader Clock (MHz) | 900 |
| Memory Clock (MHz) | 400 |
| Memory Amount | 512 MB DDR2 |
| Memory Interface | 128-bit |
| Memory Bandwidth (GB/sec) | 12.8 |
| Texture Fill Rate (billion texels/sec) | 3.6 |
Table One: GeForce 8500 GT specification
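The memory bandwidth entry in Table One follows directly from the memory clock and the interface width. DDR2 memory transfers data on both edges of the clock, so, taking GB to mean 10^9 bytes, the arithmetic works out as follows:

```latex
\[
\underbrace{400\,\text{MHz} \times 2}_{\text{DDR2: } 800\times10^{6}\ \text{transfers/s}}
\times
\underbrace{\frac{128\ \text{bits}}{8\ \text{bits/byte}}}_{16\ \text{bytes/transfer}}
= 12.8\times10^{9}\ \text{bytes/s} = 12.8\ \text{GB/s}
\]
```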
As you can see, the specifications are modest by today's standards. In particular, the 8500 GT has only 16 CUDA cores. While this is certainly not enough for production work, it is enough to test and play with the CUDA toolkit. Indeed, because CUDA applications are designed to scale, I can develop on my existing hardware and later move to a GPU with many more CUDA cores with little or no reprogramming.
My host processor is an Intel Core 2 Duo 6400 (2.1 GHz) with 2 GB of memory on a Gigabyte GA 945GM motherboard. Again, nothing spectacular by today's standards, but that is the point. There is no need to buy new hardware to test CUDA software.
Software Setup
I had previously installed Fedora 8 on the test hardware, but the new version of CUDA (version 2.3) requires Fedora 10. Not being one to make extra work for myself, I decided to upgrade Fedora and keep everything within the nVidia-recommended environment. CUDA 2.3 also supports Red Hat Enterprise Linux 4.3-4.7 or 5.0-5.3, SUSE Linux Enterprise Desktop 10-SP2 or 11, openSUSE 11, and Ubuntu 8.10 or 9.04.
A note about upgrading Fedora: in the past, I have had little success upgrading between Fedora versions and usually ended up reinstalling everything. I am pleased to say that my upgrade from Fedora 8 to Fedora 10 went without issue (although upgrading or installing 1600+ packages over the network took a while). Nice job, Fedora team.
Before we get to the installation, I want to provide a little background on CUDA. First, CUDA stands for "Compute Unified Device Architecture." The CUDA designers wisely chose not to invent a completely new language and instead based CUDA on ANSI C. They intentionally designed a minimal set of extensions to standard C so that users could experiment in a familiar environment. There is also support for Fortran, which I'll mention at the end of this article.
Another advantage of CUDA is that it lets users take small, increasingly advanced steps. Users can take an existing C program and add CUDA code incrementally. Working this way, users can usually see some benefit for a small amount of work, rather than having to tear their program apart and rebuild it just to find out whether CUDA will speed it up. This feature, coupled with a low-cost development environment that scales easily to better hardware, has made CUDA very popular.
Downloads and System Preparation
The first step is to download the CUDA software. The CUDA Toolkit is available from the nVidia download page. For the purposes of this article, you will need to download the Linux packages listed below (if you are not using Fedora, select the packages for your supported OS). You will also need to select the 32-bit or 64-bit version depending on your host system.
- CUDA Toolkit - cudatoolkit_2.3_linux_64_fedora10.run
- CUDA Getting Started Guide - CUDA_Getting_Started_2.3_Linux.pdf
- Current Video Driver - NVIDIA-Linux-x86_64-190.53-pkg2.run
- CUDA SDK - cudasdk_2.3_linux.run
Next, nVidia recommends verifying that your system runs one of the supported versions by running:
Code:
uname -i && cat /etc/*release
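The output of those two commands can also be checked from a script. The following is a minimal sketch, not something nVidia provides: the helper names toolkit_for_arch and is_fedora10 are hypothetical, and the 32-bit installer name is an assumption modeled on the 64-bit file listed above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the CUDA distribution).

# Map the hardware platform reported by `uname -i` to the matching
# CUDA 2.3 toolkit installer for Fedora 10. The 32-bit file name is an
# assumption modeled on the 64-bit one used in this article.
toolkit_for_arch() {
    case "$1" in
        x86_64)    echo "cudatoolkit_2.3_linux_64_fedora10.run" ;;
        i386|i686) echo "cudatoolkit_2.3_linux_32_fedora10.run" ;;
        *)         echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Succeed only if the release string (e.g. the contents of
# /etc/fedora-release) identifies Fedora 10.
is_fedora10() {
    case "$1" in
        "Fedora release 10 "*) return 0 ;;
        *)                     return 1 ;;
    esac
}

# Example use:
#   toolkit_for_arch "$(uname -i)"
#   is_fedora10 "$(cat /etc/fedora-release)" || echo "not Fedora 10"
```

Matching on the leading "Fedora release 10 " text keeps the check strict, so a Fedora 8 or Fedora 11 system fails it rather than slipping through.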
The next step is to install the most recent nVidia driver. If you have already done this, you can skip this step. If the nVidia driver is not installed, or you have an older version, install or upgrade to the latest driver, because newer versions of CUDA require it. The driver installer will not run while the X server is active, so switch to a text console first (for example, by running init 3 as root). To install the driver, enter:
Code:
# chmod u+x NVIDIA-Linux-x86_64-190.53-pkg2.run
# ./NVIDIA-Linux-x86_64-190.53-pkg2.run
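Once the installer finishes, check that the nvidia kernel module has loaded. On my system the output looked like this:
Code:
# dmesg | grep nvidia
nvidia: module license 'NVIDIA' taints kernel.
nvidia 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
nvidia 0000:01:00.0: setting latency timer to 64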
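Besides dmesg, you can confirm that the module is currently resident with lsmod. A minimal sketch follows; the function name nvidia_module_loaded is hypothetical, and it assumes the kernel module is named nvidia, as in the dmesg output above.

```shell
#!/bin/sh
# Hypothetical helper: read `lsmod` output on stdin and succeed only if
# a module named exactly "nvidia" is loaded.
nvidia_module_loaded() {
    # lsmod prints one module per line with the module name in column 1;
    # the awk exit status is 0 if a match was found and 1 otherwise.
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Example use:
#   lsmod | nvidia_module_loaded && echo "nvidia driver loaded"
```

Comparing the whole first column, rather than grepping for the substring, avoids false positives from unrelated modules whose names merely contain "nvidia".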
Installing CUDA Toolkit and SDK
If everything has worked thus far, it is time to install the CUDA software. The installation is very simple. Enter the following commands, and when asked for the installation location, use the default (/usr/local/cuda).
Code:
# chmod u+x cudatoolkit_2.3_linux_64_fedora10.run
# ./cudatoolkit_2.3_linux_64_fedora10.run
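Next, add the CUDA tools directory to your PATH (for example, in your ~/.bashrc):
Code:
export PATH=$PATH:/usr/local/cuda/bin
The CUDA libraries must also be on the library search path. On a 64-bit system they are in the directory below (32-bit systems use /usr/local/cuda/lib). One way to do this is to add the following line to /etc/ld.so.conf and then run ldconfig as root:
Code:
/usr/local/cuda/lib64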
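As a per-user alternative, the same directories can be added to PATH and LD_LIBRARY_PATH from ~/.bashrc. The following is a minimal sketch; path_append and cuda_libdir are hypothetical helper names, and the check for an existing entry simply keeps repeated logins from growing the variables.

```shell
#!/bin/sh
# Hypothetical helpers for a per-user setup in ~/.bashrc.

# Print the colon-separated list $1 with directory $2 appended, unless
# $2 is already present (so re-sourcing ~/.bashrc adds no duplicates).
path_append() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

# 64-bit CUDA libraries live in lib64, 32-bit ones in lib.
cuda_libdir() {
    case "$1" in
        x86_64) echo /usr/local/cuda/lib64 ;;
        *)      echo /usr/local/cuda/lib ;;
    esac
}

# Example use in ~/.bashrc (after the functions above):
#   export PATH=$(path_append "$PATH" /usr/local/cuda/bin)
#   export LD_LIBRARY_PATH=$(path_append "$LD_LIBRARY_PATH" "$(cuda_libdir "$(uname -i)")")
```

The ${1:+$1:} expansion adds a separating colon only when the existing list is non-empty, so an unset LD_LIBRARY_PATH does not end up with a leading empty entry.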
Install the SDK
In order to test our installation, we need to install the CUDA SDK. As with the other packages, simply enter:
Code:
# chmod u+x cudasdk_2.3_linux.run
# ./cudasdk_2.3_linux.run
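I installed the SDK under /usr/local and then copied it into a directory of my own so that I could build the examples as an ordinary user:
Code:
$ mkdir CUDA
$ cd CUDA
$ cp -rp /usr/local/NVIDIA_GPU_Computing_SDK .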
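When I built the examples, the link step failed with the error /usr/bin/ld: cannot find -lglut, even though the freeglut RPM was installed. The linker looks for the unversioned libglut.so, which is normally supplied by the freeglut-devel package rather than by freeglut itself. Creating the link by hand also fixes the problem (do this as root):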
Code:
# ln -s /usr/lib64/libglut.so.3.8.0 /usr/lib64/libglut.so
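The same fix can be wrapped in a small function for any library with this missing-link problem, although where a matching -devel package exists, installing it is usually the cleaner fix. A minimal sketch; link_unversioned_lib is a hypothetical name, and it simply links to the first versioned file the shell glob returns.

```shell
#!/bin/sh
# Hypothetical helper: if directory $1 has no unversioned lib$2.so (the
# name "ld -l$2" searches for), create it as a relative symlink to the
# first versioned lib$2.so.* file found. Run as root for system dirs.
link_unversioned_lib() {
    dir=$1
    name=$2
    [ -e "$dir/lib$name.so" ] && return 0    # already present: nothing to do
    for f in "$dir/lib$name.so."*; do
        [ -e "$f" ] || return 1              # glob matched nothing
        ln -s "${f##*/}" "$dir/lib$name.so"  # e.g. libglut.so -> libglut.so.3.8.0
        return                               # status of ln
    done
}

# Example use (as root):
#   link_unversioned_lib /usr/lib64 glut
```

Using a relative target (just the file name) keeps the link valid even if the directory is later copied or mounted elsewhere.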
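Once the examples have built, change to the release directory that holds the compiled binaries and run deviceQuery, which reports what CUDA sees on your system:
Code:
$ cd /bin/linux/release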
Code:
$ ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "GeForce 8500 GT"
  CUDA Driver Version: 2.30
  CUDA Runtime Version: 2.30
  CUDA Capability Major revision number: 1
  CUDA Capability Minor revision number: 1
  Total amount of global memory: 536150016 bytes
  Number of multiprocessors: 2
  Number of cores: 16
  Total amount of constant memory: 65536 bytes
  Total amount of shared memory per block: 16384 bytes
  Total number of registers available per block: 8192
  Warp size: 32
  Maximum number of threads per block: 512
  Maximum sizes of each dimension of a block: 512 x 512 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
  Maximum memory pitch: 262144 bytes
  Texture alignment: 256 bytes
  Clock rate: 0.92 GHz
  Concurrent copy and execution: Yes
  Run time limit on kernels: Yes
  Integrated: No
  Support host page-locked memory mapping: No
  Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit...
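The deviceQuery numbers tie back to Table One: GPUs of compute capability 1.x, such as the 8500 GT, have 8 scalar cores per multiprocessor, so 2 multiprocessors give the 16 cores listed. A minimal sketch that pulls fields out of saved deviceQuery output and repeats that arithmetic follows; the function names dq_field and cores_1x are hypothetical, and they assume the "Label: value" layout shown above.

```shell
#!/bin/sh
# Hypothetical helpers for scripting against deviceQuery output.

# Print the value of the first "Label: value" line whose label equals $1.
# Input is deviceQuery output on stdin.
dq_field() {
    awk -F': *' -v key="$1" '
        { label = $1; sub(/^[ \t]+/, "", label) }  # drop leading indentation
        label == key { print $2; exit }'
}

# Compute capability 1.x GPUs have 8 scalar cores per multiprocessor;
# print the resulting core count, or fail for other capability levels.
cores_1x() {
    out=$(cat)
    major=$(printf '%s\n' "$out" | dq_field "CUDA Capability Major revision number")
    sms=$(printf '%s\n' "$out" | dq_field "Number of multiprocessors")
    [ "$major" = 1 ] || return 1
    echo $((sms * 8))
}

# Example use on saved output (deviceQuery still waits for ENTER):
#   ./deviceQuery > dq.txt
#   cores_1x < dq.txt
```

Restricting cores_1x to capability 1.x is deliberate: the 8-cores-per-multiprocessor figure does not carry over to later GPU generations.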
If everything has worked so far, you can run the other CUDA example programs to build further confidence in the installation. The nbody example is shown in action in Figure Two below. The examples cover many types of applications and provide source code from which you can learn more about CUDA. Your next step is to download the CUDA Programming Guide and start on your own killer application. You can also find more examples (some freely available) at the CUDA Zone.

Figure Two: Example nbody program in action
CUDA and Clusters
At this point you should be able to explore CUDA for your own applications. If you use Fortran, you will be pleased to know that The Portland Group (PGI) and nVidia have teamed up to offer PGI CUDA Fortran. PGI offers a 15-day trial that lets you experiment with the tools.
One final question you may ask is, "How do I use CUDA on my HPC cluster?" If your cluster nodes do not have nVidia GPUs, they obviously cannot run CUDA programs. Many sites are considering adding "CUDA nodes" equipped with nVidia Tesla cards, which can be used to run CUDA applications. In one sense, adding GP-GPUs is like adding a low-cost array processor to your nodes. You can also consider a stand-alone CUDA workstation that includes multiple Tesla GPU boards. The important question, however, has already been answered: by using low-cost hardware and freely available software, you can find out for sure whether CUDA and nVidia GP-GPUs will work for your applications. If the answer is "yes," you now have a new world of hardware acceleration available.

