The Cluster Decade

    At the end of each year, there are always "look back" and "look forward" articles that take stock of the past year or predict the future. This year, we enter a new decade and, in a similar vein, there are many stories recounting the technology changes of the past ten years.

    Ten years ago, the Intel PIII (Tualatin) and the AMD Athlon (Thunderbird) were the processors of choice for those venturing into the world of HPC clustering. We were still moving data around 32 bits at a time and things like InfiniBand were still on the drawing board. HPC clusters were still somewhat of a novelty, often dismissed by the large manufacturers as an academic project or a hobby technology.

    On the surface there was no sign of any big changes. The incumbents still ruled the market, and commodity hardware was not considered a viable platform for the floating point intensive applications that the big UNIX supercomputers were running at the time. If one looked a bit deeper, however, some fundamental changes were happening in the larger market that would have chilling effects on the HPC market.

    Tracking Disruption

    Before we look at how this change came about, let's take a look at a few key indicators of the HPC market. One of the best historical records of HPC is the Top500.org site. Twice a year, users from around the globe submit their results for running a linear algebra benchmark. The fastest 500 machines are listed on the site. In addition to program performance, several other factors about the machines are recorded, such as processor family, operating system, and architecture.

    Table One measures the use of cluster architecture, the Linux operating system, and x86 processors in the Top500 in the fall of 2000 and again in the fall of 2009. Each shows a dramatic change: a swing from 10% or less to 83% or more.



    Table One: Number of machines that are clusters, use Linux, and x86 architecture in the Top500

    The move toward commodity x86 Linux clusters was not an accident, nor was it the result of slick marketing. Three market forces (commodity hardware, open source software, and the Internet) came together to fuel the growth of clusters. Many of the early proof points came from the US government labs, and the Beowulf Project at NASA in particular was clearly a catalyst for change. Early users built these types of systems themselves because they were composed of many parts and no single integrator could deliver (or support) an entire system.


    Figure One: One of the first Beowulf clusters. A 16 processor Pentium-based
    cluster called "Hrothgar" was located at Goddard Space Flight Center.


    Commodity Hardware

    Perhaps more than in any other market, price-to-performance is the driving factor for HPC. Application performance can be easily measured and price ratios easily calculated. In some areas (e.g. government labs) absolute performance is important, but to many users "bang for the buck" is what matters most. Indeed, it is really an exercise in maximizing the amount of performance one can get for a given budget.

    Prior to the Pentium Pro line of processors, x86 based systems were never considered serious contenders for large floating point applications. This job was relegated to the UNIX RISC processors or custom built supercomputer processors. Starting with the Pentium Pro line, x86 floating point performance began to improve. These improvements, when combined with the lower cost of the x86 hardware, offered a much better price-to-performance ratio for HPC. Compared to proprietary vendor hardware, the commodity approach often offered a price-to-performance ratio that was ten times better than solutions from the big iron vendors.
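    The price-to-performance comparison can be sketched in a few lines of Python. The prices and GFLOPS figures below are hypothetical, chosen only to reproduce a tenfold gap like the one described above, and the budget helper assumes performance scales linearly with the number of units bought, which real clusters only approximate.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    price_usd: float  # purchase price per unit (hypothetical)
    gflops: float     # sustained performance per unit (hypothetical)

    def dollars_per_gflop(self) -> float:
        """Price-to-performance ratio; lower is better."""
        return self.price_usd / self.gflops


def best_for_budget(systems, budget_usd):
    """Pick the system that delivers the most total GFLOPS for a budget.

    Assumes ideal linear scaling across the units purchased.
    """
    def total_gflops(system):
        units = int(budget_usd // system.price_usd)
        return units * system.gflops

    return max(systems, key=total_gflops)


# Hypothetical figures, for illustration only.
big_iron = System("proprietary RISC server", price_usd=1_000_000, gflops=100.0)
commodity = System("x86 cluster node", price_usd=2_000, gflops=2.0)

ratio = big_iron.dollars_per_gflop() / commodity.dollars_per_gflop()
choice = best_for_budget([big_iron, commodity], budget_usd=1_000_000)
print(f"Commodity advantage: {ratio:.0f}x, best buy: {choice.name}")
```

    With these made-up inputs the same budget buys either one large system or five hundred commodity nodes, which is the kind of calculation that drove many early cluster purchases.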

    Of course, the commodity hardware had to work together (in parallel) to achieve the levels of performance required by the HPC market. To address this need, many users took advantage of low cost Ethernet as a means to connect the individual computers or servers that comprised their cluster. There were some more expensive interconnects at the time (e.g. Myrinet), but many users chose Ethernet as a simple, low cost interconnect. In 2000, only about 3.2% of the Top500 systems used Ethernet. In 2009, 52% used Ethernet even though faster interconnects were available (e.g. InfiniBand).

    Commodity hardware also reduced the "barrier to entry" for end users. Entry into the HPC, or supercomputer, market was traditionally a "seven figure" investment for many organizations. By using commodity hardware, one could achieve supercomputing level performance with a five figure (or less) investment. Indeed, due to the commodity nature of the hardware, users could test feasibility with a small hardware investment and then scale up to meet their needs.

    Additionally, users could buy only the hardware that they needed. Most clusters have an economy of design, and users were able to allocate funds where they helped the most. For instance, spending more on servers than on interconnects, or vice-versa, is a common approach.

    Finally, in many ways commodity clusters avoided the vendor lock-in that is often associated with traditional supercomputers. While the x86 processors were from either Intel or AMD, end users had a "choice" of motherboards, memory, hard drives, etc. that were all "interchangeable" (in theory) and thus provided a certain comfort level to those purchasing the hardware.

    As the decade progressed, the use of commodity hardware became the "norm" in HPC, providing users better price-to-performance, lower barrier to entry, choice, and no vendor lock-in. Complementing this trend was the growth of open software, which helped minimize the cost of entry and afforded users unprecedented control of their HPC resource.

    Open Source Software

    The advent of open source software and Linux is another key factor in the growth of cluster HPC. Many of the open software packages do not require the payment of license fees. In particular, the Linux operating system and much of the supporting GNU software (compilers, libraries, tools, etc.) are licensed under the GPL. Thus, these packages can be used to build an HPC cluster without additional cost. The ability to scale hardware without additional software costs allowed clusters to grow and expand in many ways.

    First, open software could be easily modified to work optimally in an HPC environment. Second, drivers for new hardware could be easily added to existing installations because the source code was available. And finally, there were no license restrictions on creating and sharing software because most of the software was distributed under the GPL or BSD open licenses.

    The reliance on the Linux operating system was not an accident. You can find a longer discussion in the article entitled Why Linux On Clusters. The use of Linux and open source removed many of the barriers typically found in commercial software. The open source slogan, "give a little, get a lot" certainly applied to the HPC community.

    There was also another advantage to the use of Linux -- it was plug-and-play compatible with UNIX, and most large supercomputers used UNIX as their operating system. Several open source HPC middleware packages (e.g. MPI - Message Passing Interface or PVM - Parallel Virtual Machine) provided a standard method of communicating between cluster nodes (individual server computers). These packages were easily compiled under Linux using the same open GNU compilers that were also available on many of the large supercomputers. Thus, moving a message passing application from a large supercomputer to a cluster was often a matter of re-compiling under Linux. Such efforts often took less than a day.

    The Internet

    While most clusters do not use the Internet to run programs, the Internet was a key player in their development. As it did for many other communities, the Internet supported world-wide conversations about clustering. Ideas, best practices, and software were easily shared among the early pioneers. One of the most important assets was (and still is) The Beowulf Mailing List and its archives, located at Beowulf.org. The discussions on the list range from best practices to new challenges to assisting those who are new to the community. There is no doubt that cluster HPC would have happened without the Internet, but it definitely would have been a much slower transition. In the case of HPC, the Internet has been an idea accelerator.

    Lingering Challenges

    While clusters now dominate the HPC landscape they are not without their challenges. Recent challenges such as power and cooling and heterogeneous computing (i.e. GP-GPUs) have added to the list of lingering issues facing the HPC market. Clusters have certainly "won the day," but they also inherit many of the "parallel computing" challenges that have faced the HPC sector for several decades.
    • Scalability - According to IDC, 57% of all HPC applications/users surveyed use 32 processors (cores) or fewer. In other words, in the age of thousand core clusters, most applications can only use 32 (or fewer) cores before they see no further performance gain. The numbers are confirmed by a recent poll from ClusterMonkey.net where 55% of those surveyed used 32 or fewer cores for their applications. At the high end, greater than 128 cores, the number of applications increases, leaving a valley of poor scalability between 32 and 128 cores. Creating applications that scale is therefore a challenge for the HPC community and market. The difficulty in writing parallel software is probably one of the reasons for poor scalability.
    • Cluster Management - Ten years ago, managing clusters was a major challenge for the community. Surprisingly, this is still an issue for many users (according to IDC). Users want "ease of everything," which includes set-up, expansion, monitoring, and repair. Over the last ten years, this aspect has improved, but clusters have also grown. Ten years ago, a large cluster might have contained several hundred servers. Today, the typical cluster approaches thousands of servers. As the size increases so does the possibility of failure, and many system administrators understand that daily issues will be common.
    • Parallel Programming - Creating parallel software is hard work. Fortunately, many of the popular applications have been ported to use the MPI communications layer and will run on clusters. There are many issues still facing the programmer, however. Adapting software to new hardware environments (e.g. multi-core and heterogeneous GP-GPU computing) is still an unresolved issue. In addition, optimization and debugging are still time-consuming processes that require specialized personnel. One advantage cluster HPC brings to the effort is the larger number of parallel systems available to programmers. That is, over the past ten years, the number of people with access to supercomputing-level hardware has increased tremendously and thus software efforts should increase commensurately.
    • Parallel Storage - Parallel storage has been an issue since the first clusters were built. Although some solid progress on the storage front has occurred, the nature of the problem often prohibits a general solution. Parallel storage systems often require customized approaches to achieve good results. The arrival of the pNFS (parallel NFS) standard may help in this effort.
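
    One common way to reason about the scalability ceiling described in the first item is Amdahl's law, which the survey figures above do not mention but which gives a feel for why adding cores stops helping. The sketch below is a simplified model: it assumes a fixed problem size and ignores communication overhead, and the 95% parallel fraction is an illustrative assumption rather than a measured value.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law for a fixed problem size."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_efficiency(parallel_fraction: float, cores: int) -> float:
    """Fraction of each core's capacity that contributes to the speedup."""
    return amdahl_speedup(parallel_fraction, cores) / cores


# Assumed: 95% of the run time can be parallelized (illustrative only).
for cores in (1, 8, 32, 128, 1024):
    speedup = amdahl_speedup(0.95, cores)
    efficiency = parallel_efficiency(0.95, cores)
    print(f"{cores:5d} cores: speedup {speedup:6.2f}, efficiency {efficiency:6.1%}")
```

    Under this model a code that is 95% parallel can never run more than 20 times faster, however many cores are added, so most of the gain is already captured by a few dozen cores.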

    While these challenges all present various levels of difficulty, the advent of dynamic provisioning has opened up a whole new approach to cluster HPC. Dynamic provisioning allows a cluster server (or set of servers) to be configured at run time. Instead of offering a fixed software environment, a more fluid approach is used. Traditionally a user was required to "fit the application" to the cluster environment (e.g. a specific OS, MPI, or compiler). With dynamic provisioning, the application now tells the scheduler (execution environment) what type of resources it needs (Linux version, Windows, etc.) and the servers are automatically provisioned for the user before the application runs. You can read more about dynamic provisioning in the recent HPC Community article, SC09: Three Trends Worth Watching.
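
    The idea can be illustrated with a toy scheduler. This is a minimal sketch and not the interface of any real provisioning system: the `Node`, `Job`, `provision`, and `schedule` names, along with the image labels, are all hypothetical, and "provisioning" here simply relabels a node where a real system would re-image or reboot it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    os_image: Optional[str] = None  # environment currently installed
    busy: bool = False


@dataclass
class Job:
    name: str
    required_image: str  # environment the application asks for
    nodes_needed: int


def provision(node: Node, image: str) -> None:
    """Stand-in for re-imaging a node (hypothetical; only updates a label)."""
    node.os_image = image


def schedule(job: Job, cluster: List[Node]) -> List[Node]:
    """Allocate idle nodes to a job, provisioning them to the requested image."""
    idle = [node for node in cluster if not node.busy]
    if len(idle) < job.nodes_needed:
        raise RuntimeError(f"not enough idle nodes for job {job.name}")
    # Prefer nodes that already run the requested image to avoid re-provisioning.
    idle.sort(key=lambda node: node.os_image != job.required_image)
    chosen = idle[:job.nodes_needed]
    for node in chosen:
        if node.os_image != job.required_image:
            provision(node, job.required_image)
        node.busy = True
    return chosen


# Hypothetical cluster and job, for illustration only.
cluster = [Node("n1", "linux-a"), Node("n2", "windows"), Node("n3")]
job = Job("cfd-run", required_image="windows", nodes_needed=2)
allocated = schedule(job, cluster)
print([(node.name, node.os_image) for node in allocated])
```

    The application states what it needs and the servers are adapted to it, rather than the user fitting the application to whatever environment the cluster happens to offer.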

    The Commodity Lesson

    If there is any one prediction for the next decade, it is "commodity will rule." One cannot argue against the economics of the commodity market. Indeed, any HPC solution that does not build off a commodity technology is going to have a rough time competing in the market. In addition, failure to embrace an "open approach" will also make market penetration difficult. The combination of mass produced hardware and open software has brought a new and powerful disruptive approach to the market. While some may assume that "commoditization" of HPC has removed the business incentive from the market, there are plenty of opportunities and challenges, mentioned above, that need good solutions. In the coming decade, cluster HPC will continue to reap the advantages of low cost hardware, open software, and community support. What will you do with your personal supercomputer?
    This article was originally published in forum thread: The Cluster Decade, started by deadline.