Everyone knows applications drive the HPC boat. It is one thing to run benchmarks and burn-in programs, but when it is time for production work, applications take over. Fortunately, there are many applications that can take advantage of clusters. These applications can be divided into three oversimplified categories.
Sequential Applications - These applications run on a single core. While they may not be parallel (use multiple cores) the end user may run many copies of the same program with different input parameters. This type of computing is often called parametric processing. These types of programs can be written in any type of computer language.
Threaded Applications - These applications use multiple cores, but only on one SMP node. Most of these programs are written in C/C++ or Fortran and use pthreads or OpenMP. As core counts increase, many HPC users are taking advantage of this approach.
Parallel Applications - These applications are written to use multiple cores across many nodes. They are mostly written in Fortran or C/C++ and use MPI as a way to send messages between nodes.
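As a rough illustration of the first category, parametric processing amounts to launching many independent copies of one sequential program, each with its own input. The sketch below does this on a single machine with Python's standard subprocess module. The one-line PROGRAM and the sweep function are hypothetical stand-ins, not part of any package listed in this article; on a real cluster the job scheduler would normally place each copy on a free core.

```python
import subprocess
import sys

# Hypothetical stand-in for a sequential application: it reads one
# parameter from the command line and prints a single result.
PROGRAM = "import sys; x = float(sys.argv[1]); print(x * x)"


def sweep(params):
    """Start one independent copy of PROGRAM per parameter, then collect results."""
    # Launch every copy first so they can all run at the same time.
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", PROGRAM, str(p)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for p in params
    ]
    results = {}
    # Wait for each copy to finish and parse the number it printed.
    for p, proc in zip(params, procs):
        out, _ = proc.communicate()
        results[p] = float(out.strip())
    return results


if __name__ == "__main__":
    for param, result in sweep([1.0, 2.0, 3.0]).items():
        print(f"param={param} result={result}")
```

The copies never communicate with each other, which is what separates this style of work from the threaded and MPI categories.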
To be complete, it should be mentioned that threaded applications are often considered parallel applications and MPI applications can run on SMP systems as well. It is possible, but not very common, to run threaded applications across cluster nodes; there are hardware/software solutions that accomplish this. For this article, we will focus on parallel MPI applications.
These applications can range from small (using several nodes) to massive (using several thousand nodes). There are many applications in the "MPI space" and cataloging them is too big a task for this article. It is possible, however, to highlight some popular applications in several HPC vertical areas. Such a listing may serve as a jumping-off point for those interested in learning more about HPC clusters in their specific domain. It is also important to remember that the HPC application space is quite vast. Thus, those interested in Computational Fluid Dynamics (CFD) probably have no interest in Molecular Dynamics (MD), and many of the words, abbreviations, and acronyms may seem a bit strange.
Commercial vs Open Source
In preparing this article, I made the decision to include only open, or freely available, applications. This is not a political statement (I'll get to that in a minute), but rather a pragmatic issue. In HPC, "open" usually means easily available or easy to try with access to the source code. I have found that most users like to try things on their own and learn at their own pace. In addition, there are usually user communities that offer help with specific issues. For this reason, I have included only open packages that are freely available. Be aware, however, that packages have different licenses and some restrict usage to non-commercial activities. Some source packages require modest license fees for educational institutions and as such were not included in the survey.
Many excellent commercial HPC applications are available. Those who need support, additional features, and even performance tuning should contact commercial vendors. Note that some vendors will actually provide support for some of the applications mentioned below. Finally, some vendors offer evaluation versions of their products.
Open Science and Open Source
Before I introduce the applications, I would like to discuss what some call "Open Science". The scientific process depends upon open disclosure of information and verification of results. There are some who believe that open source should be part of this process. That is, with the heavy use of computers in virtually all fields of science, the use of open verifiable software is essential to the scientific method. A closed source application is often considered a "black box" that must be trusted to work properly. Clearly, any software can have issues, bugs, or produce incorrect results, but access to source code seems to better support the scientific method. In the end, it is best to choose what works best for you and addresses your HPC needs.
Finally, almost all of these applications are considered "production quality." They are used every day on clusters around the globe. Indeed, they are not backwater open source applications that go unsupported after a few years; they are the tools of modern science. Enough introduction and philosophy, let's get to the applications. Also note that many of the descriptions were pulled from the project web pages and there is no significance to the order in which the applications are listed (i.e., they are all great!).
Bioinformatics
mpiBLAST is a freely available, open-source, parallel implementation of NCBI BLAST. By efficiently utilizing distributed computational resources through database fragmentation, query segmentation, intelligent scheduling, and parallel I/O, mpiBLAST improves NCBI BLAST performance by several orders of magnitude while scaling to hundreds of processors. mpiBLAST is also portable across many different platforms and operating systems. Lastly, a renewed focus and consolidation of the many codebases has positioned mpiBLAST to continue to be of high utility to the bioinformatics community.
License: GNU Public License (GPL) version 2
MPI-HMMER is an open source MPI implementation of the HMMER protein sequence analysis suite. The main search algorithms, hmmpfam and hmmsearch, have been ported to MPI in order to provide high throughput HMMER searches on modern computational clusters. MPI-HMMER has sophisticated I/O, a self-contained coordinator/worker model, and the easy inclusion of accelerated architectures (e.g. GP-GPUs).
License: GNU General Public License
Molecular Dynamics
GROMACS is an engine to perform molecular dynamics simulations and energy minimization. These are two of the many techniques that belong to the realm of computational chemistry and molecular modeling. Computational Chemistry is just a name to indicate the use of computational techniques in chemistry, ranging from quantum mechanics of molecules to dynamics of large complex molecular aggregates. Molecular modeling indicates the general process of describing complex chemical systems in terms of a realistic atomic model, with the aim to understand and predict macroscopic properties based on detailed knowledge on an atomic scale. Often molecular modeling is used to design new materials, for which the accurate prediction of physical properties of realistic systems is required.
License: GNU General Public License
NAMD is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. NAMD scales to hundreds of processors on high-end parallel platforms, as well as tens of processors on low-cost commodity clusters, and also runs on individual desktop and laptop computers. NAMD works with AMBER and CHARMM potential functions, parameters, and file formats.
License: University of Illinois, NAMD Molecular Dynamics Software, Non-Exclusive, Non-Commercial Use
Desmond is a software package developed at D. E. Shaw Research to perform high-speed molecular dynamics simulations of biological systems on conventional commodity clusters. The code uses novel parallel algorithms and numerical techniques to achieve high performance and accuracy on platforms containing a large number of processors, but may also be executed on a single computer.
License: Source code is available for non-commercial use.
OpenAtom is a highly scalable and portable parallel application for molecular dynamics simulations at the quantum level. It implements the Car-Parrinello ab-initio Molecular Dynamics (CPAIMD) method. OpenAtom is written using the Charm++ parallel programming framework. It runs on a variety of architectures like PowerPC, Opteron and Intel-based systems.
License: BSD License
Electronic Structure/Quantum Chemistry
GAMESS is a program for ab initio molecular quantum chemistry. Briefly, GAMESS can compute SCF wavefunctions ranging from RHF, ROHF, UHF, GVB, and MCSCF. A variety of molecular properties, ranging from simple dipole moments to frequency dependent hyperpolarizabilities may be computed. Many basis sets are stored internally, together with effective core potentials or model core potentials, so that essentially the entire periodic table can be considered.
License: GAMESS User License Agreement
MPQC is the Massively Parallel Quantum Chemistry Program. It computes properties of atoms and molecules from first principles using the time independent Schrödinger equation. It runs on a wide range of architectures ranging from individual workstations to symmetric multiprocessors to massively parallel computers. Its design is object oriented, using the C++ programming language.
License: GNU Library General Public License and GNU General Public License
OpenMX (Open source package for Material eXplorer) is a program package for nano-scale material simulations based on density functional theories (DFT), norm-conserving pseudopotentials, and pseudo-atomic localized basis functions. Since the code is designed for the realization of large-scale ab initio calculations on parallel computers, it is anticipated that OpenMX can be a useful and powerful tool for nano-scale material sciences in a wide variety of systems such as biomaterials, carbon nanotubes, magnetic materials, and nanoscale conductors.
License: GNU General Public License
Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and PAW). Quantum ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization.
License: GNU General Public License
Environment/Weather
POP (Parallel Ocean Program) is an ocean circulation model derived from earlier models of Bryan, Cox, Semtner and Chervin in which depth is used as the vertical coordinate. The model solves the three-dimensional primitive equations for fluid motions on the sphere under hydrostatic and Boussinesq approximations. Spatial derivatives are computed using finite-difference discretizations which are formulated to handle any generalized orthogonal grid on a sphere, including dipole and tripole grids which shift the North Pole singularity into land masses to avoid time step constraints due to grid convergence.
License: None Stated
WRF (Weather Research and Forecasting Model) is a software framework (WSF) that supports two dynamical solvers: the Advanced Research WRF, developed and maintained by the Mesoscale and Microscale Meteorology Division of NCAR, and the Nonhydrostatic Mesoscale Model, developed by the National Centers for Environmental Prediction with user support provided by the Developmental Testbed Center.
License: Public Domain Notice
MM5 is a mesoscale model for weather forecasting. It is a limited-area, nonhydrostatic, terrain-following sigma-coordinate model designed to simulate or predict mesoscale atmospheric circulation. The model is supported by several pre- and post-processing programs, which are referred to collectively as the MM5 modeling system. The MM5 modeling system software is mostly written in Fortran, and has been developed at Penn State and NCAR as a community mesoscale model with contributions from users worldwide.
License: Public Domain Legal Notice
Computational Fluid Dynamics
OpenFOAM® (Open Field Operation and Manipulation) is a CFD toolbox that can simulate anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics, electromagnetics and the pricing of financial options. OpenFOAM is produced by OpenCFD Ltd and is freely available and open source, licensed under the GNU General Public License.
License: GNU General Public License
Gerris is an open source finite volume code for the solution of the partial differential equations describing fluid flow. The source code, which is written in C, is licensed under the Free Software Foundation's GPL.
License: GNU General Public License
Overture is an object-oriented code framework for solving partial differential equations in serial and parallel computing environments. It provides a portable, flexible software development environment for applications that involve the simulation of physical processes in complex moving geometry. It is implemented as a collection of C++ libraries that enable the use of finite difference and finite volume methods at a level that hides the details of the associated data structures, as well as the details of the parallel implementation.
License: Non-commercial research and development agreement
Finite Element Analysis
Elmer is an open source multiphysical simulation software package developed by CSC. Elmer development started in 1995 in collaboration with Finnish universities, research institutes and industry. Elmer includes physical models of fluid dynamics, structural mechanics, electromagnetics, heat transfer and acoustics, for example. These are described by partial differential equations which Elmer solves by the Finite Element Method (FEM).
License: GNU General Public License
Impact is an open source finite element program suite which can be used to predict most dynamic events such as car crashes or metal sheet punch operations. They usually involve large deformations and high velocities. Simulations are made on a virtual three dimensional model which can be created with a pre-processor or with the built-in Fembic language. Results are viewed in a post-processor.
License: GNU General Public License
WARP3D is under continuing development as a research code for the solution of very large-scale, 3-D solid models subjected to static and dynamic loads. Specific features in the code oriented toward the investigation of fracture in metals include a robust finite strain formulation, a general J-integral computation facility (with inertia, thermal, face loading), interaction integrals for computation of linear-elastic fracture parameters, very general element extinction and node release facilities to model crack growth, nonlinear material models including viscoplastic and cyclic, cohesive elements and cohesive constitutive models, and the Gurson-Tvergaard dilatant plasticity model for void growth.
License: GNU General Public License
Cosmology
GADGET is a freely available code for cosmological N-body/SPH simulations on massively parallel computers with distributed memory. GADGET uses an explicit communication model that is implemented with the standardized MPI communication interface. The code can be run on essentially all supercomputer systems presently in use, including clusters of workstations or individual PCs.
License: GNU General Public License
Enzo is an adaptive mesh refinement (AMR), grid-based hybrid code (hydro + N-Body) which is designed to do simulations of cosmological structure formation. It uses the algorithms of Berger & Colella to improve spatial and temporal resolution in regions of large gradients, such as gravitationally collapsing objects. The Enzo simulation software is incredibly flexible, and can be used to simulate a wide range of cosmological situations with the available physics packages.
License: University of Illinois/NCSA Open Source License
GeoScience
SCRF (Stanford Center for Reservoir Forecasting) is an industrial affiliates program in the Energy Resources Engineering Department of the School of Earth Sciences at Stanford University. SCRF was initiated in 1988 to further the development and integration of geological, geophysical and reservoir engineering data and techniques for forecasting reservoir performance.
License: GNU General Public License
CAD Solid Modeling
BRL-CAD is a powerful cross-platform constructive solid geometry solid modeling system that includes an interactive geometry editor, ray-tracing for rendering & geometric analyses, network distributed framebuffer support, image & signal-processing tools.
License: BSD License and GNU Library General Public License
Visualization
While these may not be "parallel applications", they assist in visualizing results from some of the applications mentioned.
VMD is designed for modeling, visualization, and analysis of biological systems such as proteins, nucleic acids, lipid bilayer assemblies, etc. It may be used to view more general molecules, as VMD can read standard Protein Data Bank (PDB) files and display the contained structure. VMD provides a wide variety of methods for rendering and coloring a molecule: simple points and lines, CPK spheres and cylinders, licorice bonds, backbone tubes and ribbons, cartoon drawings, and others. VMD can be used to animate and analyze the trajectory of a molecular dynamics (MD) simulation. In particular, VMD can act as a graphical front end for an external MD program by displaying and animating a molecule undergoing simulation on a remote computer.
License: UIUC Open Source License
Molden is a package for displaying Molecular Density from the Ab Initio packages GAMESS-UK, GAMESS-US and GAUSSIAN and the Semi-Empirical packages Mopac/Ampac. It also supports a number of other programs via the Molden Format. Molden reads all the required information from the GAMESS / GAUSSIAN output file. Molden is capable of displaying Molecular Orbitals, the electron density and the Molecular minus Atomic density. Either the spherically averaged atomic density or the oriented ground state atomic density can be subtracted for a number of standard basis sets. Molden supports contour plots, 3-d grid plots with hidden lines and a combination of both.
License: Free for Academic use
OpenDX (Open Data eXplorer) gives you new control over your data and new insights into their meaning. Yet OpenDX is easy to use because it lets you visualize data in ways you've never dreamed of -- without getting bogged down in the technology. If you need visualization for anything from examining simple data sets to analyzing complex, time-dependent data from disparate sources, OpenDX has what you need: features and functions that let you easily gain meaningful insight into your data.
License: IBM Public License
The list could be much longer! If you would like to add your favorite open/freely available HPC application to the list, please send me a note at my Cluster Monkey address. Now get busy.
Last edited by deadline; June 26th, 2009 at 04:46 AM.
If you're willing to accept the GAMESS license as open-source, why do you not also mention NWChem, Dalton, Aces II and III, CFOUR, PSI, etc.?
PSI and Aces III are both GPL, although the latter is not production.
NWChem and Dalton are both quite robust and do many things GAMESS does not. Both have far more exhaustive documentation (400+ pages in both cases) for users (NWChem also has a programmer's manual for developers).
Unlike GAMESS, NWChem has an official support system for users.