1. Introduction
The issue with MPICH2-MPI and Symphony DE integration was briefly covered in the "Known Issues with Symphony DE 4.0" thread. That thread mentioned a variation of the integration, and the diagram below abstracts it.
The symexec clients (SymMPIServer and SymMPIClient) are used mainly to boot an MPD ring and to submit MPI processes, which spawn multiple child processes that perform parallel computation on the grid.
Note that the MPI child processes can communicate not only with each other but also with their parent and with the master SymMPIServer *directly*, bypassing SOAM.
2. The Flow
[1] SymMPIServer is a symexec client that submits a pre-command on the ExecutionSession to run mpdboot on the machines in the Symphony grid. It opens a port to accept incoming messages from any MPI2 process.
[2] SymMPIClient is another symexec client; it submits the MPI2 parent process, which spawns the child processes in [3].
[3] The child processes are created by the MPI2 parent process submitted by SymMPIClient. They form a communication group for parallel computation and can communicate among themselves or with their parent process.
[4] SymMPIServer and the child processes can exchange messages directly, without any middleware.
3. Sample
A sample is attached to this post to demonstrate the integration. Download it and uncompress it into your Linux Symphony DE C++ samples directory ($SOAM_HOME/4.0/Samples/CPP), then follow the instructions in the README.
Can you explain how the SymMPIServer client starts and stops the MPI processes? How does MPDBoot work? Do you need to execute this command once, or multiple times?
This is only a prototype of a loose integration, and it has some limitations.
First of all, mpdboot itself is handled by MPICH2. SymMPIServer submits the mpdboot command as a pre-command. It starts MPD daemons on the hosts in the grid, and these daemons form a communication network (the MPD ring) for MPI2 processes. The hosts are specified in the mpd.hosts file. You can terminate the ring by running the mpdallexit command as a post-command. You only need to run mpdboot once.
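As a rough illustration of the pre-command and post-command described above, the sketch below builds the two command strings from the contents of an mpd.hosts file. It is not taken from the attached sample. The helper names (countHosts, buildPreCommand, buildPostCommand) are hypothetical, and it assumes one host per line, with blank lines and lines starting with '#' ignored. It also assumes that the number passed to mpdboot with -n equals the number of listed hosts; depending on whether the local machine is listed in the file, you may need a different count.

```cpp
#include <sstream>
#include <string>

// Counts usable host entries in the text of an mpd.hosts file.
// Assumption: one host per line; blank lines and '#' comment lines are skipped.
int countHosts(const std::string& hostsFileText) {
    std::istringstream in(hostsFileText);
    std::string line;
    int count = 0;
    while (std::getline(in, line)) {
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        if (line[first] == '#') continue;          // comment line
        ++count;
    }
    return count;
}

// Builds the pre-command string that boots the MPD ring (hypothetical helper).
// Assumption: -n is set to the number of hosts listed in the file.
std::string buildPreCommand(const std::string& hostsFilePath, int hostCount) {
    return "mpdboot -n " + std::to_string(hostCount) + " -f " + hostsFilePath;
}

// Builds the post-command string that shuts the whole ring down.
std::string buildPostCommand() {
    return "mpdallexit";
}
```

With a three-host file, buildPreCommand("mpd.hosts", 3) yields the string "mpdboot -n 3 -f mpd.hosts", which is the kind of command SymMPIServer would submit once as its pre-command.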
SymMPIServer itself is an MPI2 process running as a server.
SymMPIClient is the one that actually submits a parent MPI2 process to the Grid. The submitted MPI2 process will later spawn multiple MPI2 child processes on the remote host.
One more point you may be wondering about:
The biggest limitation of this prototype is that SOAM does not share state with the MPD ring. If an MPD daemon exits from the ring in the middle of an MPI2 process submission, the whole MPI2 application can fail.
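To reduce the chance of submitting into a broken ring, a client could compare the hosts it expects against the output of the MPICH2 mpdtrace command before each submission. The sketch below is only one possible mitigation and is not part of the prototype. The function name missingHosts is hypothetical, and it assumes that mpdtrace prints one host name per line and that those names match the entries in mpd.hosts exactly (short names versus fully qualified names may differ on your grid).

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the expected hosts that do not appear in the captured mpdtrace output.
// Assumption: the output is a whitespace-separated list of host names that
// match the expected names exactly.
std::vector<std::string> missingHosts(const std::vector<std::string>& expected,
                                      const std::string& traceOutput) {
    std::set<std::string> alive;
    std::istringstream in(traceOutput);
    std::string host;
    while (in >> host) {
        alive.insert(host);
    }

    std::vector<std::string> missing;
    for (const auto& name : expected) {
        if (alive.count(name) == 0) {
            missing.push_back(name);  // expected, but not in the ring
        }
    }
    return missing;
}
```

If the returned list is not empty, the client can hold the submission and reboot the ring instead of letting the MPI2 application fail part-way through. This only narrows the window; a daemon can still exit after the check.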