
Thread: Integrating Symphony DE with Matlab Parallel Computing Toolbox

  1. #1
    Young's Avatar
    Young is offline Junior Member
    Join Date
    March 5th, 2008
    Location
    Toronto, Canada
    Posts
    58
    Blog Entries
    1
    Downloads
    7
    Uploads
    0

    Default Integrating Symphony DE with Matlab Parallel Computing Toolbox

    1. Overview of Matlab Distributed Computing Toolbox
    Matlab is one of the most widely used tools in scientific and technical computing. Because of its roots in serial numerical libraries, Matlab has always been a serial program. However, as modern scientific and engineering problems grow in complexity, increases in processor speed and in the amount of memory that fits in a single machine have not kept pace with computational requirements. Current scientific problems often do not fit into the memory of a single machine, which makes parallel computation a necessity. Matlab therefore provides the Distributed Computing Toolbox (DCT) as its distributed computing solution.

    DCT enables users to solve computationally intensive and data-intensive problems using Matlab and Simulink in a multiprocessor computing environment. Users can apply it to problems that comprise either several independent units of work or a single large computation, harnessing multiple processors that reside in one multiprocessor computer or on a computer cluster.

    The key features of DCT are:
    • Distributed and parallel execution of applications
    • Interactive and batch execution modes
    • Distributed arrays, parallel algorithms, and parallel for loops
    • Inter-worker communication support based on the MPI (MPICH2) standard
    • Ability to run four workers locally on a desktop
    • Integration with MATLAB Distributed Computing Engine for developing cluster-based applications that use any scheduler or any number of workers
    2. Architecture of Distributed Computing Toolbox
    DCT lets users coordinate and execute independent MATLAB operations simultaneously on a cluster of computers, which speeds up the execution of large MATLAB jobs. A job is a large operation that the user needs to perform in a MATLAB session. A job is broken down into segments called tasks, and the user decides how best to divide the job into tasks.
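
    Since dividing the job is left to the user, one simple approach is to cut an index range into near-equal contiguous chunks, one per task. The sketch below uses plain MATLAB only. The values of numItems and numTasks are made up, and the chunking rule is just one possible choice, not something DCT prescribes.

```matlab
% Split numItems work items into numTasks contiguous, near-equal chunks.
numItems = 10;   % hypothetical total amount of work
numTasks = 3;    % hypothetical number of tasks

% Chunk boundaries: items edges(k)+1 .. edges(k+1) belong to task k.
edges = round(linspace(0, numItems, numTasks + 1));

chunks = cell(1, numTasks);
for k = 1:numTasks
    chunks{k} = (edges(k) + 1):edges(k + 1);
end
```

    Each chunk could then be handed to one createTask call as an input argument, so that every task works on its own slice of the job.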

    DCT consists of the following components:
    • Client: the normal Matlab session in which the job and its tasks are defined by using the functions provided by DCT. It is often on the machine where the user programs Matlab.
    • Scheduler: coordinates the execution of jobs and the evaluation of their tasks, and distributes the tasks for evaluation to the individual Matlab sessions called workers. DCT provides a built-in scheduler named job manager, and users can also use a third-party scheduler to distribute tasks to workers.
    • Worker: the Matlab session that evaluates a task distributed by the scheduler.



    In the above figure, three workers evaluate tasks. These workers can reside in one multiprocessor computer or on a computer cluster. To use the built-in job manager as the scheduler, the Matlab Distributed Computing Engine must be running on each node.

    3. Example of Distributed Computing Toolbox Usage
    Let’s look at a very simple example to see how DCT works. For convenience, we run this example on a single node, but it can easily be applied to a cluster.

    Preconditions:
    • One WinXP machine with Matlab R2007b (Distributed Computing Toolbox 3.2) installed.
    • The DCT built-in job manager is used as the scheduler.
    • Two workers are started for evaluating tasks.

    Work to be done:
    In this example, we want two workers to each generate a large random matrix.

    First, we get the scheduler object for our scheduler. Here it is a job manager named “my_jm”, which has been started beforehand.
    Code:
    jm = findResource('scheduler', 'type', 'jobmanager', 'name', 'my_jm');
    
    Then we create a job and its two tasks for this job manager:
    Code:
    job = createJob(jm);
    createTask(job, @rand, 1, {5000,5000});
    createTask(job, @rand, 1, {5000,5000});
    
    OK, it’s time to submit this job.
    Code:
    submit(job);
    
    Then the two workers start their work. We can check their CPU and memory usage in Windows Task Manager:



    The bottom Matlab process is our client session and the two above it are our workers. During evaluation, the workers clearly consume large amounts of memory and CPU time.

    Finally, we can wait for this job to finish and get its results:
    Code:
    waitForState(job);
    results = getAllOutputArguments(job);
    
    As this example shows, DCT is easy to use and can deliver better performance than traditional serial Matlab by utilizing a multiprocessor computer or a cluster. What interests us more is that DCT provides a generic interface that lets users work with third-party schedulers to distribute tasks to the cluster nodes for evaluation. DCT already fully supports our LSF and Windows Compute Cluster Server (CCS) as third-party schedulers. The next chapter focuses on how to integrate Symphony DE with DCT.

    4. Integrating Symphony DE with Matlab
    In this chapter, we propose a solution for integrating Symphony DE with the Matlab Distributed Computing Toolbox, i.e., adopting Symphony DE as our scheduler.

    Because each job in our application comprises several tasks, the purpose of the scheduler is to allocate a cluster node for the evaluation of each task, that is, to distribute each task to a cluster node. The scheduler starts remote Matlab worker sessions on the cluster nodes to evaluate the individual tasks of the job. To evaluate its task, a Matlab worker session needs access to certain information, such as where to find the job and task data. The DCT generic scheduler interface provides the means of getting tasks from the client session to the scheduler and from there to the cluster nodes.

    To evaluate a task, a worker requires five parameters that we must pass from the client to the worker. The parameters can be transferred in any way we like, but because one particular parameter must be an environment variable, the solution in this section passes all of them as environment variables.



    In the above figure, we should note two functions: submit and decode. They work as a pair, on the client node and the worker node respectively. The submit function sets the required environment variables and submits the job to the scheduler; the decode function reads those environment variables and passes them to the worker process. Since we adopt Symphony DE as our scheduler, we must provide our own submit and decode functions.
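
    The client half of this hand-off can be pictured roughly as follows. This is only a sketch: the struct props stands in for the properties object that DCT passes to the submit function, and its field names and values are placeholders. The four STORAGE_*/JOB_*/TASK_* names are the ones soam_decode reads in section 4.2. MDCE_DECODE_FUNCTION is, to our understanding of the generic scheduler interface, the one variable that has to be an environment variable; treat that name as an assumption to verify against the DCT documentation.

```matlab
% Stand-in for the submitProps object that DCT passes to the submit function.
% Field names and values are placeholders for illustration only.
props = struct( ...
    'StorageConstructor', 'placeholder_constructor', ...
    'StorageLocation',    'placeholder_storage', ...
    'JobLocation',        'Job1', ...
    'TaskLocation',       'Job1/Task1');

% Name of the decode function the worker should call (assumed variable name).
setenv('MDCE_DECODE_FUNCTION', 'soam_decode');

% The other four values, under the names that soam_decode reads back.
setenv('STORAGE_CONSTRUCTOR', props.StorageConstructor);
setenv('STORAGE_LOCATION',    props.StorageLocation);
setenv('JOB_LOCATION',        props.JobLocation);
setenv('TASK_LOCATION',       props.TaskLocation);
```

    Note that setenv only changes the environment of the client session itself. The values still have to be forwarded to the worker process by the scheduler.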

    4.1 Submit Function
    This function has three default parameters that are passed by DCT:
    • sched: the scheduler object in the client session, returned by the findResource function.
    • job: the job to be submitted.
    • submitProps: properties that are useful for submitting the job, such as the parameters that should be passed to the worker as environment variables and the name of the Matlab binary to run when starting a worker process.

    Besides these three default parameters, we add another one:
    • sessionID: the ID of a Symphony session created beforehand; the soam_submit function uses this session to submit the tasks.

    So the declaration of this submit function is:
    Code:
    function soam_submit(sched, job, submitProps, sessionID)
    
    Symphony DE provides the command “symexec” for remote binary execution, and it can also pass specified environment variables. Inside the submit function, we therefore use symexec to submit each task to SOAM and pass the five environment variables through its -e option.

    For details about this submit function, please refer to the attached M-files in the ZIP package.
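
    The attached M-files hold the real implementation. As a rough sketch only, the per-task command might be assembled as below. The “send” subcommand, the NAME=VALUE form of the -e option, the argument order, and the worker command are all assumptions and should be checked against the Symphony DE symexec documentation. Only three of the five variables are shown for brevity.

```matlab
% Hypothetical inputs; in soam_submit these would come from submitProps.
sessionID = 333;
envNames  = {'MDCE_DECODE_FUNCTION', 'JOB_LOCATION', 'TASK_LOCATION'};
envValues = {'soam_decode',          'Job1',         'Job1/Task1'};
workerCmd = 'matlab_worker_placeholder';   % placeholder, not a real binary name

% Build one "-e NAME=VALUE" fragment per variable (assumed -e syntax).
envArgs = '';
for k = 1:numel(envNames)
    envArgs = [envArgs, sprintf(' -e %s=%s', envNames{k}, envValues{k})]; %#ok<AGROW>
end

% Assumed command layout: symexec send <env options> <session> <command>.
cmd = sprintf('symexec send%s %d %s', envArgs, sessionID, workerCmd);
disp(cmd);   % a real submit function would pass cmd to system() instead
```

    Building the string separately from running it makes it easy to print and inspect the command before anything is sent to the grid.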

    4.2 Decode Function
    The decode function is simpler than the submit function: it just reads certain job and task information (in the form of environment variables) into the Matlab worker session. Here is all of its code:
    Code:
    function props = soam_decode(props)
    
    % Get the environment variables that were set in the submit function.
    % SOAM transfers these across to the worker process.
    storageConstructor = getenv('STORAGE_CONSTRUCTOR');
    storageLocation = getenv('STORAGE_LOCATION');
    jobLocation = getenv('JOB_LOCATION');
    taskLocation = getenv('TASK_LOCATION');
    
    % Set props properties from the local variables:
    set(props, 'StorageConstructor', storageConstructor, 'StorageLocation', storageLocation, ...
        'JobLocation', jobLocation, 'TaskLocation', taskLocation);
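
    If one of the variables does not reach the worker, getenv returns an empty string and the failure only shows up later. A defensive check along the following lines could be run at the top of the decode function. It is an optional addition, not part of the attached code, and the values set here merely simulate a worker environment for illustration.

```matlab
% Names read by soam_decode.
required = {'STORAGE_CONSTRUCTOR', 'STORAGE_LOCATION', ...
            'JOB_LOCATION', 'TASK_LOCATION'};

% Simulate a worker where only some variables arrived (illustration only).
setenv('STORAGE_CONSTRUCTOR', 'placeholder_constructor');
setenv('STORAGE_LOCATION',    'placeholder_storage');
setenv('JOB_LOCATION',        'Job1');
setenv('TASK_LOCATION',       '');      % pretend this one was lost

% Collect the names whose value is empty.
missing = {};
for k = 1:numel(required)
    if isempty(getenv(required{k}))
        missing{end + 1} = required{k}; %#ok<AGROW>
    end
end

if ~isempty(missing)
    fprintf('Missing environment variables: %s\n', sprintf('%s ', missing{:}));
end
```

    In a real decode function, the fprintf call would more likely be an error call so that the task fails immediately with a clear message.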
    
    4.3 Example of using Symphony DE as the scheduler
    Now that our submit and decode functions are finished, let’s look at an example that uses SOAM as the scheduler. For convenience, we run this example on a single node, but it can easily be applied to a cluster.

    Preconditions:
    • One WinXP machine with Matlab R2007b (Distributed Computing Toolbox 3.2) and Symphony DE 4.0.0 installed.
    • Symphony DE is used as the scheduler, and the related submit/decode functions are ready.
    • symexec has been used to create a session for submitting tasks; suppose the session ID is 333.

    Work to be done:
    In this example, we want two workers to each generate a large random matrix.

    First, we get the scheduler object for our scheduler, whose type is generic_soam (any scheduler type starting with the string “generic” creates a generic scheduler object), set its SubmitFcn property to our submit function (soam_submit), and pass it the additional parameter sessionID.
    Code:
    sched = findResource('scheduler', 'type', 'generic_soam');
    sessionID = 333;
    set(sched, 'SubmitFcn', {@soam_submit, sessionID});
    
    Then we create a job and its two tasks for this scheduler:
    Code:
    job = createJob(sched);
    createTask(job, @rand, 1, {5000,5000});
    createTask(job, @rand, 1, {5000,5000});
    
    OK, it’s time to submit this job.
    Code:
    submit(job);
    
    Our submit function (soam_submit) is then called internally to submit each task, and it asks Symphony DE to start one worker process per task. Our decode function (soam_decode) is called in each worker process and reads all the required environment variables into the worker session. The worker then starts to evaluate its task.

    Similarly, we can check the two workers’ CPU and memory usage in Windows Task Manager:

    The top Matlab process is our client session and the bottom two are our workers. During evaluation, the workers clearly consume large amounts of memory and CPU time.

    Finally, we can wait for this job to finish and get its results:
    Code:
    waitForState(job);
    results = getAllOutputArguments(job);
    
    5. Issues

    (1) Typical distributed Matlab applications require at least events for task start and finish, as well as for job queued, running, and finished. The basic task-finish event is already available through the Symphony API fetchTaskOutputs(). The other session and task statuses can be monitored through the Symphony GUI. In general, a good application programming platform should include a programming model that supports some form of event notification, which gives more flexibility when writing distributed applications.
    (2) Some distributed Matlab applications may require an MPI ring to be set up. However, Symphony currently has no mpiexec-like capability to form an MPI ring, as briefly described in the Known Issues thread.
    (3) A race condition can occur if multiple SIs in grid-enabled mode attempt to initialize the shared library generated from the Matlab M-file that contains the logic of the distributed Matlab workers. There is no global lock that lets concurrent SIs handle this condition reliably.



    6. Conclusion
    As the complexity of scientific and engineering work increases, DCT will play a more important role in Matlab. The MathWorks has clearly recognized this and is updating and improving DCT at a very fast pace. We therefore believe DCT will become the official and most popular solution for Matlab distributed computing, even though many other distributed Matlab solutions exist.

    The solution proposed in chapter 4 shows that DCT is easy to integrate with a third-party scheduler, and we have achieved distributed computing by integrating it with our Symphony DE.

    The usability of this solution is also good. Developers can use DCT to build distributed applications without writing much extra code, so they can focus on the logic of their own work. Cluster administrators should also find it comfortable to apply this solution to a large cluster; the only extra requirement is that Matlab be installed in the right path on each node.

    As mentioned in chapter 2, DCT provides a built-in scheduler, the job manager, and its performance is currently better than that of our solution. The reason is that with a job manager, worker processes usually remain running at all times, dedicated to their job manager. With a third-party scheduler, workers run as applications that are started to evaluate tasks and stopped when their tasks complete. If tasks are small or take little time, starting a worker for each one can involve too much overhead. Our future work is therefore to investigate how the job manager is implemented and find out whether our solution can behave the same way in this respect, so as to reduce the overhead.
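
    The overhead argument can be made concrete with a back-of-the-envelope calculation. The start-up time and task durations below are invented numbers chosen to show the trend, not measurements from this setup.

```matlab
% Hypothetical figures, chosen only to illustrate the trend.
startupSeconds = 10;               % assumed time to launch one Matlab worker
taskSeconds    = [1 10 100 1000];  % assumed pure evaluation time per task

% Fraction of each task's wall-clock time spent on worker start-up
% when a fresh worker is launched for every task.
overheadFraction = startupSeconds ./ (startupSeconds + taskSeconds);

for k = 1:numel(taskSeconds)
    fprintf('task %5d s -> %4.1f%% start-up overhead\n', ...
            taskSeconds(k), 100 * overheadFraction(k));
end
```

    Under these assumptions, start-up dominates short tasks and becomes negligible for long ones, which is why persistent workers or larger tasks help most when individual tasks are small.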

    7. References
    [1] Distributed Computing Toolbox User's Guide
    [2] MATLAB Distributed Computing Engine System Administrator's Guide
    [3] Parallel MATLAB: Doing it Right
    Attached Files
    Last edited by Young; April 22nd, 2008 at 09:51 PM.

  2. #2
    Ajith's Avatar
    Ajith is offline Symphony DE Moderator
    Join Date
    February 28th, 2008
    Location
    Markham, Ontario
    Posts
    104
    Blog Entries
    2
    Downloads
    10
    Uploads
    0

    Default

    Hi Young,

    Can you describe the benefits of the integration with Symphony?

    - Ajith

  3. #3
    Young's Avatar
    Young is offline Junior Member

    Default Merits of Symphony - Matlab integration

    Resource management is not completely transparent to developers who use DCT, since they need to search explicitly for available resources on the grid through its API. In contrast, with Symphony as the Matlab task scheduler, session-level and application-level resource management is transparent to users, while scheduling and resource sharing policies remain flexibly configurable for advanced developers.
    In DCT, the MDCE daemon does not manage task-level information. Symphony provides an interface to identify a task programmatically and control its context. Symphony also provides tools to monitor or kill tasks.
    Last edited by Young; April 22nd, 2008 at 09:51 PM.
