HPCCommunity.org
 
Register

Go Back   HPC Community - High Performance Computing (HPC) Community > HPC Community.org > Research Topics and Sponsored Projects

Research Topics and Sponsored Projects Discussion on various research topics and sponsored projects

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old September 26th, 2008, 05:39 AM
Junior Member
 
Join Date: September 17th, 2008
Location: Canberra
Posts: 5
Default Extending Symphony for Numerical Applications

This post is in relation to a
research project
investigating the applicability of the Symphony paradigm to (scientific and engineering) numerical applications.

One area we are investigating in what ways could the Symphony model and API be extended to efficiently parallelize a larger range of applications. But we still want to retain the main advantages of the Symphony model: simplicity, fault-tolerance, and load-balancing.

A related post discusses the Conjugate-Gradient solver, which represents a class of more challenging applications to implement efficiently in Symphony. The LINPACK Benchmark represents a more challenging class still. Like the CG Solver, it also solves the N by N linear system Ax = b; however, it does this by first factoring A into lower and upper triangular matrices L and U, so that LU = A. This takes O(N^3) flops. The solution can then be computed in O(N^2) flops using x = U'L'b.

In an (MPI) parallel implementation, such as the HPL benchmark, there are O(N) communication steps, and O(N*N) data is communicated.

One approach would be to have service tasks persisting throughout the whole computation. So effectively they would have to communicate with each other. One extreme is that they would do so by MPI but we lose fault-tolerance and simplicity of programming. Another extreme would be to use cluster-enabled OpenMP, but current implementations have major performance problems (we have 2 PhD students working on this, but it is a very hard problem) and lack any fault-tolerance support. A third possibility is that a `data fabric' could be used to store global data (in this case A and b), which the service tasks download and upload. This has a resonance with the service-oriented flavor of Symphony. The data (fabric) service could be built into Symphony (or it could be envisaged as a `parallel' service, in which Symphony provides an interface to, between both the client and service tasks).

The client would initially push the data (e.g. matrix) into the data service, and then the service tasks would progress in phases. At the start, they would pull any out-of-date (or new) data from the GA service, perform their computation, and then push any updated data back. This is effectively a commit operation and serves as a synchronization point (in this sense, this is rather like the Bulk Synchronous Processing (BSP) programming paradigm). If a service task fails to reach the point (in reasonable time), it could be restarted elsewhere, and when that task completes, the whole computation can resume.

But two issues arise here. How can the service task be written so that it can re-start at an arbitrary synchronization point? How will it be able to reconstruct the image of the global data structure it needs for the current phase? A possible solution to the first issue is to have the service tasks persist only for each phase instead of the whole computation. As there are O(N) phases, this would rely on a very low overhead in service task startup (seems quite possible in the current Symphony architecture). The second issue could be addressed by extending the Common Data concept, so that the Service Instance process maintains a cache of data from previous phases. Symphony would need to ensure that some `affinity' is maintained in the tasks allocated to Services instances, and also that the invoke() mechanism is extended to a variety of (parameterized) tasks.

What candidates are there for the data service? Global Arrays is a programming paradigm somewhere between MPI and shared memory (e.g. OpenMP) in terms of programmability. It also supports the block-cyclic matrix distribution (needed for the LINPACK benchmark and most dense linear algebra computations). But current implementations have no fault-tolerance and data replication support.

A proper data fabric, such as GemFire does not have these problems. However, it would ned to be extended to have special support for matrices (and in particular block-cyclic matrix distribution).

In summary, to extend the applicability of the Symphony paradigm to more communication-intensive applications, we must give up some simplicity. The question is: how much is acceptable / desirable?
__________________
-- Cheers, Peter

(Peter Strazdins, Australian National University)
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Forum Jump


All times are GMT. The time now is 10:44 PM.


Powered by vBulletin® Version 3.8.0 Release Candidate 1
Copyright ©2000 - 2009, Jelsoft Enterprises Ltd.