![]() | ||||
| ||||||
| Research Topics and Sponsored Projects Discussion on various research topics and sponsored projects |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Display Modes |
| |||
|
This post is in relation to a research project investigating the applicability of the Symphony paradigm to (scientific and engineering) numerical applications. One area we are investigating in what ways could the Symphony model and API be extended to efficiently parallelize a larger range of applications. But we still want to retain the main advantages of the Symphony model: simplicity, fault-tolerance, and load-balancing. A related post discusses the Conjugate-Gradient solver, which represents a class of more challenging applications to implement efficiently in Symphony. The LINPACK Benchmark represents a more challenging class still. Like the CG Solver, it also solves the N by N linear system Ax = b; however, it does this by first factoring A into lower and upper triangular matrices L and U, so that LU = A. This takes O(N^3) flops. The solution can then be computed in O(N^2) flops using x = U'L'b. In an (MPI) parallel implementation, such as the HPL benchmark, there are O(N) communication steps, and O(N*N) data is communicated. One approach would be to have service tasks persisting throughout the whole computation. So effectively they would have to communicate with each other. One extreme is that they would do so by MPI but we lose fault-tolerance and simplicity of programming. Another extreme would be to use cluster-enabled OpenMP, but current implementations have major performance problems (we have 2 PhD students working on this, but it is a very hard problem) and lack any fault-tolerance support. A third possibility is that a `data fabric' could be used to store global data (in this case A and b), which the service tasks download and upload. This has a resonance with the service-oriented flavor of Symphony. The data (fabric) service could be built into Symphony (or it could be envisaged as a `parallel' service, in which Symphony provides an interface to, between both the client and service tasks). The client would initially push the data (e.g. matrix) into the data service, and then the service tasks would progress in phases. At the start, they would pull any out-of-date (or new) data from the GA service, perform their computation, and then push any updated data back. This is effectively a commit operation and serves as a synchronization point (in this sense, this is rather like the Bulk Synchronous Processing (BSP) programming paradigm). If a service task fails to reach the point (in reasonable time), it could be restarted elsewhere, and when that task completes, the whole computation can resume. But two issues arise here. How can the service task be written so that it can re-start at an arbitrary synchronization point? How will it be able to reconstruct the image of the global data structure it needs for the current phase? A possible solution to the first issue is to have the service tasks persist only for each phase instead of the whole computation. As there are O(N) phases, this would rely on a very low overhead in service task startup (seems quite possible in the current Symphony architecture). The second issue could be addressed by extending the Common Data concept, so that the Service Instance process maintains a cache of data from previous phases. Symphony would need to ensure that some `affinity' is maintained in the tasks allocated to Services instances, and also that the invoke() mechanism is extended to a variety of (parameterized) tasks. What candidates are there for the data service? Global Arrays is a programming paradigm somewhere between MPI and shared memory (e.g. OpenMP) in terms of programmability. It also supports the block-cyclic matrix distribution (needed for the LINPACK benchmark and most dense linear algebra computations). But current implementations have no fault-tolerance and data replication support. A proper data fabric, such as GemFire does not have these problems. However, it would ned to be extended to have special support for matrices (and in particular block-cyclic matrix distribution). In summary, to extend the applicability of the Symphony paradigm to more communication-intensive applications, we must give up some simplicity. The question is: how much is acceptable / desirable?
__________________ -- Cheers, Peter (Peter Strazdins, Australian National University) |
![]() |
| Thread Tools | Search this Thread |
| Display Modes | |
|
|