| Technical Articles Shared technical information about Symphony DE. |
Symphony DE includes a scalable and robust set of middleware and accompanying APIs to accelerate your data-intensive applications with a compute cluster. This article helps you determine whether moving an application to a Symphony DE cluster will provide a significant performance gain. It also provides some general tips on how to improve the performance of an application using DE.

I. Before You Start

If you are not familiar with Symphony DE, read about its concepts first. Before you start optimizing, make sure that you have timed your base test cases. When you try different improvements, you can compare these base times with your new times.

II. Glossary

SSM (session manager): A master host process that distributes tasks from your client application to SIM processes.
SIM (service instance manager): A compute host process that gets tasks from the SSM and sends them to your service instance process.
Application Profile: A configuration file for your application's middleware settings. See the reference for more details.

Symphony DE Performance Factors

There are seven key factors that affect performance in Symphony DE:

1. Application parallelism
2. Service code efficiency and robustness
3. Resource management
4. Load balancing
5. Network bandwidth and latency
6. Task data size and application memory fragmentation
7. File system performance for workload recovery

1.0 Application Parallelism

Symphony DE takes advantage of data parallelism to improve your application's performance. Partitioned data can be processed in parallel on different hosts and will finish sooner than the aggregated data processed sequentially on one host.

Applications that are suitable for data parallelism have input data that can be easily broken up into smaller chunks. Each chunk can be processed by a Symphony task. A Symphony task is a small unit of work, typically implemented by a function or class method. Each task has a set of inputs and outputs.

There are two boundary cases. In the first, all tasks are independent: the input of one task does not depend on the output of another task. The higher the number of independent tasks in an application, the higher the potential performance improvement due to data processing concurrency. In the second, all of your application's tasks are interdependent, and there will be no performance improvement from splitting its data.

Typical tasks that can be handed over to Symphony DE include CPU-intensive tasks such as financial risk analysis calculations. These calculations involve solving differential equations or walking a tree-like solution space. The input data can be partitioned into subsets and processed in parallel by the same task code.

Tasks that have a large input or output dataset that cannot be partitioned may not be suited to Symphony DE, as the overhead of sending data over your network may overshadow the performance improvements from data parallelism.

To maximize performance, try to combine short-running tasks so that each task takes roughly 2 seconds to run. This figure is based on our performance tests with 10 dual-CPU 3 GHz hosts, Gigabit Ethernet, and thousands of tasks in our test application. If you have time, measure your cluster's minimum efficient task run time using a utility called symping.

A longer-running task utilizes CPU resources better but prevents Symphony DE from balancing tasks efficiently between compute resources. Defining a task to run for, say, 2 hours won't be efficient: every task in your session may complete except for one that takes another 2 hours to run, leaving the rest of the cluster idle while it finishes.

2.0 Service Code Efficiency and Robustness

Symphony DE requires that you separate your application into a client and a service. The service is the part of your application that runs on the compute hosts. The service can be written to execute tasks on one or more different sets of input data. An application can have more than one service binary associated with it.
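To illustrate the task-length trade-off described in section 1.0, here is a minimal sketch in Python. It is a simplified model and not Symphony DE's actual scheduler: it assumes each task is handed to whichever slot frees up first, and it ignores per-task network and middleware overhead. The function name `makespan` and the workload numbers are made up for the illustration.

```python
import heapq


def makespan(task_seconds, slots):
    """Return the finish time of the last task under greedy scheduling.

    Simplified model (an assumption, not Symphony DE's real policy):
    each task goes to the slot that becomes idle first, with zero overhead.
    """
    free_at = [0.0] * slots  # time at which each slot becomes idle
    heapq.heapify(free_at)
    finish = 0.0
    for seconds in task_seconds:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + seconds
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# The same 4200 seconds of total work on 20 slots, split two ways.
fine = makespan([2.0] * 2100, slots=20)    # many 2-second tasks
coarse = makespan([200.0] * 21, slots=20)  # 21 long tasks: one straggler

print(fine, coarse)
```

In this model the fine-grained split keeps all 20 slots busy until the end, while the coarse split leaves 19 slots idle while the 21st task runs alone. On a real cluster the per-task overhead that this sketch ignores is what stops you from making tasks arbitrarily short.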
If your service code is CPU intensive, try to limit the number of slots that you configure on your compute hosts in the /conf/vem_resource.conf file. Configure the number of slots so that each processing thread has a dedicated CPU core. For example, if you have a quad-core CPU and one processing thread per service, configure 4 SIM slots on that host.

Your service code must be robust. If a service process crashes while executing a task, the service instance manager (SIM) restarts the service's process and the session manager (SSM) reschedules the failed task. Frequent crashes therefore reduce performance, so make sure that your service code runs reliably. You can periodically check the SSM's log to see whether tasks are failing due to service code failures.

If your application has several types of tasks, avoid the temptation to use parent-child services. In the parent-child model, the parent's service code becomes a client for a child's service code. This may simplify your application's logic, but it blocks the parent service's compute slot until all the child tasks return their results.

A better approach is to combine all your tasks' service code into one process. Write your client to get results from your parent tasks and then redistribute the new child tasks back to the same service. With this approach, no compute resource is blocked while waiting for a result.

3.0 Resource Management

Resource management in Symphony DE refers to its ability to dynamically acquire and release compute resources based on workload. This can be for multiple invocations of the same application (sessions) or for different applications sharing the cluster.

Symphony DE uses static resource management. It cannot lend compute resources (slots) to, or borrow them from, other applications using the cluster. Symphony DE will use up to the maximum number of slots in the cluster to serve your application.
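The slot-sizing rule from section 2.0 (one dedicated CPU core per processing thread) can be written as a small calculation. This is only a sketch: the helper name `slots_for_host` is hypothetical, and the syntax for entering the resulting number in vem_resource.conf is not shown here.

```python
def slots_for_host(cpu_cores, threads_per_service):
    """Suggest a slot count so every processing thread gets its own core.

    Hypothetical helper for illustration; it only does the arithmetic
    described in the article and knows nothing about Symphony DE itself.
    """
    if cpu_cores < 1 or threads_per_service < 1:
        raise ValueError("cpu_cores and threads_per_service must be >= 1")
    # Integer division: never promise more threads than there are cores,
    # but always keep at least one slot so the host can do some work.
    return max(1, cpu_cores // threads_per_service)


# The article's example: quad-core CPU, one processing thread per service.
print(slots_for_host(4, 1))
```

If a service instance runs more threads than the host has cores, the sketch falls back to a single slot; whether that is the right choice depends on how CPU bound the service really is.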
Symphony DE compute host agents called SIMs are started and stopped on a slot depending on the number of pending tasks. The SIMs' lifetime is managed through the Consumer section in the Application Profile. You can configure how the SSM stops a SIM using the taskLowWaterMark setting.

Symphony DE does not efficiently manage resources between different applications. It can only manage resources for multiple instances of the same application. You should run only one application at a time on a Symphony DE cluster.

4.0 Load Balancing

Load balancing in Symphony DE refers to its ability to distribute workload evenly among its compute resources.

Symphony DE uses an SSM to distribute tasks to SIMs on the compute hosts. The SSM has configurable policies that determine how workload for different sessions is distributed. The default policy balances the compute resources evenly across each instance (session) of your application. The shorter the run time of your tasks, the more efficiently the SSM can balance compute resources to run your application's sessions quickly.

5.0 Network Bandwidth and Latency

Symphony can utilize 50% to 100% of your network's bandwidth to transfer task data. You can get better network efficiency by aggregating task data using the Symphony API and combining tasks with less than 1 KB of data into tasks of 10 KB or more.

In a typical Symphony DE cluster, task data makes four network hops: two for input and two for output. Avoid putting compute hosts running SIMs on different network segments from the master hosts running SSMs. The SIM-to-SSM network performance determines how long the compute CPU is idle before the SSM sends it another task. The shorter the average task execution time, the more this network performance matters.
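The advice in section 5.0 about combining tasks with less than 1 KB of data into tasks of 10 KB or more can be sketched on the client side as follows. This is plain Python that groups payloads before they are submitted; it does not use or represent the Symphony API's own aggregation feature, and the function name `aggregate_payloads` and the 10 KB default are assumptions taken from the figures above.

```python
def aggregate_payloads(payloads, min_batch_bytes=10 * 1024):
    """Group small byte payloads into batches of at least min_batch_bytes.

    Illustrative only: each returned batch would become the input of one
    task. The final batch may be smaller if the input runs out.
    """
    batches = []
    current, size = [], 0
    for payload in payloads:
        current.append(payload)
        size += len(payload)
        if size >= min_batch_bytes:  # batch is big enough, close it
            batches.append(current)
            current, size = [], 0
    if current:  # leftover payloads form a last, smaller batch
        batches.append(current)
    return batches


# Fifty 512-byte work items become three task inputs instead of fifty.
items = [b"x" * 512 for _ in range(50)]
batches = aggregate_payloads(items)
print([len(batch) for batch in batches])
```

The service then loops over the items in a batch. Batching by size also lengthens each task's run time, so check the result against the roughly 2 second task length suggested in section 1.0.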
6.0 Task Data Size and Application Memory Fragmentation

Although this section applies primarily to C/C++ applications, Java and .NET programmers should also be aware of these memory issues.

Operating system memory management is a key factor in determining your application's performance. As your application allocates and frees memory, the operating system's process heap tends to become fragmented and requires more CPU cycles to perform the same allocate and free operations. Heap fragmentation gets worse over time and can lead to very poor performance and memory allocation failures.

Heap fragmentation can be fixed by restarting your application's process. Allocation errors due to heap fragmentation can be reduced by keeping process memory usage below one gigabyte. This is about half of the 2 GB of user address space that a 32-bit operating system such as Windows gives a process by default. Errors due to heap fragmentation are less common with 64-bit applications.

If your application's host has less than one gigabyte of physical memory, reduce memory usage to below the available physical memory to limit paging to the file system.

Heap fragmentation can be an issue for the SSM, as all task data is transferred through its memory. Keep the memory load on the SSM as low as possible to keep its performance high: limit the amount of data transferred in a task and use common data as much as possible.

The SSM has a feature called flow control that monitors available physical and virtual memory. The SSM creates event and log entries if the thresholds specified in the Application Profile are exceeded. Flow control can be configured using the Boundary Manager Configuration.

7.0 File System Performance for Workload Recovery

The SSM has a feature called a recoverable session. All task input and output data belonging to a recoverable session is written to the file system. If the SSM host is restarted, no task data sent to the SSM is lost.
If session recovery is not important for your application, leave the session's recoverable setting at false.

The SSM has a feature called session and task history, which is turned on by default. This feature saves statistics about your application's workload and writes a small amount of data to the file system. It is not required in Symphony DE and can be disabled. See Session Type Configuration.

8.0 Conclusions

Using a Symphony DE cluster to improve your application's performance will be successful if your application can benefit from data parallelism. If you make your application's tasks sufficiently long-running and do not overwhelm Symphony DE with a lot of small tasks, Symphony DE can efficiently and reliably distribute your application's work to your compute cluster.

You can use Symphony DE efficiently for small single-application clusters, from one host to about 50 hosts. As you add more hosts, your cluster's resources become harder to manage and the workload becomes better suited to the commercial Symphony product.
Good article Ajith.. Useful inputs
__________________ Jaison Mulerikkal DCS, ANU, Canberra Australia http://cs.anu.edu.au/~Jaison.Mulerikkal |