HPCCommunity.org
 
Posted by lechen, July 10th, 2008, 01:27 PM
Component Failure and Recovery in Symphony DE

1. Introduction

One of our members posted this question in the forum: "What happens if a SOAM process becomes unavailable?"

We will briefly describe the behavior and recovery steps when Symphony DE components (both system daemons and client applications) or the host machine become unavailable.

2. Background

First, the automatic failure recovery features available in the full version of Symphony do not apply to Symphony DE. Symphony DE is intended for development, not for the production environments that the full version of Symphony serves. Maintaining cluster reliability and availability is not a primary objective of Symphony DE, nor is it a concern for developers while they build and tune their applications. If hardware or a daemon fails, developers can restart Symphony DE or reboot the host without serious impact on the application development process.

Furthermore, the Symphony DE middleware components (RS, SD, SSM, SIM, start_agent) are built and tested to be fault-resilient and reliable daemons. It is uncommon for these processes to crash or hang.

3. System Daemons

If a system daemon does become unavailable, for instance because a user killed it manually, refer to the table below for the behavior of each Symphony DE process.

Process | Result | Recovery
start_agent | Symphony DE shutdown on detection | Manually restart Symphony DE
rs | Symphony DE shutdown, unrecoverable workload lost | Manually restart Symphony DE
sd | Symphony DE shutdown, unrecoverable workload lost | Manually restart Symphony DE
ssm | SIM shutdown, SD restarts new SSM, unrecoverable workload lost | Automatic
sim | SI shutdown, SSM restarts new SIM, task running on previous SI is rerun | Automatic
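
The table above can also be read as a simple lookup from the failed process to its outcome. The sketch below is only an illustration of that mapping in Python; `FailureOutcome`, `DAEMON_FAILURES`, `describe_failure`, and `requires_manual_restart` are hypothetical names made up for this example and are not part of any Symphony DE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureOutcome:
    """One row of the system daemon table (illustrative only)."""
    result: str
    recovery: str
    automatic: bool


# Hypothetical lookup table transcribed from the article; not a Symphony API.
DAEMON_FAILURES = {
    "start_agent": FailureOutcome(
        "Symphony DE shutdown on detection",
        "Manually restart Symphony DE", False),
    "rs": FailureOutcome(
        "Symphony DE shutdown, unrecoverable workload lost",
        "Manually restart Symphony DE", False),
    "sd": FailureOutcome(
        "Symphony DE shutdown, unrecoverable workload lost",
        "Manually restart Symphony DE", False),
    "ssm": FailureOutcome(
        "SIM shutdown, SD restarts new SSM, unrecoverable workload lost",
        "Automatic", True),
    "sim": FailureOutcome(
        "SI shutdown, SSM restarts new SIM, task running on previous SI is rerun",
        "Automatic", True),
}


def describe_failure(process: str) -> str:
    """Return a one-line summary of what happens when `process` dies."""
    name = process.lower()
    if name not in DAEMON_FAILURES:
        raise KeyError(f"unknown Symphony DE process: {process!r}")
    outcome = DAEMON_FAILURES[name]
    return f"{name}: {outcome.result}. Recovery: {outcome.recovery}."


def requires_manual_restart(process: str) -> bool:
    """True when the developer has to restart Symphony DE by hand."""
    return not DAEMON_FAILURES[process.lower()].automatic
```

For example, `requires_manual_restart("ssm")` is false because SD starts a new SSM on its own, while the same call for `"rs"`, `"sd"`, or `"start_agent"` is true.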

4. Client Application

Symphony DE also provides features to handle application failure recovery when a client or service process terminates abnormally because of a coding or environment error. Refer to the Feature Reference document for details.

Application | Result | Recovery | Reference
Service Instance (SI) | SI restarts, rerun workload (Configurable) | Automatic | Service error handling feature
Client | Client disconnects from session | Relaunch client and reconnect | Client recovery
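
For the client row, "relaunch client and reconnect" is a manual step, but a relaunched client may still want to retry its connection a few times while Symphony DE comes back up. The following is a minimal, generic sketch of such a retry loop in Python. It does not use the Symphony client API: `connect` is a hypothetical callable supplied by the caller, and the assumption that a failed connection raises `ConnectionError` is made only for this illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` tries have failed.

    `connect` is a placeholder for whatever the client uses to reach its
    session; it is assumed here to raise ConnectionError on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                sleep(delay_seconds)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The `sleep` parameter exists only so the delay can be replaced in tests. How a real client reattaches to its session is described under Client recovery in the Feature Reference.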

5. Host

The host that Symphony DE runs on can also become unavailable because of a hardware or power failure outside of Symphony DE. On Windows, Symphony DE runs as a system service, so it launches automatically as soon as Windows is restarted. On Linux and Solaris, Symphony DE must be restarted manually after the host is back up.

Host | Result | Recovery
Linux/Solaris Host | All Symphony DE processes exit, unrecoverable workload lost | Restart host, restart Symphony DE
Windows Host | All Symphony DE processes exit, unrecoverable workload lost | Restart host, Symphony DE automatically starts

6. Summary

The diagram below offers a general overview of what we have discussed above:


Last edited by Ajith; July 14th, 2008 at 02:39 PM.