Installing, Managing and Running Symphony DE: Support and troubleshooting questions for Symphony DE.
While browsing through the Development Guide, I came across a chapter called "Automatic failure recovery feature". It looks like a very comprehensive feature. However, it states at the beginning that: "This feature is not applicable in Symphony DE."

I'd like to know: in Symphony DE, what happens if "a SOAM process becomes unavailable", as illustrated in the diagram at the beginning of the "Automatic failure recovery feature" document?
Excellent question, CG, and one that's currently not covered in the documentation.

First, regarding the feature not being available in Symphony DE: since Symphony DE is meant to be a development environment rather than a production environment, maintaining cluster reliability and availability is not its primary objective. In the event of a hardware or daemon failure, the developer can always restart Symphony DE or the host without serious impact on the application development process.

As to your question of what happens if a SOAM process becomes unavailable: the Symphony DE daemons (start_agent, RS, SD, SSM, SIM) are built as fault-resilient and reliable components, so it is uncommon for one of these processes to crash or hang. In the event that they do become unavailable, for example because they were manually killed, please refer to the table below for the specifics on each Symphony DE process:

Symphony DE also provides features to handle application failure recovery in case of abnormal termination of a client or service process:

Hope it helps.
Quote:
I guess that since I'm only testing my app, it's not necessary to configure "recoverable". Everything else behaves the same as long as SSM remains available, correct?
Quote:
<SessionTypes>
  <Type name="RecoverableClient" priority="1" recoverable="true" abortSessionIfClientDisconnect="false" sessionRetryLimit="3" taskRetryLimit="3" abortSessionIfTaskFail="false" suspendGracePeriod="100" taskCleanupPeriod="100" discardResultsOnDelivery="false"/>
  <Type name="OfflineClient" priority="1" recoverable="false" abortSessionIfClientDisconnect="true" sessionRetryLimit="3" taskRetryLimit="3" abortSessionIfTaskFail="false" suspendGracePeriod="100" taskCleanupPeriod="100"/>
</SessionTypes>
Quote:
But do note that for the SessionReconnect samples, abortSessionIfClientDisconnect must be set to false to allow the client to reconnect. Otherwise the session will enter the Abort state when the client disconnects.
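If you want to double-check your own session types before running the reconnect samples, here is a minimal Python sketch that only parses the XML fragment quoted in this thread. It does not call any Symphony API. The attribute names (`recoverable`, `abortSessionIfClientDisconnect`) are taken from that fragment; the function name `summarize_session_types` and the shortened fragment are my own assumptions for illustration. Attributes missing from a `Type` are reported as `None` rather than guessing Symphony's defaults.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SessionTypes fragment posted in this thread
# (only the attributes relevant to client disconnect handling are kept).
PROFILE_FRAGMENT = """
<SessionTypes>
  <Type name="RecoverableClient" priority="1" recoverable="true"
        abortSessionIfClientDisconnect="false"/>
  <Type name="OfflineClient" priority="1" recoverable="false"
        abortSessionIfClientDisconnect="true"/>
</SessionTypes>
"""


def _as_bool(value):
    """Convert an XML attribute string to bool; None if the attribute is absent."""
    if value is None:
        return None
    return value.strip().lower() == "true"


def summarize_session_types(xml_text):
    """Hypothetical helper: map each session type name to its recovery flags."""
    root = ET.fromstring(xml_text)
    summary = {}
    for type_elem in root.iter("Type"):
        summary[type_elem.get("name")] = {
            "recoverable": _as_bool(type_elem.get("recoverable")),
            "abort_on_disconnect": _as_bool(
                type_elem.get("abortSessionIfClientDisconnect")
            ),
        }
    return summary


if __name__ == "__main__":
    for name, flags in summarize_session_types(PROFILE_FRAGMENT).items():
        print(name, flags)
```

A session type whose `abort_on_disconnect` comes back as `True` is one where, per the fragment above, the session is configured to abort when the client goes away, so it would not be a good fit for the reconnect samples.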
I've posted a new article, "Component Failure and Recovery in Symphony DE", in the Articles section. The article describes the behavior and recovery steps when Symphony DE components (both system daemons and client applications) or the host machine become unavailable. Here's a general overview:

Comments welcome.

Last edited by Ajith; July 16th, 2008 at 06:50 PM.
Hi, I am new to grid computing and to Symphony as well. I read foundations_sym.pdf and it was very helpful, but I still have some questions.

I know the process to start up the cluster (from foundations_sym.pdf). Does somebody know the process to shut down the cluster? That is not included in the PDF file.

I read that Symphony has fault tolerance: every component in the system has a recovery operation, is monitored by another component, and can automatically recover from a failure (foundations_sym.pdf again, haha). But I want more detailed information about it. For example, how much time does it take to restart the system after a fault?

Thanks a lot