Thread: Search parallelism using Symphony DE - Simple Demonstration

  1. #1
    lechen's Avatar
    lechen is offline Junior Member
    Join Date
    March 12th, 2008
    Location
    Toronto, Ontario
    Posts
    71
    Blog Entries
    1
    Downloads
    8
    Uploads
    0

    Search parallelism using Symphony DE - Simple Demonstration

    1. Dilemma

    The database is large, but no single super-computer is available, so query performance is less than desirable.

    2. Solution

    Search in parallel across several machines. This makes fuller use of available resources and shortens search execution time.



    3. Demonstration (Simple)

    1) A large text file is split into 2 or 3 smaller files, located on separate hosts.
    2) The client queries for a keyword. This is demonstrated with the "grep" command, executed remotely by the "symexec" tool.
    3) The "symexec" task returns "Success" if a result is found, and returns a controlCode if no result is found.
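
    The idea in the steps above can be sketched locally in Java. This is only an illustration, not the symexec mechanism itself: worker threads stand in for hosts, in-memory lists of lines stand in for the split files, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Local stand-in for the demonstration: each "host" is a worker thread and
// each split file is an in-memory list of lines (both are assumptions).
class ParallelSearchSketch {

    // Returns true if any line in the shard contains the keyword,
    // similar to grep reporting that a match was found.
    static boolean shardContains(List<String> shard, String keyword) {
        for (String line : shard) {
            if (line.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // Searches all shards concurrently, one task per shard, and returns
    // one hit/no-hit flag per shard in the original shard order.
    static List<Boolean> searchAll(List<List<String>> shards, final String keyword) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<Boolean>> futures = new ArrayList<Future<Boolean>>();
            for (final List<String> shard : shards) {
                futures.add(pool.submit(new Callable<Boolean>() {
                    public Boolean call() {
                        return shardContains(shard, keyword);
                    }
                }));
            }
            List<Boolean> hits = new ArrayList<Boolean>();
            for (Future<Boolean> f : futures) {
                hits.add(f.get());
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException("search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> shards = new ArrayList<List<String>>();
        shards.add(Arrays.asList("alpha", "beta"));
        shards.add(Arrays.asList("gamma", "needle in here"));
        shards.add(Arrays.asList("delta"));
        System.out.println(searchAll(shards, "needle")); // [false, true, false]
    }
}
```

    As in the demonstration, only a found/not-found flag comes back from each worker, not the matching lines.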

    4. Performance Evaluation

    1) Query performance (in terms of query time) improves as more CPUs are added.

    2) Query performance (as a percentage of the original query time on 1 CPU) improves as data size increases. The overhead incurred by the command (grep) takes up a larger portion of query time when the data size is small, so the solution is most effective with large data sets.
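
    One way to see the second point is a simple model. This is an assumption for illustration, not something measured in this demo: query time on n CPUs = fixed per-command overhead + scan time / n. The numbers below are hypothetical.

```java
// Toy model (an assumption, not measured data): each query pays a fixed
// per-command overhead, and the scan itself divides evenly across CPUs.
class QueryTimeModel {

    // Modeled query time in seconds on the given number of CPUs.
    static double queryTime(double overheadSec, double scanSec, int cpus) {
        return overheadSec + scanSec / cpus;
    }

    // Modeled time on n CPUs as a percentage of the 1-CPU time.
    static double percentOfSingleCpu(double overheadSec, double scanSec, int cpus) {
        return 100.0 * queryTime(overheadSec, scanSec, cpus)
                     / queryTime(overheadSec, scanSec, 1);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1 s overhead; small data scans in 1 s, large data in 100 s.
        System.out.println(percentOfSingleCpu(1.0, 1.0, 2));   // small data, 2 CPUs
        System.out.println(percentOfSingleCpu(1.0, 100.0, 2)); // large data, 2 CPUs
    }
}
```

    Under this model, 2 CPUs bring the small data set down to only 75% of the 1-CPU time, while the large data set drops to roughly 50.5%, close to the ideal 50%.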



    5. Applications

    1) Decryption
    2) Language translation / Dictionary search
    3) Log file search
    4) Symphony Task history query
    5) Any task involving Database query

    6. Packages

    Out-of-box Symphony DE package.

    7. How to run

    Single host mode:



    Multi-host mode (1 "symexec send" command per host):


  2. #2
    Ajith's Avatar
    Ajith is offline Symphony DE Moderator
    Join Date
    February 28th, 2008
    Location
    Markham, Ontario
    Posts
    104
    Blog Entries
    2
    Downloads
    10
    Uploads
    0

    More info?

    Hi Leigh,

    Thanks for posting this example. Symexec looks like a useful tool.

    Can you describe the pros and cons of this solution vs. sending data through messages? My guess is that most users would want a simple way to deploy the data to the cluster and not have to pre-deploy or pre-partition the data.

    Can you also describe in more detail how you would solve the following problems in this way.

    1) Decryption
    2) Language translation / Dictionary search
    3) Log file search
    4) Symphony Task history query
    5) Any task involving Database query

    - Ajith

  3. #3
    lechen's Avatar
    lechen is offline Junior Member
    Join Date
    March 12th, 2008
    Location
    Toronto, Ontario
    Posts
    71
    Blog Entries
    1
    Downloads
    8
    Uploads
    0

    Quote Originally Posted by Ajith View Post
    Hi Leigh,

    Thanks for posting this example. Symexec looks like a useful tool.

    Can you describe the pros and cons of this solution vs. sending data through messages?
    Pro: Simple command-line interface; the user does not need to create client and service programs.

    Con: A message cannot be passed back to the sender; only a return code is issued.


    Alternative Solution

    The implementation below demonstrates a similar solution, using a Symphony Client and Service instead of Symexec.

    Client Side

    Modify the out-of-box sample AsyncClient.java to issue the search command via the Task Input Message. All other code is unchanged. This particular command queries Task History for all "Error" tasks.

    Code:
                            
    // Create a message
    String command = "/bin/grep Error /data//history/SampleAppJavaGF_task.soamdb";
    MyInput myInput = new MyInput(taskCount, command);
    
    Service Side

    Modify the out-of-box sample MyService.java to execute the command (received in the Task Input Message) on the compute host, process the command output, and return the results via the Task Output Message. All other code is unchanged. This particular service fetches and returns 3 Task Property fields for each Error task found in Task History.

    Code:
            
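    // Requires java.io.BufferedReader, java.io.IOException and java.io.InputStreamReader.
    // sb and myOutput are assumed to be declared earlier in the sample's onInvoke().
    try {
        // Execute the command received in the Task Input Message
        Process p = Runtime.getRuntime().exec(myInput.getString());
        BufferedReader in = new BufferedReader(
                            new InputStreamReader(p.getInputStream()));
        try {
            // Process command output, one comma-separated record per line
            String line = null;
            while ((line = in.readLine()) != null) {
                String[] results = line.split(",");
                if (results.length < 4) {
                    continue; // skip lines that have no status field
                }
                sb.append("\nTask: ");
                sb.append(results[0]);
                sb.append("\tSession: ");
                sb.append(results[1]);
                sb.append("\t\tStatus: ");
                sb.append(results[3]);
            }
        } finally {
            in.close(); // release the stream even if reading fails
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    
    // Return results
    myOutput.setString(sb.toString());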
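
    If you want to check the per-line formatting without running grep on a compute host, the body of the loop can be pulled into a small helper. This is only a sketch: the class name, the method name and the sample record are made up, and the field positions (0, 1 and 3) are simply the ones used in the service code above.

```java
// Hypothetical helper: formats one comma-separated record using the same
// field positions as the service code (0 = task, 1 = session, 3 = status).
class HistoryLineFormatter {

    // Returns the formatted text, or null if the line has fewer than 4 fields.
    static String format(String line) {
        String[] fields = line.split(",");
        if (fields.length < 4) {
            return null; // not enough fields; the caller can skip this line
        }
        return "Task: " + fields[0]
             + "\tSession: " + fields[1]
             + "\t\tStatus: " + fields[3];
    }

    public static void main(String[] args) {
        // Made-up record; the real .soamdb column layout is not shown in this thread.
        System.out.println(format("7,3,ignored,Error"));
        System.out.println(format("truncated,line")); // prints "null"
    }
}
```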
    
    How to run

    1) Build Java SampleApp
    2) Deploy SampleServiceJavaPackage
    3) Register SampleAppJava.xml (make sure Java PATH is defined in Service section)
    4) Execute RunAsyncClient.sh



    Packages
    Attached Files
    Last edited by lechen; April 24th, 2008 at 12:51 AM.

  4. #4
    lechen's Avatar
    lechen is offline Junior Member
    Join Date
    March 12th, 2008
    Location
    Toronto, Ontario
    Posts
    71
    Blog Entries
    1
    Downloads
    8
    Uploads
    0

    Quote Originally Posted by Ajith View Post
    Hi Leigh,
    My guess is that most users would want a simple way to deploy the data to the cluster and not have to pre-deploy or pre-partition the data.
    In the case of Symphony Task History, systematic file rollover is applied. Thus the command sent to each individual host can simply query a specific set of history files on the shared location.
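
    As a rough sketch of how the per-host commands could be put together, the rolled-over files can be dealt out to the hosts round-robin, with one grep command built per host. The file names, the planner class and the round-robin choice are all assumptions for illustration; how each command is then sent to a host is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planner: assigns history files to hosts round-robin and
// builds one grep command line per host.
class HistoryCommandPlanner {

    // Returns one command per host that received at least one file.
    static List<String> buildCommands(String keyword, List<String> files, int hostCount) {
        if (hostCount <= 0) {
            throw new IllegalArgumentException("hostCount must be positive");
        }
        List<StringBuilder> perHost = new ArrayList<StringBuilder>();
        for (int i = 0; i < hostCount; i++) {
            perHost.add(new StringBuilder());
        }
        // File i goes to host (i mod hostCount)
        for (int i = 0; i < files.size(); i++) {
            perHost.get(i % hostCount).append(' ').append(files.get(i));
        }
        List<String> commands = new ArrayList<String>();
        for (StringBuilder fileArgs : perHost) {
            if (fileArgs.length() > 0) {
                commands.add("/bin/grep " + keyword + fileArgs);
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        // Made-up file names on a shared location
        List<String> files = Arrays.asList(
            "/shared/history/task.1", "/shared/history/task.2", "/shared/history/task.3");
        for (String command : buildCommands("Error", files, 2)) {
            System.out.println(command);
        }
    }
}
```

    With 3 files and 2 hosts, the first host gets files 1 and 3 and the second host gets file 2. Keep in mind that grep normally prefixes each match with the file name when given more than one file, which would shift the fields seen by any code that parses the output.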

    A demonstration of data deployment is being drafted.
    Last edited by lechen; April 24th, 2008 at 12:49 AM.

  5. #5
    ComputerGuy's Avatar
    ComputerGuy is offline Junior Member
    Join Date
    April 24th, 2008
    Posts
    22
    Downloads
    2
    Uploads
    0

    Hi lechen. I've downloaded the sample Client and Service, but encountered the following error message when trying to compile MyService:

    src/com/platform/symphony/samples/SampleApp/service/MyService.java: In class `com.platform.symphony.samples.SampleApp.service.MyService':
    src/com/platform/symphony/samples/SampleApp/service/MyService.java: In method `com.platform.symphony.samples.SampleApp.service.MyService.onInvoke(com.platform.symphony.soam.TaskContext)':
    src/com/platform/symphony/samples/SampleApp/service/MyService.java:81: Can't find method `split( Ljava/lang/String; )' in type `java.lang.String'.
    String[] results = line.split(",");
    ^
    1 error
    make: *** [SampleAppjava] Error


    Any ideas?

  6. #6
    lechen's Avatar
    lechen is offline Junior Member
    Join Date
    March 12th, 2008
    Location
    Toronto, Ontario
    Posts
    71
    Blog Entries
    1
    Downloads
    8
    Uploads
    0

    Quote Originally Posted by ComputerGuy View Post
    Hi lechen. I've downloaded the sample Client and Service, but encountered the following error message when trying to compile MyService:

    src/com/platform/symphony/samples/SampleApp/service/MyService.java: In class `com.platform.symphony.samples.SampleApp.service.MyService':
    src/com/platform/symphony/samples/SampleApp/service/MyService.java: In method `com.platform.symphony.samples.SampleApp.service.MyService.onInvoke(com.platform.symphony.soam.TaskContext)':
    src/com/platform/symphony/samples/SampleApp/service/MyService.java:81: Can't find method `split( Ljava/lang/String; )' in type `java.lang.String'.
    String[] results = line.split(",");
    ^
    1 error
    make: *** [SampleAppjava] Error


    Any ideas?
    CG, which version of Java (and which compiler) are you using? String.split() was added to the String class in Java 1.4, so it is missing from 1.3.x and from some older compilers and class libraries. It should work in 1.5.x (I'm using jdk1.5.0_08).

    I just used String.split to process the returned Task History Data. Of course you can implement whatever code you want there to suit your needs.
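
    If upgrading is not an option, a hand-written split along these lines should work as a stand-in. This is a sketch, with a made-up class name; it uses only indexOf and substring, and deliberately avoids generics so that it also compiles on pre-1.5 compilers (newer compilers will print an unchecked warning).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fallback for environments without String.split().
class SimpleSplit {

    // Splits on a single delimiter character. Unlike String.split(), this
    // takes no regex and keeps trailing empty fields.
    static String[] split(String line, char delimiter) {
        List parts = new ArrayList(); // raw type on purpose (pre-1.5 friendly)
        int start = 0;
        int idx;
        while ((idx = line.indexOf(delimiter, start)) >= 0) {
            parts.add(line.substring(start, idx));
            start = idx + 1;
        }
        parts.add(line.substring(start)); // last field, possibly empty
        return (String[]) parts.toArray(new String[parts.size()]);
    }

    public static void main(String[] args) {
        String[] fields = split("7,3,,Error", ',');
        System.out.println(fields.length); // 4
        System.out.println(fields[3]);     // Error
    }
}
```

    In MyService.java the call would then become something like SimpleSplit.split(line, ',') in place of line.split(",").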

  7. #7
    ComputerGuy's Avatar
    ComputerGuy is offline Junior Member
    Join Date
    April 24th, 2008
    Posts
    22
    Downloads
    2
    Uploads
    0

    AH HA~ Got it compiled and running now! Thanks

  8. #8
    oags15 is offline Junior Member
    Join Date
    October 20th, 2008
    Location
    Germany
    Posts
    11
    Downloads
    0
    Uploads
    0

    Hi,

    I want to build an application but I don't really know how to do it. Maybe someone can help me. I have 3 hosts and 1 master, all of them running Windows, and I am using C++. I have to send jobs 1, 2 and 3 to all hosts (the jobs will always be executed in parallel), but each host should display the output of only one job. Host 1 displays the output from job 1 (jobs 2 and 3 must be running but hidden, without displaying output); host 2 displays the output from job 2 (jobs 1 and 3 running but hidden); and host 3 displays only the output from job 3 (jobs 1 and 2 running but hidden).

    I want to do this because, if one host fails, I can immediately get the output from another host. For example, if host 1 fails, host 2 can show the output from jobs 1 and 2 without a long wait, and the same can happen if host 2 or 3 fails. Do you know how I can do that?

    Thank you,

    OAGS

  9. #9
    Ajith's Avatar
    Ajith is offline Symphony DE Moderator
    Join Date
    February 28th, 2008
    Location
    Markham, Ontario
    Posts
    104
    Blog Entries
    2
    Downloads
    10
    Uploads
    0

    Hi OAGS,

    Symphony won't handle your use case. The default policy sends a task to one host; if that host fails, the task is then sent to another host. You can execute tasks in parallel by sending them to Symphony all at once, or sequentially by sending them to Symphony one by one. Your case sounds strange, since you want the tasks to execute in parallel but they seem to depend on each other's results.

    - Ajith

  10. #10
    oags15 is offline Junior Member
    Join Date
    October 20th, 2008
    Location
    Germany
    Posts
    11
    Downloads
    0
    Uploads
    0

    Hi Ajith,
    The tasks are independent of each other: each task takes one number, performs a computation and displays the output. Host 1 displays output 1, and likewise for hosts 2 and 3. I want to run them in parallel so that if one host fails, another host can provide the output. For that, host 1 should display only output 1 while the other tasks run without displaying anything (all tasks execute on all hosts at the same time). As soon as host 1 fails, host 2, for example, displays the output of tasks 1 and 2. If host 2 also fails, host 3 displays the output of tasks 1, 2 and 3. That way the wait after a host failure might be only 1 second or less, because we only need to activate the output of a task that is already running on another host; the system does not need to send the task and its data to another host. If I can't do that with the DE, do you think version 4.0 can do it?

    Thank you,

    OAGS
