
Thread: Symphony Performance - ANU

  1. #1
    Ajith is offline Symphony DE Moderator
    Join Date
    February 28th, 2008
    Location
    Markham, Ontario
    Posts
    104
    Blog Entries
    2
    Downloads
    10
    Uploads
    0

    Default Symphony Performance - ANU

    Hi Jaison,

    I assume that you are using Symphony 4.0, not 3.1.

    The major delay on the client side was identified as the time between sending the last task to the compute side and the function call returning. This was termed Wait Time. Each iteration is independent, so we expect the same or similar performance for all iterations. Most wait times (for matrix size n from 3000 to 6000) were below 0.3 seconds. However, there were infrequent MVM instances where the wait times fluctuated abruptly, even up to 25 seconds, which was very hard to explain.
    I would guess the issue is on the client host. How are you measuring the delay in seconds? What code are you executing in the result up-call thread? Do you send your tasks out and then just wait for results or are you sending and waiting at the same time?

    I talked to Richard and he said he's never seen 25 second delays between results for short tasks.

    To evaluate better, please state the size of common data and task data-in/data-out and the average compute time. Can you also send the code that calculates the wait time?

    Also are your results repeatable and is your network/cpu dedicated to you?

    - Ajith


    1. As we see in 3.1 Symphony standard MVM application can
    perform efficiently at least in some cases (). Why does
    Symphony behave abruptly in other cases? Or in other words,
    what causes those delays of up to 25sec? Scheduler?
    Networking issues? Socket failure?
    2. Why does the percentage performance improvement decrease as
    the size of the matrix input increases (as in Figure 8)?

  2. #2
    jmulerik is offline Junior Member
    Join Date
    March 28th, 2008
    Location
    Canberra
    Posts
    14
    Downloads
    0
    Uploads
    0

    Default

    Hi Ajith,

    Thank you very much for the reply.

    It's great to hear that you have "never seen 25 second delays between results for short tasks". If that is the norm, I am happy, but we are experiencing that kind of delay in at least some cases. I have attached the client, service and message code. We are using Symphony DE 4.0. The Matrix Vector Multiplier (MVM) is a synchronous application.

    Wait Time is the time between the 2 time stamps in the client code as given below:

    Code:
                    rowCount = rowCount+noOfRowsSend;
            }
            gettimeofday(&t_3, NULL);

            // Now get our results - will block here until all tasks retrieved
            EnumItemsPtr enumOutput = sesPtr->fetchTaskOutput(tasksToSend);
            gettimeofday(&t_4, NULL);

            // Inspect results
            TaskOutputHandlePtr output;

    The common data is the matrix itself: a square matrix of order 'n' (e.g. 3000x3000 doubles). The message is a vector of order 'n' (e.g. an array of 3000 doubles).

    It is worth noting that, using this application, the matrix-vector multiplication completes with a wait time under 0.3 seconds in many cases (up to n = 6000), as seen in Figure 4. But it behaves erratically in many other cases when it is called continuously, as in this application.

    Jaison
    Jaison Mulerikkal
    DCS, ANU, Canberra
    Australia

    http://cs.anu.edu.au/~Jaison.Mulerikkal

  3. #3
    jmulerik is offline Junior Member
    Join Date
    March 28th, 2008
    Location
    Canberra
    Posts
    14
    Downloads
    0
    Uploads
    0

    Default Test Files

    I attached the source files from the test.
    Attached Files
    Jaison Mulerikkal
    DCS, ANU, Canberra
    Australia

    http://cs.anu.edu.au/~Jaison.Mulerikkal

  4. #4
    Ajith is offline Symphony DE Moderator
    Join Date
    February 28th, 2008
    Location
    Markham, Ontario
    Posts
    104
    Blog Entries
    2
    Downloads
    10
    Uploads
    0

    Default Timing the Client

    Hi Jaison,

    Here is the reply from Richard. He says that the wait time between t_3 and t_4 is the time it takes for all submitted task results to return.

    sesPtr->fetchTaskOutput(tasksToSend)
    - fetchTaskOutput will return after all task replies are received.

    Code:
     for (int iter = 0; iter < n; iter++ )
       {
            gettimeofday(&t_start, NULL);
            int* rowsSend = new int[tasksToSend];
            int* rowTag = new int[tasksToSend];
            int rowCount = 0; //initiate row count
    
            // Now we will send messages to our service
            for (int taskCount = 0; taskCount < tasksToSend; taskCount++)
            {
                    int lenVector = sizeof(double)*order;
                    int noOfRowsSend = numberOfRowsToSend(order, tasksToSend, taskCount);
    
                    gettimeofday(&t_1, NULL);
                    // Create message
                    MyInput inMsg(rowCount, noOfRowsSend, Vector, lenVector);
    
                    // Create task attributes
                    TaskSubmissionAttributes attrTask;
    
                    attrTask.setTaskInput(&inMsg);
    
                    // send it
                    TaskInputHandlePtr input = sesPtr->sendTaskInput(attrTask);
                    gettimeofday(&t_2, NULL);
    
                    elapsed1 =  t_2.tv_sec - t_1.tv_sec +
                    (t_2.tv_usec - t_1.tv_usec) / 1.e6;
                    outFile1 << elapsed1 << " " << input->getId() << endl;
    
                    rowsSend[taskCount] = noOfRowsSend;
                    rowTag[taskCount]=rowCount;
                    rowCount = rowCount+noOfRowsSend;
            }
            gettimeofday(&t_3, NULL);
    
            // Now get our results - will block here until all tasks retrieved
            EnumItemsPtr enumOutput = sesPtr->fetchTaskOutput(tasksToSend);
            gettimeofday(&t_4, NULL);
    
            // Inspect results
            TaskOutputHandlePtr output;
    
            gettimeofday(&t_5, NULL);
    
    From the code, it seems Jaison is doing the following:
    • Gets an overall start time
    • For each task, records the time it takes for the message to be serialized, dispatched and confirmed between the client and the SSM
    • Records a timestamp T3
    • Tries to fetch all the tasks that he sent (at once)
    • Records another timestamp T4
    • Records another timestamp T5
    • Iterates through all the replies
    • Records another timestamp T6
    • Calculates a set of elapsed times
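
Each elapsed time in the list above is the difference between two gettimeofday stamps. A minimal sketch of that calculation follows; the helper name elapsedSeconds is hypothetical and not part of the attached client code.

```cpp
#include <sys/time.h>  // timeval, gettimeofday (POSIX)

// Hypothetical helper: seconds elapsed from `start` to `end`.
// The microsecond difference may be negative when the seconds field
// has rolled over; doing the arithmetic in double handles that case.
double elapsedSeconds(const timeval& start, const timeval& end)
{
    return static_cast<double>(end.tv_sec - start.tv_sec)
         + static_cast<double>(end.tv_usec - start.tv_usec) / 1.0e6;
}
```

With the stamps from the client code, the wait time would be elapsedSeconds(t_3, t_4). Note that gettimeofday reads the wall clock, which can jump if the system time is adjusted during a run; a monotonic clock such as std::chrono::steady_clock avoids that source of error.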

    What is not clear to me is what the real problem is. This is a sync client. He is submitting more tasks than there are slots, and the tasks have an average run time of 1 second, so some waiting is expected. (I don't see a problem.)
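
To make the point about tasks and slots concrete, here is a rough sketch of the shortest wait one could expect. It assumes every task takes the same average time, all slots run in parallel, and there is no scheduling or network overhead. The function name and the model are illustrative assumptions, not Symphony behaviour.

```cpp
#include <cstddef>
#include <limits>

// Idealized estimate: tasks run in "waves" of at most `slots` tasks,
// and each wave takes `avgTaskSeconds`. Real runs will take longer.
double idealDrainSeconds(std::size_t tasks, std::size_t slots, double avgTaskSeconds)
{
    if (slots == 0) {
        // No slots means the tasks never finish in this model.
        return std::numeric_limits<double>::infinity();
    }
    const std::size_t waves = (tasks + slots - 1) / slots;  // ceiling division
    return static_cast<double>(waves) * avgTaskSeconds;
}
```

For example, 10 tasks on 4 slots at 1 second each need 3 waves, so a blocking fetch of all results cannot return in under about 3 seconds.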

    - Ajith
    Last edited by Ajith; January 28th, 2009 at 07:24 PM.

  5. #5
    Ajith is offline Symphony DE Moderator
    Join Date
    February 28th, 2008
    Location
    Markham, Ontario
    Posts
    104
    Blog Entries
    2
    Downloads
    10
    Uploads
    0

    Default Variation n=6000

    Hi Jaison,

    I did a quick calculation: for n = 3000, common data is about 68 MB. For n = 6000, that's about 275 MB, since the matrix grows with n squared. Your task data is about 23 KB for n = 3000 and 46 KB for n = 6000.

    If the number of tasks is equal to n, then n = 6000 means about 540 MB of data moving through the SSM.
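
The size estimates can be reproduced with a few lines of arithmetic. This sketch assumes 8-byte doubles and ignores any serialization or protocol overhead; all function names are hypothetical.

```cpp
#include <cstdint>

// Common data: an n x n matrix of doubles.
std::uint64_t commonDataBytes(std::uint64_t n)
{
    return n * n * sizeof(double);
}

// Task input: one vector of n doubles per task.
std::uint64_t taskInputBytes(std::uint64_t n)
{
    return n * sizeof(double);
}

// Total task input sent for one iteration with `tasks` tasks.
std::uint64_t totalTaskInputBytes(std::uint64_t n, std::uint64_t tasks)
{
    return tasks * taskInputBytes(n);
}

// Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
double toMiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Because the matrix size depends on n squared while the vector size depends on n, doubling n quadruples the common data but only doubles each task input.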

    Do you see a correlation between wait times and n? If n is small, do you notice these long wait times?
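
One way to check for such a correlation is to compute a Pearson coefficient over the recorded (n, wait time) pairs. The sketch below is a generic implementation, and the function name is an assumption. A value near 1 suggests wait times grow with n, while a value near 0 suggests no linear relationship.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of two equally sized samples.
// Returns 0.0 when the inputs are unusable (mismatched sizes, fewer
// than two points, or zero variance in either sample).
double pearson(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t count = x.size();
    if (count < 2 || y.size() != count) {
        return 0.0;
    }
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= static_cast<double>(count);
    meanY /= static_cast<double>(count);

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) {
        return 0.0;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Since the long waits are reported as infrequent spikes, a single coefficient may hide them; comparing the maximum wait time per value of n alongside it would probably be more telling.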

    If there is a correlation, then a possibility is that you are seeing virtual memory fragmentation on either the client process or the SSM process. Memory fragmentation causes the operating system heap to run slower and slower since it has to look longer for each memory allocation. There's no way to tell directly without a memory monitor tool.

    Another possibility is that the host memory is low on either the client host or SSM host and the O/S is swapping to disk.

    Yet another possibility is that the SSM boundary conditions are being crossed, causing the SSM to slow down. You can check the SSM log for warning messages.

    One quick test would be to run your application and monitor the client process and the SSM process memory using top.
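
As a complement to top, the client can log its own peak memory next to each wait time using the POSIX getrusage call. This is only a sketch with a hypothetical function name. It covers the calling process only, so the SSM process still has to be watched externally.

```cpp
#include <sys/resource.h>  // getrusage, rusage, RUSAGE_SELF (POSIX)

// Peak resident set size of the calling process, or -1 on failure.
// On Linux the value is in kilobytes; other systems may use other units.
long peakResidentSetKb()
{
    rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

Writing this value to the same output file as the elapsed times, once per iteration, would show whether the long waits line up with growth in client memory. It reports the peak rather than the current usage, so it only ever increases.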

    The fragmentation issue may be solved with SSM tuning parameters. The memory issue can be solved by adding more memory.

    Another possibility is that, as n increases, the network bandwidth is getting saturated. Do you have high-speed interconnects between the client host and the SSM host? Are both servers located in the same datacenter?

    - Ajith
