Background
The Kusu master node is tested with concurrent provisioning requests. The idea is to simulate all nodes in a cluster booting up at the same time, for instance after a power outage.
Setup
The tests are performed on four Dell machines. One is the Kusu master node; the other three are testing machines on the cluster's provisioning network. They communicate through a 10/100 Mbps switch.
The cluster database is populated with the information of 256 diskless nodes.
Testing Procedure
A tool performing two tasks is used.
First, node installer information (NII) is retrieved from the server. A plain text file lists the IPs of the diskless nodes in the cluster database; a different IP is included in each request to simulate many hosts. The load test spawns threads on the testing machine, one thread for each diskless node in its configuration file. The NII is checked for errors.
Second, the diskless image is downloaded. Each thread launches a wget process to download the image. The download is redirected to /dev/null and wget's output is inspected to determine download success or failure.
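As a rough illustration of the download step, the sketch below launches wget from Python, discards the image by writing it to /dev/null, and treats a zero exit status as success. The original tool inspects wget's output instead, so the exit status check is a simplification, and the function names and the image URL passed in are hypothetical.

```python
import subprocess


def build_wget_command(image_url):
    """Return a wget command line that downloads image_url and discards the data."""
    # -O /dev/null writes the downloaded document to /dev/null instead of disk.
    return ["wget", "-O", "/dev/null", image_url]


def download_image(image_url):
    """Run wget and return (succeeded, output).

    Assumption: success is judged by wget's exit status (0 means no problems).
    The original tool inspects wget's output text instead.
    """
    proc = subprocess.run(
        build_wget_command(image_url),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # wget writes its progress log to stderr
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stdout
```

The captured output is returned alongside the flag so that it can still be inspected or logged when a download fails.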
The command is executed using pdsh to simultaneously launch the connections on all three testing machines.
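The per-node threading on each testing machine can be sketched roughly as below. This is not the actual tool: fetch_nii and download_image are hypothetical callables standing in for the NII request and the wget download, and the IP list is assumed to hold one address per line.

```python
import threading


def read_ip_list(text):
    """Parse the plain text IP list (assumed: one address per line, blanks ignored)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_load_test(ips, fetch_nii, download_image):
    """Start one thread per IP; each thread requests NII, then downloads the image."""
    results = {}
    lock = threading.Lock()

    def worker(ip):
        nii_status = fetch_nii(ip)      # step 1: NII request made on behalf of this IP
        image_ok = download_image(ip)   # step 2: diskless image download
        with lock:
            results[ip] = (nii_status, image_ok)

    threads = [threading.Thread(target=worker, args=(ip,)) for ip in ips]
    for t in threads:
        t.start()                       # all simulated nodes start together
    for t in threads:
        t.join()                        # wait for every node to finish
    return results
```

Starting every thread before joining any of them is what makes the requests concurrent; joining inside the start loop would run the nodes one after another.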
Measurements
Timestamps are taken immediately before the request for NII is made, immediately after the NII request completes, immediately before the wget process is launched and immediately after the wget process completes. From these four timestamps, the NII request duration, the image download duration and the total test time are determined.
In addition, the status code returned with the NII request is recorded as well as the success or failure of the diskless image download.
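One way to hold these measurements for a single simulated node is sketched below. The field names are hypothetical, and the total is assumed to run from the first timestamp to the last.

```python
from collections import namedtuple

# Four timestamps (seconds) plus the two recorded outcomes for one node.
Measurement = namedtuple(
    "Measurement",
    "nii_start nii_end wget_start wget_end nii_status image_ok",
)


def durations(m):
    """Derive the three reported durations from the four timestamps."""
    return {
        "nii": m.nii_end - m.nii_start,
        "download": m.wget_end - m.wget_start,
        # Assumption: total test time spans first timestamp to last.
        "total": m.wget_end - m.nii_start,
    }
```

For example, a node whose NII request runs from 0.0 to 2.5 and whose download runs from 2.5 to 12.5 would have an NII duration of 2.5 seconds, a download duration of 10.0 seconds and a total of 12.5 seconds.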
Results
Below are two graphs representing the duration required to complete the NII request and the diskless image download with increasing concurrent connections. Errors resulting from NII requests timing out are also shown.
The total duration is not displayed in the graph; it is marginally longer than the maximum image download duration.
Observations
The total test duration increases linearly as the number of concurrent connections increases. The increment is roughly 5 minutes 40 seconds for each 10 additional concurrent connections.
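The increment above works out to about 34 seconds per additional connection. The sketch below does that arithmetic and, purely as an assumption, extrapolates with a line through the origin; the measured intercept is not given here, so the estimates are indicative only.

```python
INCREMENT_SECONDS = 5 * 60 + 40      # 5 minutes 40 seconds = 340 seconds
CONNECTIONS_PER_INCREMENT = 10


def seconds_per_connection():
    """Slope of the observed linear trend, in seconds per connection."""
    return INCREMENT_SECONDS / CONNECTIONS_PER_INCREMENT


def estimated_total_seconds(connections):
    """Estimate total test time. Assumption: the line passes through the origin."""
    return connections * seconds_per_connection()


def format_minutes(seconds):
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return "%d min %02d s" % (minutes, secs)
```

Under that assumption, 240 connections would take about 136 minutes and 256 connections about 145 minutes, which is in the neighborhood of the roughly 130 minutes after which the pdsh connection timed out.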
At 170 concurrent connections, NII requests begin to time out. The timeout appears to be around 3 minutes: the maximum NII request duration is a steady 189 seconds whenever a timeout occurs.
No "500 Server Errors" were returned by the server. This shows that the MySQL server can handle the load.
Notes
With 256 concurrent connections, Apache's error log contained:
Code:
[Thu Dec 06 20:21:11 2007] [error] server reached MaxClients setting, consider raising the MaxClients setting
This message did not appear with 250 or fewer concurrent connections.
The connection that pdsh (ssh?) maintains with the testing machines timed out after about 130 minutes with the following message:
Code:
compute-00-01: Read from remote host compute-00-01: Connection timed out
pdsh@master: compute-00-01: ssh exited with exit code 255
At this stage, the pdsh command exits. This only occurred when testing with 240, 250 and 256 concurrent connections. The tests were automated with a script which would sleep for 5 minutes after the completion of a test (the pdsh command terminating) before launching the next test. In the case of pdsh timing out, the next test may have begun before the previous test had a chance of completing, or while the web server was still servicing existing connections.
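The automation loop can be sketched as below to show where the overlap comes from. The actual script is not shown in this post, so run_test is a hypothetical callable standing in for the pdsh invocation.

```python
import time

PAUSE_SECONDS = 5 * 60   # the 5 minute sleep between tests


def run_test_series(connection_counts, run_test, sleep=time.sleep):
    """Run one test per connection count, pausing after each one returns."""
    for count in connection_counts:
        # Returns when the pdsh command terminates, whether the test
        # finished normally or the pdsh/ssh connection timed out.
        run_test(count)
        sleep(PAUSE_SECONDS)
```

Because the pause is keyed to pdsh terminating rather than to the testing machines or the web server going idle, a pdsh timeout lets the next test start while the previous one may still be running.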

