HPCCommunity.org
 
  #1 (permalink)  
Old October 2nd, 2008, 09:18 AM
Junior Member
 
Join Date: September 25th, 2008
Posts: 11
Platform Lava 6.1 - lsb.events - Parameters Don't Match

Hello,

My university's research lab is running Platform Lava 6.1:

-bash-3.00$ lsid
Platform Lava 6.1, May 5 2005
Copyright 1992-2004 Platform Computing Corporation

My cluster name is lava
My master name is ccls
-bash-3.00$


on a Dell HPC Cluster (Dell PowerEdge Cluster, Intel(R) Xeon(TM) CPU 3.00GHz).

We need to analyze the log file "lsb.events" to get information about the status of submitted jobs. However, we found serious mismatches between the log file and the "lsb.events" man page, both in the number of parameters per record and in their data types.

For example:
- JOB_NEW is documented to have 56 parameters, but JOB_NEW record lines with 59 and 60 parameters are found in the log.
- JOB_STATUS is documented to have 12 parameters, but lines with 12 and 31 parameters are found.
...

What could be the cause of this problem, and how can we fix it?

Thanks in advance,
Duke
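
A quick way to see how widespread the mismatch is would be to tally the number of fields per event type. The following is only a sketch: it assumes each record is one line of whitespace-separated fields, that strings are wrapped in double quotes with a doubled "" standing for an embedded quote, and that the first token is the event type. The function name and the sample lines are made up for illustration and are not real Lava output.

```python
import re
from collections import Counter, defaultdict

# A token is either a double-quoted string (a doubled "" inside it is assumed
# to be an escaped quote) or a run of non-whitespace characters.
TOKEN = re.compile(r'"(?:[^"]|"")*"|\S+')

def field_count_histogram(lines):
    """Map event type -> Counter {fields after the event type: number of lines}."""
    histogram = defaultdict(Counter)
    for line in lines:
        tokens = TOKEN.findall(line)
        if not tokens:
            continue  # skip blank lines
        event_type = tokens[0].strip('"')
        histogram[event_type][len(tokens) - 1] += 1
    return histogram

# Made-up, shortened records purely for illustration (real lines are longer).
sample = [
    '"JOB_NEW" "6.1" 1222938000 101 500 "sleep 1000" ""',
    '"JOB_NEW" "6.1" 1222938005 102 500 "sleep 5" "" "hostA"',
    '"JOB_STATUS" "6.1" 1222938010 101 4 0',
]
hist = field_count_histogram(sample)
for event_type, counts in sorted(hist.items()):
    print(event_type, dict(counts))
```

To run it on a real log, pass an open file object instead of the sample list, for example `field_count_histogram(open("lsb.events"))`.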
  #2 (permalink)  
Old October 2nd, 2008, 09:47 AM
Junior Member
 
Join Date: September 25th, 2008
Posts: 11

Also:

Our university does not give student researchers root access to Lava. Where can we obtain a copy of Platform Lava 6.1 to experiment with, please?

Thanks,
Duke
  #3 (permalink)  
Old October 2nd, 2008, 02:59 PM
Member
 
Join Date: September 16th, 2008
Location: Germany
Posts: 42

If I parsed it correctly, my Lava 1.0 file has 54 fields:
Code:
     1  "1.0"
     2  1223460002
     3  18894
     4  500
     5  33564675
     6  1
     7  1223460002
     8  0
     9  0
    10  -65535
    11  0
    12  17157
    13  "lavaadmin"
    14  -1
    15  -1
    16  -1
    17  -1
    18  -1
    19  -1
    20  -1
    21  -1
    22  -1
    23  -1
    24  -1
    25  ""
    26  100.00
    27  2
    28  "normal"
    29  ""
    30  "master52"
    31  "/tmp"
    32  "/tmp//18893"
    33  ""
    34  ""
    35  ""
    36  "/home/lavaadmin"
    37  "1223459244.18893.18894"
    38  0
    39  ""
    40  ""
    41  "sleep 1000"
    42  "sleep 1000"
    43  0
    44  ""
    45  "default"
    46  1
    47  "LINUX86"
    48  ""
    49  16
    50  0
    51  ""
    52  ""
    53  ""
    54  -1
The man page specifies (is this the same for 6.1?)
Code:
     1  Version number (%s)
     2  Event time (%d)
     3  jobId (%d)
     4  userId (%d)
     5  options (%d)
     6  numProcessors (%d)
     7  submitTime (%d)
     8  beginTime (%d)
     9  termTime (%d)
    10  sigValue (%d)
    11  chkpntPeriod (%d)
    12  restartPid (%d)
    13  userName (%s)
    14  rLimits
    15  rLimits
    16  rLimits
    17  rLimits
    18  rLimits
    19  rLimits
    20  rLimits
    21  rLimits
    22  rLimits
    23  rLimits
    24  rLimits
    25  hostSpec (%s)
    26  hostFactor (%f)
    27  umask (%d)
    28  queue (%s)
    29  resReq (%s)
    30  fromHost (%s)
    31  cwd (%s)
    32  chkpntDir (%s)
    33  inFile (%s)
    34  outFile (%s)
    35  errFile (%s)
    36  subHomeDir (%s)
    37  jobFile (%s)
    38  numAskedHosts (%d)
    39  askedHosts (%s)
    40  dependCond (%s)
    41  preExecCmd (%s)
    42  jobName (%s)
    43  command (%s)
    44  nxf (%d)
    45  xf (%s)
    46  mailUser (%s)
    47  projectName (%s)
    48  niosPort (%d)
    49  maxNumProcessors (%d)
    50  schedHostType (%s)
    51  loginShell (%s)
    52  userGroup (%s)
    53  options2 (%d)
    54  idx (%d)
    55  inFileSpool (%s)
    56  commandSpool (%s)
    57  jobSpoolDir (%s)
    58  userPriority (%d)
So yeah you found a bug, I guess.

Last edited by _fmms_; October 2nd, 2008 at 03:06 PM.
  #4 (permalink)  
Old October 2nd, 2008, 09:34 PM
Junior Member
 
Join Date: September 25th, 2008
Posts: 11

Thank you, _fmms_!

Here are the JOB_NEW entries from my lsb.events man page for 6.1 (56 parameters):

0. JOB_NEW
1. Version number (%s)
2. Event time (%d)
3. jobId (%d)
4. userId (%d)
5. options (%d)
6. numProcessors (%d)
7. submitTime (%d)
8. beginTime (%d)
9. termTime (%d)
10. sigValue (%d)
11. chkpntPeriod (%d)
12. restartPid (%d)
13. userName (%s)
14. rLimits
15. rLimits
16. rLimits
17. rLimits
18. rLimits
19. rLimits
20. rLimits
21. rLimits
22. rLimits
23. rLimits
24. rLimits
25. hostSpec (%s)
26. hostFactor (%f)
27. umask (%d)
28. queue (%s)
29. resReq (%s)
30. fromHost (%s)
31. cwd (%s)
32. chkpntDir (%s)
33. inFile (%s)
34. outFile (%s)
35. errFile (%s)
36. subHomeDir (%s)
37. jobFile (%s)
38. numAskedHosts (%d)
39. askedHosts (%s)
40. dependCond (%s)
41. preExecCmd (%s)
42. timeEvent (%d)
43. jobName (%s)
44. command (%s)
45. nxf (%d)
46. xf (%s)
47. mailUser (%s)
48. projectName (%s)
49. niosPort (%d)
50. maxNumProcessors (%d)
51. schedHostType (%s)
52. loginShell (%s)
53. exceptList (%s)
54. options2 (%d)
55. userPriority (%d)
56. extsched (%s)

Regards,
Duke
  #5 (permalink)  
Old October 6th, 2008, 03:12 AM
LSF Moderator
 
Join Date: June 24th, 2008
Posts: 2

Quote:
Originally Posted by _fmms_
If I parsed it correctly my lava 1.0 file has 54
The man page specifies (is this the same for 6.1?)
So yeah you found a bug, I guess.
For Lava 1.0, all the parameters documented in the lsb.events man page are logged in the file except "52 userGroup (%s)". This may be a problem.

If you want to parse the event information, be aware that some parameters documented in the man page are not always present. They are logged only when certain conditions hold.
For example:
"39 askedHosts (%s)" is present only when "38 numAskedHosts (%d)" is greater than 0.
The same applies to "xf (%s)": it is logged only when "nxf (%d)" is greater than 0.
The other one is "niosPort (%d)".

You should also be aware that everything inside quotation marks (a string) counts as one item.
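
To illustrate the point about quoted strings, here is a small sketch comparing a plain whitespace split with a quote-aware split. The fragment reuses values from the Lava 1.0 record posted earlier in this thread (fields 36 to 43). The helper name split_fields is made up, and treating a doubled "" inside a string as an escaped quote is an assumption that should be checked against lsb.log.c.

```python
import re

# Either a double-quoted string (doubled "" assumed to escape a quote)
# or a run of non-whitespace characters.
TOKEN = re.compile(r'"(?:[^"]|"")*"|\S+')

def split_fields(line):
    """Split a record into fields, keeping each quoted string as one field."""
    fields = []
    for tok in TOKEN.findall(line):
        if len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"':
            # Drop the surrounding quotes and undo the assumed "" escaping.
            tok = tok[1:-1].replace('""', '"')
        fields.append(tok)
    return fields

fragment = ('"/home/lavaadmin" "1223459244.18893.18894" 0 "" "" '
            '"sleep 1000" "sleep 1000" 0')
naive = fragment.split()        # breaks "sleep 1000" into two pieces
aware = split_fields(fragment)  # keeps it as one field
print(len(naive), len(aware))   # prints: 10 8
```

A parser that splits on whitespace alone will report more fields than were actually written whenever a command, job name, or path contains a space.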

BTW, the function that writes the JOB_NEW record to the events file is writeJobNew() in lsb.log.c, and the other functions that write to the events file are in lsb.log.c as well. I think the code will help you more.

I think Lava 1.0 is similar to Platform Lava 6.1.
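
As a worked example of the rules above, the sketch below labels the 54 values of the Lava 1.0 record posted earlier with the 58 man-page names. Everything here is illustrative: label_job_new, xf_width, has_nios_port and has_user_group are made-up names. The number of tokens per xf entry and the condition under which niosPort is written are not stated in this thread, so both are left as parameters to be checked against writeJobNew().

```python
# Field names from the man page listing quoted in this thread (58 entries).
FIELDS = (
    "version eventTime jobId userId options numProcessors submitTime "
    "beginTime termTime sigValue chkpntPeriod restartPid userName".split()
    + [f"rLimits[{i}]" for i in range(11)]
    + "hostSpec hostFactor umask queue resReq fromHost cwd chkpntDir inFile "
      "outFile errFile subHomeDir jobFile numAskedHosts askedHosts dependCond "
      "preExecCmd jobName command nxf xf mailUser projectName niosPort "
      "maxNumProcessors schedHostType loginShell userGroup options2 idx "
      "inFileSpool commandSpool jobSpoolDir userPriority".split()
)

def label_job_new(values, xf_width=1, has_nios_port=False, has_user_group=False):
    """Pair the values of one record (event type removed) with field names."""
    it = iter(values)
    record = {}
    for name in FIELDS:
        if name == "askedHosts":
            # Present only when numAskedHosts > 0 (one token per host assumed).
            record[name] = [next(it) for _ in range(int(record["numAskedHosts"]))]
        elif name == "xf":
            # Present only when nxf > 0; xf_width tokens per entry is a guess.
            record[name] = [next(it) for _ in range(int(record["nxf"]) * xf_width)]
        elif name == "niosPort" and not has_nios_port:
            continue  # conditional field; the condition is not given in the thread
        elif name == "userGroup" and not has_user_group:
            continue  # reported above as not logged by Lava 1.0
        else:
            record[name] = next(it)
    return record, list(it)  # leftover values signal a layout mismatch

# The 54 values of the Lava 1.0 record posted earlier, quotes removed.
SAMPLE = (
    ["1.0", "1223460002", "18894", "500", "33564675", "1", "1223460002",
     "0", "0", "-65535", "0", "17157", "lavaadmin"]
    + ["-1"] * 11
    + ["", "100.00", "2", "normal", "", "master52", "/tmp", "/tmp//18893",
       "", "", "", "/home/lavaadmin", "1223459244.18893.18894", "0",
       "", "", "sleep 1000", "sleep 1000", "0", "", "default", "1",
       "LINUX86", "", "16", "0", "", "", "", "-1"]
)
record, leftover = label_job_new(SAMPLE)
print(record["jobName"], record["schedHostType"], record["userPriority"], leftover)
```

With these settings all 54 values are consumed and nothing is left over, which matches the count of 58 documented fields minus askedHosts, xf, niosPort and userGroup.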
  #6 (permalink)  
Old October 6th, 2008, 07:19 AM
Member
 
Join Date: September 16th, 2008
Location: Germany
Posts: 42

Thanks a lot, qlnie. That is not much fun to parse... And as far as I can see, this piece of information is missing from the man page.

Code:
       nxf (%d)

              Number of files to transfer (%d)

       xf (%s)

              List of file transfer specifications
  #7 (permalink)  
Old October 8th, 2008, 07:42 AM
Junior Member
 
Join Date: September 25th, 2008
Posts: 11

Thank you very much, qlnie and _fmms_!

We have tried padding the missing parameters. However, the log entries look strange because they have more parameters than documented, not fewer.

Here are two samples of JOB_NEW and JOB_STATUS:
http://ducta.net/sfsu/csc899/doc/lsb...1_sample01.pdf

We would appreciate it if you would take a look!

And where is "lsb.log.c" located, please?

Thanks,
Duke
  #8 (permalink)  
Old October 8th, 2008, 08:22 AM
Member
 
Join Date: September 16th, 2008
Location: Germany
Posts: 42

You can find lsb.log.c at /trunk/src/kits/lava/packages/lava/lsbatch/lib/lsb.log.c - Platform Open Cluster Stack - Trac
  #9 (permalink)  
Old October 9th, 2008, 10:10 AM
Junior Member
 
Join Date: September 25th, 2008
Posts: 11

Thanks, _fmms_!

And please excuse us for having so many questions. We are new to the application.

1. Where can we find a demo of Platform Lava 6.1 GUI?

- We found this site but had no access: http://teracluster.icss.neu.edu:8080/Platform/
- They also have some good interfaces listed on: Teracluster Cluster

2. Our group is investigating the logs to collect information. However, their format seems to be inconsistent. Thus, we are looking for a different approach. Is it possible to make Platform Lava write logs directly to a database so that records are stored in a more structured way?

3. If we alter programs like "lsb.log.c", will that affect parts of Platform Lava other than the "lsb.events" logs?

Thanks,
Duke
  #10 (permalink)  
Old October 9th, 2008, 10:34 AM
Member
 
Join Date: September 16th, 2008
Location: Germany
Posts: 42

1.
I do not know where to find a demo, but you may download it from http://my.platform.com/products/plat..._64.disk1.iso/

2.
I read that LSF can log into a database, but Lava cannot. I have not yet understood how to use the logs: some information goes into files, and other messages are printed to stderr when certain DEBUG variables are set in lsf.conf. There are other messages in the source for which I could never trace where they end up... (checkpointing/resume debug messages)

3.
no idea.
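
Regarding point 2: since Lava itself cannot log to a database, one possible workaround is to load lsb.events into a database after the fact. The sketch below uses SQLite from the Python standard library and is not a Lava feature. The table name lsb_events, the function load_events and the sample lines are made up. It assumes every record starts with event type, version and event time, and it only extracts jobId for JOB_NEW, where the man page listing in this thread puts it in field 3.

```python
import re
import sqlite3

# Either a double-quoted string (doubled "" assumed to escape a quote)
# or a run of non-whitespace characters.
TOKEN = re.compile(r'"(?:[^"]|"")*"|\S+')

def load_events(conn, lines):
    """Store one row per record; returns the number of rows inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lsb_events ("
        "event_type TEXT, event_time INTEGER, job_id INTEGER, raw_line TEXT)"
    )
    rows = []
    for line in lines:
        tokens = [t.strip('"') for t in TOKEN.findall(line)]
        if len(tokens) < 4:
            continue  # too short to hold type, version, time and one more field
        event_type, _version, event_time = tokens[:3]
        # jobId is field 3 of JOB_NEW per the man page; other types stay NULL.
        job_id = int(tokens[3]) if event_type == "JOB_NEW" else None
        rows.append((event_type, int(event_time), job_id, line.rstrip("\n")))
    conn.executemany("INSERT INTO lsb_events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # use a file path to keep the data
made_up = [
    '"JOB_NEW" "1.0" 1223460002 18894 500',
    '"JOB_STATUS" "1.0" 1223460010 18894 4',
]
inserted = load_events(conn, made_up)
print(inserted, conn.execute("SELECT event_type, job_id FROM lsb_events").fetchall())
```

Keeping the raw line next to the extracted columns means the per-event field layout can be worked out later without re-reading the log.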
All times are GMT.

