| Lava Support Get answers to your Lava issues from the community. |
| |||
|
Hello,
My university's research lab is running Platform Lava 6.1 on a Dell HPC Cluster (Dell PowerEdge Cluster, Intel(R) Xeon(TM) CPU 3.00GHz):
-bash-3.00$ lsid
Platform Lava 6.1, May 5 2005
Copyright 1992-2004 Platform Computing Corporation
My cluster name is lava
My master name is ccls
-bash-3.00$
We need to analyze the log file "lsb.events" to get information about the status of submitted jobs. However, the number of parameters and their data types in the log file seriously mismatch what is documented in the "lsb.events" man pages. For example:
- JOB_NEW is documented to have 56 parameters, but JOB_NEW record lines with 59 and 60 parameters are found in the log.
- JOB_STATUS is documented to have 12, but lines with 12 and 31 are found.
...
What could be the cause of this problem, and how can we fix it?
Thanks in advance, Duke |
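A quick way to see how widespread the mismatch is would be to tally the number of items per record type. The following is only a minimal sketch: it assumes each record is one line, that the first item is the record type, that a double-quoted string counts as a single item, and that strings contain no embedded quotes. The function name `field_count_histogram` is made up for illustration.

```python
import re
from collections import Counter

# A double-quoted string (taken whole) or a run of non-space characters.
# Assumption: quoted strings never contain an embedded double quote.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def field_count_histogram(lines):
    """Count how often each (record type, number of parameters) pair occurs."""
    counts = Counter()
    for line in lines:
        tokens = TOKEN_RE.findall(line)
        if not tokens:
            continue  # skip blank lines
        event_type = tokens[0].strip('"')
        # The record type itself is not counted as a parameter.
        counts[(event_type, len(tokens) - 1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py /path/to/lsb.events
    with open(sys.argv[1]) as fh:
        for (event_type, n), seen in sorted(field_count_histogram(fh).items()):
            print(f"{event_type}: {n} parameters in {seen} record(s)")
```

If one record type shows several different counts, that points to optional or repeated fields rather than a corrupted log.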
| |||
|
If I parsed it correctly, my Lava 1.0 file has 54 fields:
Code:
1 "1.0"
2 1223460002
3 18894
4 500
5 33564675
6 1
7 1223460002
8 0
9 0
10 -65535
11 0
12 17157
13 "lavaadmin"
14 -1
15 -1
16 -1
17 -1
18 -1
19 -1
20 -1
21 -1
22 -1
23 -1
24 -1
25 ""
26 100.00
27 2
28 "normal"
29 ""
30 "master52"
31 "/tmp"
32 "/tmp//18893"
33 ""
34 ""
35 ""
36 "/home/lavaadmin"
37 "1223459244.18893.18894"
38 0
39 ""
40 ""
41 "sleep 1000"
42 "sleep 1000"
43 0
44 ""
45 "default"
46 1
47 "LINUX86"
48 ""
49 16
50 0
51 ""
52 ""
53 ""
54 -1
Code:
1 Version number (%s)
2 Event time (%d)
3 jobId (%d)
4 userId (%d)
5 options (%d)
6 numProcessors (%d)
7 submitTime (%d)
8 beginTime (%d)
9 termTime (%d)
10 sigValue (%d)
11 chkpntPeriod (%d)
12 restartPid (%d)
13 userName (%s)
14 rLimits
15 rLimits
16 rLimits
17 rLimits
18 rLimits
19 rLimits
20 rLimits
21 rLimits
22 rLimits
23 rLimits
24 rLimits
25 hostSpec (%s)
26 hostFactor (%f)
27 umask (%d)
28 queue (%s)
29 resReq (%s)
30 fromHost (%s)
31 cwd (%s)
32 chkpntDir (%s)
33 inFile (%s)
34 outFile (%s)
35 errFile (%s)
36 subHomeDir (%s)
37 jobFile (%s)
38 numAskedHosts (%d)
39 askedHosts (%s)
40 dependCond (%s)
41 preExecCmd (%s)
42 jobName (%s)
43 command (%s)
44 nxf (%d)
45 xf (%s)
46 mailUser (%s)
47 projectName (%s)
48 niosPort (%d)
49 maxNumProcessors (%d)
50 schedHostType (%s)
51 loginShell (%s)
52 userGroup (%s)
53 options2 (%d)
54 idx (%d)
55 inFileSpool (%s)
56 commandSpool (%s)
57 jobSpoolDir (%s)
58 userPriority (%d)
Last edited by _fmms_; October 2nd, 2008 at 03:06 PM.. |
| |||
|
Thank you, _fmms_! Here are the entries for JOB_NEW in my lsb.events 6.1 man page, 56 parameters:
0. JOB_NEW
1. Version number (%s)
2. Event time (%d)
3. jobId (%d)
4. userId (%d)
5. options (%d)
6. numProcessors (%d)
7. submitTime (%d)
8. beginTime (%d)
9. termTime (%d)
10. sigValue (%d)
11. chkpntPeriod (%d)
12. restartPid (%d)
13. userName (%s)
14. rLimits
15. rLimits
16. rLimits
17. rLimits
18. rLimits
19. rLimits
20. rLimits
21. rLimits
22. rLimits
23. rLimits
24. rLimits
25. hostSpec (%s)
26. hostFactor (%f)
27. umask (%d)
28. queue (%s)
29. resReq (%s)
30. fromHost (%s)
31. cwd (%s)
32. chkpntDir (%s)
33. inFile (%s)
34. outFile (%s)
35. errFile (%s)
36. subHomeDir (%s)
37. jobFile (%s)
38. numAskedHosts (%d)
39. askedHosts (%s)
40. dependCond (%s)
41. preExecCmd (%s)
42. timeEvent (%d)
43. jobName (%s)
44. command (%s)
45. nxf (%d)
46. xf (%s)
47. mailUser (%s)
48. projectName (%s)
49. niosPort (%d)
50. maxNumProcessors (%d)
51. schedHostType (%s)
52. loginShell (%s)
53. exceptList (%s)
54. options2 (%d)
55. userPriority (%d)
56. extsched (%s)
Regards, Duke |
| |||
| Quote:
If you want to parse the events file, be aware that some parameters documented in the man page are not always present. They are logged only when a condition holds. For example, "39 askedHosts (%s)" is there only when "38 numAskedHosts (%d)" is greater than 0. The same goes for "xf (%s)", which is logged only when "nxf (%d)" is greater than 0. Another one is "niosPort (%d)". Also be aware that everything inside quotation marks (a string) counts as one item. BTW, the function that writes the JOB_NEW record to the events file is writeJobNew() in lsb.log.c, and you can find the other functions for the events file in lsb.log.c as well. I think the code will help you more. I think Lava 1.0 is similar to Platform Lava 6.1. |
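The two rules above (a quoted string is one item, and askedHosts appears only when numAskedHosts is greater than 0) could be sketched in Python roughly as follows. This is an illustration under assumptions, not the format as lsb.log.c actually defines it: the shortened `DEMO_SCHEMA`, the function names, the host names `hostA`/`hostB`, and the idea that each asked host is exactly one item are all made up for the example, and strings are assumed to contain no embedded quotes. Check writeJobNew() for the real layout.

```python
import re

# A double-quoted string (taken whole) or a run of non-space characters.
# Assumption: quoted strings never contain an embedded double quote.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def tokenize(line):
    """Split a record line into items; a quoted string is one item."""
    items = []
    for tok in TOKEN_RE.findall(line):
        if len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"':
            tok = tok[1:-1]  # drop the surrounding quotes
        items.append(tok)
    return items


def parse_record(tokens, schema):
    """Map items onto field names.

    A schema entry is either a field name, or a (name, count_field) pair for
    a field that is present only when the earlier count_field is > 0.
    Returns the parsed fields and any leftover items.
    """
    record, pos = {}, 0
    for entry in schema:
        if isinstance(entry, tuple):
            name, count_field = entry
            count = int(record[count_field])
            record[name] = tokens[pos:pos + count]  # empty list when count is 0
            pos += count
        else:
            record[entry] = tokens[pos]
            pos += 1
    return record, tokens[pos:]


# Hypothetical, shortened slice of JOB_NEW (fields 37 to 42 of the Lava 1.0 list).
DEMO_SCHEMA = [
    "jobFile",
    "numAskedHosts",
    ("askedHosts", "numAskedHosts"),
    "dependCond",
    "preExecCmd",
    "jobName",
]

# Made-up fragments: one without asked hosts, one with two.
without_hosts = '"1223459244.18893.18894" 0 "" "" "sleep 1000"'
with_hosts = '"1223459244.18893.18894" 2 "hostA" "hostB" "" "" "sleep 1000"'

for fragment in (without_hosts, with_hosts):
    fields, leftover = parse_record(tokenize(fragment), DEMO_SCHEMA)
    print(fields["askedHosts"], fields["jobName"], leftover)
```

Returning the leftover items makes it easy to spot records that carry more parameters than the schema expects.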
| |||
|
Thanks a lot, qlnie, that is not too much fun to parse... And as far as I can see, this piece of information is missing from the man page.
Code:
nxf (%d)
Number of files to transfer (%d)
xf (%s)
List of file transfer specifications
|
| |||
|
Thank you very much, qlnie and _fmms_! We have tried padding the missing parameters. However, the log entries are strange in that they have more parameters than documented, not fewer. Here are two samples of JOB_NEW and JOB_STATUS: http://ducta.net/sfsu/csc899/doc/lsb...1_sample01.pdf We would appreciate it if you could take a look! And where is "lsb.log.c" located, please? Thanks, Duke |
| |||
|
You can find lsb.log.c at /trunk/src/kits/lava/packages/lava/lsbatch/lib/lsb.log.c - Platform Open Cluster Stack - Trac |
| |||
|
Thanks, _fmms_! And please excuse us for having so many questions. We are new to the application. 1. Where can we find a demo of the Platform Lava 6.1 GUI? - We found this site but had no access: http://teracluster.icss.neu.edu:8080/Platform/ - They also have some good interfaces listed on: Teracluster Cluster 2. Our group is investigating the logs to collect information. However, they seem to be unstable, so we are looking for a different approach. Is it possible to make Platform Lava write logs directly to a database so that records are stored more reliably? 3. If we alter programs like "lsb.log.c", will that affect parts of Platform Lava other than the "lsb.events" logs? Thanks, Duke |
| |||
|
1. I do not know where to find a demo, but you may download it from http://my.platform.com/products/plat..._64.disk1.iso/ 2. I read that LSF can log into a database, but Lava cannot. I have not yet understood how to use the logs: some information goes into files, and other information is printed to stderr when certain DEBUG variables are set in lsf.conf. There are other messages in the source for which I could never trace where they end up... (Checkpointing/Resume debug messages) 3. No idea. |
|