July 11th, 2008 06:12 AM #1
bsub failing miserably
Originally posted by: hazards, Tue Nov 06, 2007 7:35 pm
I gave up trying to compile NAMD and downloaded the AMD64 executables from the NAMD web site so that I could work with something.
When I try to use LAVA, e.g.
bsub -n 10,16 NAMD
and then look at the LAVA GUI, the "details" for the job are:
Job <236>, User <hazards>, Project <default>, Status <Exited>, Queue <normal>, Command <NAMD_APOA1_LAVA.sh>
Tue Nov 06 14:11:35 Submitted from host <hpcc2-head1>,
CWD <$HOME/NAMD_LAVA> , 10-16 Processors Requested ;
Tue Nov 06 14:11:39 Started on 10 Hosts/Processors <compute-0-15> <compute-0-15> <compute-0-7> <compute-0-7> <compute-0-3> <compute-0-3> <compute-0-10> <compute-0-10> <compute-0-2> <compute-0-2> ,
Execution Home </home/hazards>, Execution CWD </home/hazards/NAMD_LAVA>;
Tue Nov 06 14:11:39 Exited with exit code 127. The CPU time used is 0.0 seconds.
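For context, exit code 127 is the status a shell such as bash conventionally reports when it cannot find the command it was asked to run. A minimal sketch that reproduces that status outside LAVA is below; the command name is hypothetical and chosen only because it should not exist on the PATH, and nothing here is specific to LAVA or NAMD.

```shell
#!/bin/bash
# Hypothetical command name, assumed not to exist anywhere on PATH.
missing_cmd="no_such_command_for_demo_$$"

# Run it and capture the exit status; stderr is discarded to keep the output quiet.
status=0
"$missing_cmd" 2>/dev/null || status=$?

# Under bash this reports the "command not found" status.
echo "exit status: $status"
```

If the same status comes back from a batch job, one thing worth checking is whether every word at the start of each command line in the submitted script names something that exists on the execution hosts.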
Here's the shell script:
#!/bin/bash
# request Bourne shell as shell for job
#$ -S /bin/bash
#set up lava
#BSUB -o lava1_out.log
#BSUB -q normal
lava1 /share/apps/NAMD/charmrun ++verbose /share/apps/NAMD/namd2 +p16 /home/hazards/NAMD_trials/apoa1.namd > charm_namd_lava1.out
#BSUB -J lava1
^D
Whereas the core application line starts like this:
Charmrun> charmrun started...
Charmrun> using /home/hazards/.nodelist as nodesfile
Charmrun> adding client 0: "localhost", IP:127.0.0.1
Charmrun> adding client 1: "localhost", IP:127.0.0.1
Charmrun> adding client 2: "localhost", IP:127.0.0.1
Charmrun> adding client 3: "localhost", IP:127.0.0.1
Charmrun> adding client 4: "localhost", IP:127.0.0.1
Charmrun> adding client 5: "localhost", IP:127.0.0.1
Charmrun> adding client 6: "localhost", IP:127.0.0.1
Charmrun> adding client 7: "localhost", IP:127.0.0.1
Charmrun> adding client 8: "localhost", IP:127.0.0.1
Charmrun> adding client 9: "localhost", IP:127.0.0.1
Charmrun> adding client 10: "localhost", IP:127.0.0.1
Charmrun> adding client 11: "localhost", IP:127.0.0.1
Charmrun> adding client 12: "localhost", IP:127.0.0.1
Charmrun> adding client 13: "localhost", IP:127.0.0.1
Charmrun> adding client 14: "localhost", IP:127.0.0.1
Charmrun> adding client 15: "localhost", IP:127.0.0.1
Charmrun> Charmrun = 128.23.191.115, port = 38072
Charmrun> Sending "0 128.23.191.115 38072 14601 0" to client 0.
Charmrun> find the node program "/share/apps/NAMD/namd2" at "/home/hazards/NAMD_LAVA" for 0.
Charmrun> Starting rsh localhost -l hazards /bin/sh -f
Charmrun> rsh (localhost:0) started
Charmrun> Sending "1 128.23.191.115 38072 14601 0" to client 1.
and then produces a blizzard of these lines:
localhost.localdomain: Connection refused
Charmrun> Error 1 returned from rsh (localhost:0)
localhost.localdomain: Connection refused
The log file never gets anything.
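The charmrun output above shows every client taken from /home/hazards/.nodelist as "localhost" rather than the compute nodes listed in the job details. As a rough sketch only, a nodelist naming the allocated hosts could be generated inside the job script like this. It assumes the scheduler exports the allocated hosts in an LSB_HOSTS environment variable (one entry per slot) and that charmrun accepts the usual "group main" / "host NAME" nodelist layout; the function name make_nodelist and the fallback host list are hypothetical, the latter copied from the job details for illustration.

```shell
#!/bin/bash
# Sketch only: turn a space-separated host list into a charmrun-style nodelist.
# make_nodelist is a hypothetical helper, not part of LAVA or NAMD.
make_nodelist() {
    echo "group main"
    for h in $1; do
        echo "host $h"
    done
}

# Assumption: LSB_HOSTS holds the allocated hosts; otherwise fall back to
# a few example names taken from the job details above.
hosts="${LSB_HOSTS:-compute-0-15 compute-0-15 compute-0-7}"

# Print the nodelist; in a job script this would be redirected to a file.
nodelist=$(make_nodelist "$hosts")
echo "$nodelist"
```

The resulting file would then have to be handed to charmrun in place of the default ~/.nodelist. This does not address the "Connection refused" errors from rsh, which are a separate matter.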
This NAMD program may not be working properly, but another program called MrBayes, which does work from the command line, produces the same 127 error when submitted via a bsub command.