-
July 11th, 2008 03:53 PM #1
Removing Nodes from Queue
Originally posted by: JPummill, Tue Mar 13, 2007 1:22 pm
I have just removed a number of compute nodes from one of our clusters for an extended experiment, and now I need to remove them from LSF on the master node so that they no longer show up as unavailable resources. We are running LSF 6.2. Which file(s) do I need to edit?
Thanks,
Jeff
-
July 11th, 2008 03:54 PM #2
Originally posted by: BillD, Tue Mar 13, 2007 6:36 pm
Jeff,
The "lsf.cluster.<clustername>" file should contain the list of all nodes in the cluster. You should remove the nodes from the list and make sure all LSF daemons are stopped on those nodes. You should then issue "lsadmin limrestart all" and "badmin reconfig". You should probably also make sure that LSF daemons are no longer started when these systems boot.
If you have referenced any of these nodes in various configuration files, you'll get a lot of warning messages. You should remove these references.
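For anyone who would rather script the edit than do it by hand, here is a rough Python sketch of removing hosts from the cluster file. It assumes the host list sits between "Begin Host" and "End Host" lines with the hostname in the first column; check that against your own lsf.cluster.<clustername> file first. The function name and the hostnames in the sample are made up for illustration, and you should keep a backup of the original file.

```python
def remove_hosts(cluster_text, hosts_to_remove):
    """Return cluster_text without the Host-section rows for the given hosts.

    Assumption: hosts are listed one per line between 'Begin Host' and
    'End Host', with the hostname as the first whitespace-separated token.
    """
    hosts_to_remove = set(hosts_to_remove)
    kept = []
    in_host_section = False
    for line in cluster_text.splitlines():
        tokens = line.split()
        marker = [t.lower() for t in tokens[:2]]
        if marker == ["begin", "host"]:
            in_host_section = True
        elif marker == ["end", "host"]:
            in_host_section = False
        elif in_host_section and tokens and tokens[0] in hosts_to_remove:
            continue  # drop the row for a removed node
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative sample only; hostnames and column values are hypothetical.
sample = """\
Begin   Host
HOSTNAME      model  type  server  r1m  mem  swp  RESOURCES
master        !      !     1       3.5  ()   ()   ()
compute-0-0   !      !     1       3.5  ()   ()   ()
compute-0-1   !      !     1       3.5  ()   ()   ()
End     Host
"""

print(remove_hosts(sample, ["compute-0-1"]))
```

Only rows inside the Host section are touched, so a hostname that appears elsewhere in the file is left alone and would still need a manual look.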
-
July 11th, 2008 03:54 PM #3
Originally posted by: JPummill, Wed Mar 14, 2007 1:03 pm
Morning Bill,
I edited the lsf.cluster.prospero file to remove every entry for the compute nodes I took out. When I ran "lsadmin limrestart all", the removed nodes still showed up in the list, and the same happened with "badmin reconfig". Neither command reported any errors.
In the /share/apps/LSF/conf directory, there are also the following files...
hosts
hostfile
lsf.conf
lsf.shared
...and a few others.
Using lsload, all of the nodes that I removed from the lsf.cluster.prospero file are still shown.
Where could it still be reading this info?
-Jeff
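One way to narrow down a question like this is to search every file in the conf directory for the removed hostnames. The Python sketch below does that; the function name is hypothetical, and it only reports leftover text references. It says nothing about state the running daemons may still be holding in memory.

```python
import re
from pathlib import Path


def find_references(conf_dir, hosts):
    """Return (file name, line number, host) for each mention of a host.

    Matches whole hostnames only, so 'compute-0-1' does not match
    'compute-0-10', but does match 'compute-0-1.local'.
    """
    patterns = {
        host: re.compile(r"(?<![\w.-])" + re.escape(host) + r"(?![\w-])")
        for host in hosts
    }
    hits = []
    for path in sorted(Path(conf_dir).iterdir()):
        if not path.is_file():
            continue  # this sketch does not descend into subdirectories
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for host, pattern in patterns.items():
                if pattern.search(line):
                    hits.append((path.name, lineno, host))
    return hits
```

On the cluster in this thread the call would look something like find_references("/share/apps/LSF/conf", [...]) with the actual names of the removed nodes filled in.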
-
July 11th, 2008 03:55 PM #4
Originally posted by: JPummill, Wed Mar 14, 2007 5:48 pm
Hey Bill,
Running "service lsf stop" followed by "service lsf start" did the trick. The cluster was idle, so no jobs were at risk of being compromised or interrupted.
Thanks,
Jeff
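A small Python sketch of how the check after the restart could be automated: it looks at the first column of lsload output and reports any removed host that is still listed. It assumes lsload prints one host per line with the hostname first, which matches the usual output but is worth confirming on your version. The function names and the sample text are invented for illustration.

```python
import subprocess


def hosts_still_listed(lsload_output, removed_hosts):
    """Return the removed hosts that still appear in the first column."""
    listed = set()
    for line in lsload_output.splitlines():
        tokens = line.split()
        if tokens:
            listed.add(tokens[0])
    return sorted(h for h in removed_hosts if h in listed)


def check_cluster(removed_hosts):
    """Run lsload on the master node and check its output (needs LSF installed)."""
    result = subprocess.run(["lsload"], capture_output=True, text=True, check=True)
    return hosts_still_listed(result.stdout, removed_hosts)


# Made-up sample resembling lsload output; not taken from a real cluster.
sample_output = """\
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
master              ok   0.0   0.0   0.0   0%   0.0   1     0 5000M 2000M 1500M
compute-0-1    unavail
"""

print(hosts_still_listed(sample_output, ["compute-0-1", "compute-0-2"]))
```

An empty list back from check_cluster would suggest the removed nodes are gone from the load information, as happened here after the full stop and start.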