Hi,

We have a cluster set up with a provisioning network (10.1.7.0/24) and an InfiniBand network (10.1.39.0/24). The InfiniBand interfaces are listed first in the /etc/lava/conf/hosts file, and since the Lava Master does not have InfiniBand, it cannot communicate with the compute nodes.

The patch below fixes the problem in our case by restricting the query to boot interfaces (nics.boot = True), but it is NOT foolproof: the LSF/Lava Master might not be on the provisioning/boot network of the compute nodes in all setups.

A better fix would be an option that lets the user specify, per nodegroup, which interface to add to /etc/lava/conf/hosts.

Code:
--- lavahosts_1_0.py.backup     2009-10-08 14:17:40.000000000 +0200
+++ lavahosts_1_0.py    2009-10-08 14:26:35.000000000 +0200
@@ -82,7 +82,7 @@
                domain = self.db.getAppglobals('DNSZone')
                query = ('select nics.ip,nodes.name,networks.suffix '
                                'from nics,nodes,networks where nics.nid = nodes.nid '
-                               'and nics.netid = networks.netid and networks.usingdhcp=False and not networks.device="bmc" order by nics.ip')
+                               'and nics.netid = networks.netid and networks.usingdhcp=False and not networks.device="bmc" and nics.boot = True order by nics.ip')

                try:
                     self.db.execute(query)
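
For what it's worth, here is a rough sketch of how the per-nodegroup option suggested above might pick interfaces. It is plain Python, not tied to the real plugin: the Nic record, the pick_host_entries function, the network names, and the nodegroup names are all made up for illustration. Nodegroups without a configured preference fall back to the boot NIC, which matches what the patch does.

```python
# Hypothetical sketch: none of these names exist in Kusu or Lava.
from typing import NamedTuple


class Nic(NamedTuple):
    ip: str
    node: str
    nodegroup: str
    network: str  # illustrative network name, e.g. "provision"
    boot: bool    # True for the NIC the node boots from


def pick_host_entries(nics, preferred_network):
    """Return one (node, ip) pair per node, sorted by node name.

    preferred_network maps a nodegroup name to the network whose NIC
    should go into the hosts file. Nodegroups without an entry fall
    back to the boot NIC.
    """
    chosen = {}
    for nic in nics:
        wanted = preferred_network.get(nic.nodegroup)
        if wanted is None:
            matches = nic.boot
        else:
            matches = nic.network == wanted
        # Keep only the first matching NIC for each node.
        if matches and nic.node not in chosen:
            chosen[nic.node] = nic.ip
    return sorted(chosen.items())


nics = [
    Nic("10.1.39.11", "compute-00", "compute", "infiniband", False),
    Nic("10.1.7.11", "compute-00", "compute", "provision", True),
    Nic("10.1.39.21", "storage-00", "storage", "infiniband", False),
    Nic("10.1.7.21", "storage-00", "storage", "provision", True),
]

# Only the (made-up) "storage" nodegroup asks for its InfiniBand address.
entries = pick_host_entries(nics, {"storage": "infiniband"})
for node, ip in entries:
    print(ip, node)
```

With an empty preference mapping, every node gets its boot NIC, so the behaviour is the same as the patched query.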