Note: This content is copied from my blog.

I want to share my experience setting up the Platform Lava workload scheduler on a Linux cluster.

In this example, the cluster consists of a Lava master node (named master), where users submit their jobs to the scheduler, and 2 Lava slave nodes (compute1 and compute2), to which the master dispatches the jobs to run.

The machines are connected through an Ethernet switch on the same network (master: 10.1.1.1; compute1 and compute2: 10.1.1.11 and 10.1.1.12 respectively).
We'll learn how to set up a Lava cluster on this set of computers, assuming that Red Hat Linux is installed on every node.

1. Compile & Install Lava

We'll do this task on every node in the cluster. First, download the Lava source code and extract the files from the tarball. You'll see source code directories like this:

Code:
# tar zxvf Lava-src-1.0.6.tar.gz
# ls Lava
chkpnt    config.h  doc     lava.spec  lsf       Makefile     README
config    COPYING   eauth  lsbatch    Make.def  Make.misc  scripts
Then we build the source code and install it with the following commands:

Code:
# cd Lava
# make
# make install
This will install Lava binary files and associated man pages on the machine.
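As a quick sanity check after make install, the sketch below reports which commands are missing from PATH. The function name missing_commands is made up for this example; the four command names are the ones used later in this tutorial.

```shell
# Print every command from the argument list that cannot be found in PATH.
# missing_commands is a helper name made up for this sketch.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example: these four commands are used later in this tutorial.
# An empty result means they were all found.
missing_commands lsadmin badmin lshosts bhosts
```

If a name is printed, check that the install directory is in PATH for the current shell before going further.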

A few more files have to be installed manually:

MPI wrappers
Code:
# cp Lava/scripts/mpich-mpirun /usr/bin/
# cp Lava/scripts/openmpi-mpirun /usr/bin/
# cp Lava/scripts/lam-mpirun /usr/bin/
# cp Lava/scripts/mpich2-mpiexec /usr/bin/
# ln -s /usr/bin/mpich-mpirun /usr/bin/mvapich1-mpirun
# ln -s /usr/bin/mpich2-mpiexec /usr/bin/mvapich2-mpiexec
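Since this step is repeated on every node, the same copies and symlinks can be wrapped in a small function. This is only a sketch: install_mpi_wrappers is a made-up name, and the target directory is a parameter so the function can be tried in a scratch directory first.

```shell
# Copy the four MPI wrapper scripts and create the two MVAPICH symlinks.
# install_mpi_wrappers is a name made up for this sketch.
# $1 = Lava scripts directory, $2 = target directory (default /usr/bin)
install_mpi_wrappers() {
    src=$1
    dest=${2:-/usr/bin}
    for w in mpich-mpirun openmpi-mpirun lam-mpirun mpich2-mpiexec; do
        cp "$src/$w" "$dest/" || return 1
    done
    # MVAPICH reuses the MPICH wrappers through symlinks.
    # -f replaces an existing link, so the function can be run again safely.
    ln -sf "$dest/mpich-mpirun" "$dest/mvapich1-mpirun" &&
    ln -sf "$dest/mpich2-mpiexec" "$dest/mvapich2-mpiexec"
}

# Usage (as root): install_mpi_wrappers Lava/scripts /usr/bin
```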
Put the Lava init script in /etc/init.d/ and make it run when the system boots:

Code:
# cp Lava/scripts/lava.init /etc/init.d/lava
# /sbin/chkconfig --add lava
# /sbin/chkconfig lava on

2. Configure Lava on Master node

The Lava source code provides some initial configuration file templates in the config directory, but some required files are missing and we have to create them ourselves. For simple usage, we don't recommend changing the configuration files that come with the source code.

2.1 Preparing environment

First, we have to prepare the configuration path and running environment for the Lava cluster.
Put the environment files from the source code in /etc/profile.d so that they are executed when users log in.

Code:
# cp Lava/config/lava.* /etc/profile.d
One important parameter is LSF_ENVDIR, which points to Lava's configuration path.
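Once lsf.conf has been created in section 2.2, a small check like the one below can confirm that a login shell sees LSF_ENVDIR and that the directory holds lsf.conf. It is a sketch only; check_lava_env is a made-up name.

```shell
# Report whether LSF_ENVDIR is set and whether it contains lsf.conf.
# check_lava_env is a name made up for this sketch.
check_lava_env() {
    if [ -z "${LSF_ENVDIR:-}" ]; then
        echo "LSF_ENVDIR is not set"
        return 1
    fi
    if [ ! -f "$LSF_ENVDIR/lsf.conf" ]; then
        echo "no lsf.conf in $LSF_ENVDIR"
        return 1
    fi
    echo "LSF_ENVDIR=$LSF_ENVDIR looks usable"
}

# Usage, after logging in again so /etc/profile.d is read: check_lava_env
```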

Create a user 'lavaadmin' for running and administering the Lava cluster:

Code:
# useradd -c "Lava Administrator" -s /sbin/nologin -m -d /home/lavaadmin lavaadmin

2.2 Set up LSF cluster configuration files

The Lava cluster configuration path is /etc/lava/conf (defined in LSF_ENVDIR).
Create the configuration path and set its owner to lavaadmin.

Code:
# mkdir -p /etc/lava/conf
# cp Lava/config/lsf* /etc/lava/conf
# chown -R lavaadmin:lavaadmin /etc/lava/conf
lsf.task and lsf.shared define cluster parameters such as the cluster name (default is lava), host types and resources.

Next, we create lsf.conf, the general configuration file of the Lava system. Copy the following content and change the hostname and domain name of the master node in the last block of parameters.

Code:
# /etc/lava/conf/lsf.conf
# for master nodes

# Refer to the Inside Platform Lava documentation
# before changing any parameters in this file.
# Any changes to the path names of Lava files must be reflected
# in this file. Make these changes with caution.

LSB_SHAREDIR=/var/spool/lava/work

# Configuration directories
LSF_CONFDIR=/etc/lava/conf
LSB_CONFDIR=/etc/lava/conf/lsbatch

# Daemon log messages
LSF_LOGDIR=/var/log/lava
LSF_LOG_MASK=LOG_WARNING

# Miscellaneous
LSF_AUTH=eauth

# General cwinstall variables
LSF_MANDIR=/usr/share/man
LSF_INCLUDEDIR=/usr/include
LSF_MISC=/usr/share/lava/misc
XLSF_APPDIR=/usr/share/lava/misc
LSF_ENVDIR=/etc/lava/conf

# Internal variable to distinguish Default Install
LSF_DEFAULT_INSTALL=y

# Internal variable indicating operation mode
LSB_MODE=batch

# WARNING: Please do not delete/modify next line!!
LSF_LINK_PATH=n

# LSF_MACHDEP and LSF_INDEP are reserved to maintain
# backward compatibility with legacy lsfsetup.
# They are not used in the new cwinstall.
#LSF_INDEP=/usr
#LSF_MACHDEP=/opt/lava/1.0

LSF_TOP=/usr
LSF_VERSION=1.0
LSF_MASTER_LIST=master
LSF_STRIP_DOMAIN=.mydomain.com
LSB_MAILSERVER=SMTP:master.mydomain.com
LSB_MAILTO=!U@master.mydomain.com
LSF_RSH=ssh
Note that the working directory parameter (LSB_SHAREDIR) is set to /var/spool/lava/work.
We have to create that working directory and make it accessible to the lavaadmin user.

Code:
# mkdir -p /var/spool/lava/work
# chown lavaadmin:lavaadmin /var/spool/lava/work
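To avoid typing the path twice, the directory name can be read straight from lsf.conf. The helper below is a sketch that assumes the plain KEY=value layout shown above; lsf_conf_get is a made-up name.

```shell
# Print the value of one KEY=value parameter from an lsf.conf style file.
# lsf_conf_get is a name made up for this sketch.
# $1 = parameter name, $2 = path to lsf.conf
lsf_conf_get() {
    # Commented-out parameters start with "#", so they never match.
    # If a key appears more than once, the last value is printed.
    sed -n "s/^[[:space:]]*$1=//p" "$2" | tail -n 1
}

# Example (as root):
#   workdir=$(lsf_conf_get LSB_SHAREDIR /etc/lava/conf/lsf.conf)
#   mkdir -p "$workdir" && chown lavaadmin:lavaadmin "$workdir"
```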
Lava needs a list of all the nodes in the cluster. Create a hosts file that maps the IP address to the name of each node, e.g.
Code:
# /etc/lava/conf/hosts

10.1.1.1    master
10.1.1.11    compute1
10.1.1.12    compute2
Then we define the nodes and their properties in the cluster by creating a file named lsf.cluster.lava (lsf.cluster.clustername in general) and adding the nodes to the 'Host' section.

Code:
# /etc/lava/conf/lsf.cluster.lava

Begin   ClusterAdmins
Administrators = lavaadmin
End    ClusterAdmins

Begin   Host
HOSTNAME   model   type   server   r1m   mem   swp   RESOURCES   #Keywords
master     !       !      1        3.5   ()    ()    ()
compute1   !       !      1        3.5   ()    ()    ()
compute2   !       !      1        3.5   ()    ()    ()
End     Host

Begin Parameters
End Parameters
See lsf.cluster(5) for more information and parameters
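Because the hosts file and the Host section must agree, a small cross-check can catch a node that was added to one file but not the other. The sketch below assumes the file layouts shown above; cluster_hosts and unmapped_hosts are made-up names.

```shell
# List the host names in the Host section of an lsf.cluster file.
# cluster_hosts and unmapped_hosts are names made up for this sketch.
cluster_hosts() {
    awk '
        $1 == "Begin" && $2 == "Host" { in_host = 1; next }
        $1 == "End"   && $2 == "Host" { in_host = 0 }
        # Inside the section, skip the header row, comments and blank lines.
        in_host && NF > 0 && $1 != "HOSTNAME" && $1 !~ /^#/ { print $1 }
    ' "$1"
}

# Print cluster hosts that have no name entry in the hosts file.
# $1 = lsf.cluster file, $2 = hosts file
unmapped_hosts() {
    for h in $(cluster_hosts "$1"); do
        awk -v h="$h" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$2" || echo "$h"
    done
}

# Usage: unmapped_hosts /etc/lava/conf/lsf.cluster.lava /etc/lava/conf/hosts
```

No output means every host in the cluster file has a line in the hosts file.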

2.3 Set up LSBATCH configuration files

The Lava batch scheduler configuration is located at /etc/lava/conf/lsbatch/clustername/configdir (default: /etc/lava/conf/lsbatch/lava/configdir).

Code:
# mkdir -p /etc/lava/conf/lsbatch/lava/configdir/
# cp Lava/config/lsb* /etc/lava/conf/lsbatch/lava/configdir/
This directory now contains:
lsb.params defines parameters for the Lava batch system
lsb.module defines plugin modules for the Lava batch system
lsb.queue defines batch queues and their characteristics for the Lava batch system (predefined queues: normal, priority, short, chkpnt_rerun_queue)
lsb.users defines the user groups that use the Lava batch system

Next, we need to indicate the nodes that will run Lava batch jobs by creating the file lsb.hosts:

Code:
# /etc/lava/conf/lsbatch/lava/configdir/lsb.hosts

Begin Host
HOST_NAME   MXJ   r1m   pg   ls   tmp   DISPATCH_WINDOW   # Keywords
default     !     ()    ()   ()   ()    ()                # Example
master      0     ()    ()   ()   ()    ()
End Host
Note: The value default in the HOST_NAME field represents all the nodes in the cluster defined in lsf.cluster.lava.
Note: We set the MXJ (maximum jobs) field of the master node to 0 to keep jobs from being assigned to the master node.
Note: If no nodes are defined in the Host section, Lava batch applies to all nodes in the cluster.

3. Configure Lava on slave nodes

Compiling the Lava source code and installing the binary files on the slave nodes can be done by repeating step 1 of this tutorial.
The slave nodes need fewer configuration files than the master; we only have to define the cluster so that Lava knows about it.

3.1 Preparing environment

Put the environment files in /etc/profile.d, create the lavaadmin user, and create the configuration path and working directory:

Code:
# cp Lava/config/lava.* /etc/profile.d
# useradd -c "Lava Administrator" -s /sbin/nologin -m -d /home/lavaadmin lavaadmin
# mkdir -p /etc/lava/conf
# mkdir -p /var/spool/lava/work
# chown lavaadmin:lavaadmin /etc/lava/conf /var/spool/lava/work

3.2 Create Configuration Files

Some configuration files on the slave nodes have the same content as on the master node, namely lsf.cluster.lava and hosts, which define the machines that are nodes of the cluster. We can just copy these files from the master node to the configuration path, and we must make sure they keep the same content whenever we edit the files on the master node.

Code:
# scp 10.1.1.1:/etc/lava/conf/hosts /etc/lava/conf/
# scp 10.1.1.1:/etc/lava/conf/lsf.cluster.lava /etc/lava/conf/
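One simple way to confirm that the copies have not drifted is to print a checksum of each shared file on every node and compare the results. This is a sketch: conf_fingerprint is a made-up name, and it relies only on the standard cksum utility.

```shell
# Print one "CRC size filename" line for each file shared by all nodes.
# conf_fingerprint is a name made up for this sketch.
# $1 = configuration directory (default /etc/lava/conf)
conf_fingerprint() {
    dir=${1:-/etc/lava/conf}
    for f in hosts lsf.cluster.lava; do
        # cksum reads from stdin here, so it prints no path of its own.
        echo "$(cksum < "$dir/$f") $f"
    done
}

# Usage: run conf_fingerprint on the master and on each slave node;
# the output should be identical everywhere.
```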
Next is lsf.conf, which is slightly different from the one on the master node:

Code:
# /etc/lava/conf/lsf.conf
# for slave nodes

LSB_SHAREDIR=/var/spool/lava/work

# Configuration directories
LSF_CONFDIR=/etc/lava/conf
LSB_CONFDIR=/etc/lava/conf/lsbatch

LSF_MASTER_LIST=master

# Daemon log messages
LSF_LOGDIR=/var/log/lava
LSF_LOG_MASK=LOG_WARNING

LSF_STRIP_DOMAIN=.mydomain.com
LSB_MAILTO=!U@master.mydomain.com
LSB_MAILSERVER=SMTP:master.mydomain.com

LSF_AUTH=eauth

# General variables
LSF_ENVDIR=/etc/lava/conf

# Internal variable to distinguish Default Install
LSF_DEFAULT_INSTALL=y

# Internal variable indicating operation mode
LSB_MODE=batch

# WARNING: Please do not delete/modify next line!!
LSF_LINK_PATH=n

LSF_TOP=/usr

4. Start the Lava cluster

After configuring the master and all slave nodes in steps 2 and 3, we'll start the daemons on each node.

On slave nodes
Log in to each slave node and start the Lava daemon by running the command:

Code:
# /etc/init.d/lava start
Note: this command only has to be run the first time. When the machines reboot, the Lava daemon will start automatically, as set up with the chkconfig commands in step 1.

Then start the batch daemon on the slave node:

Code:
# badmin hstartup

On master node
Start lava daemon on master node as well.

Code:
# /etc/init.d/lava start
Then make the Lava daemon read the configuration we provided in section 2.
Code:
# lsadmin reconfig
# badmin reconfig
Then start the batch daemon
Code:
# badmin hstartup
We can verify the configuration by checking the Lava cluster's nodes with the lshosts and bhosts commands:

Code:
# lshosts
HOST_NAME   type      model   cpuf    ncpus   maxmem   maxswp   server   RESOURCES
master      LINUX86   PC      100.0   1       2026M    996M     Yes      ()
compute1    LINUX86   PC      100.0   1       2026M    996M     Yes      ()
compute2    LINUX86   PC      100.0   1       2026M    996M     Yes      ()

# bhosts
HOST_NAME   STATUS   JL/U   MAX   NJOBS   RUN   SSUSP   USUSP   RSV
compute2    ok       -      1     0       0     0       0       0
compute1    ok       -      1     0       0     0       0       0
master      closed   -      0     0       0     0       0       0
If the results list all the nodes and the configuration that we defined previously, it's a good sign that the cluster is running without problems.
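On a larger cluster it is easier to let a script read the bhosts table. The sketch below assumes the column layout shown above (host name first, STATUS second) and prints only the hosts that need attention; unexpected_hosts is a made-up name.

```shell
# Read bhosts output on stdin and print every host whose STATUS is not "ok".
# Hosts named as arguments are allowed to be "closed" (e.g. the master,
# which has MXJ set to 0). unexpected_hosts is a name made up for this sketch.
unexpected_hosts() {
    awk -v allowed=" $* " '
        NR == 1 { next }                      # skip the header row
        NF < 2  { next }                      # skip blank lines
        $2 == "ok" { next }
        $2 == "closed" && index(allowed, " " $1 " ") { next }
        { print $1, $2 }
    '
}

# Usage: bhosts | unexpected_hosts master
```

With the output shown above, this prints nothing, because compute1 and compute2 are ok and master is expected to be closed.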


For more information about configuring a Lava cluster, please see the document Understanding Platform Lava.

Feel free to ask questions or comment at the Lava forum on HPCCommunity.


Good day