-
July 11th, 2008 05:35 AM #1
R and R-mpi
Originally posted by: mbozzore, Sun Apr 08, 2007 5:40 am
This topic covers R and Rmpi: how to recompile everything on your OCS cluster and then submit Rmpi jobs using LSF HPC.
_________________
Mehdi Bozzo-Rey
-
July 11th, 2008 05:36 AM #2
Step 1: recompiling R
Originally posted by: mbozzore, Sun Apr 08, 2007 6:16 am
Recompiling R (The R Project for Statistical Computing) is straightforward.
Download the latest source tarball, unpack it, and then:
1- Configure:
./configure --prefix=/share/apps/CN/R-2.4.1
2- Build the distribution:
make
3- Install the distribution:
sudo make install
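The three steps above can be wrapped in a small script that stops at the first failing step, so a half-built R never gets installed. This is only a sketch: it assumes it is run from the top of the unpacked R source tree, reuses the prefix from step 1 as a default, and the `run`/`build_r` helpers and the `DRY_RUN` switch are hypothetical names introduced here.

```shell
# Sketch: build and install R from an unpacked source tree.
# Assumption: run from the top of the R source directory.
# PREFIX defaults to the install location used in this thread.
PREFIX=${PREFIX:-/share/apps/CN/R-2.4.1}

run() {
    # Hypothetical helper: print the command when DRY_RUN is set,
    # otherwise execute it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_r() {
    # $1 = install prefix; each step only runs if the previous one succeeded.
    run ./configure --prefix="$1" || return 1
    run make || return 1
    run sudo make install
}
```

Typical use is `build_r "$PREFIX"`; `DRY_RUN=1 build_r "$PREFIX"` only prints the commands it would run.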
Last edited by admin_ocs4; July 11th, 2008 at 05:45 AM.
-
July 11th, 2008 05:39 AM #3
Step 2: recompiling LAM/MPI
Originally posted by: mbozzore, Sun Apr 08, 2007 6:19 am
Here is the first trick: Rmpi needs the shared-library build of LAM/MPI.
If your OCS cluster has an InfiniBand interconnect, you can use the following procedure:
1- Configure LAM:
./configure --enable-shared --with-rpi=ib --with-rsh="ssh -x" --with-rpi-ib=/usr/local/topspin --prefix=/share/apps/CN/lam-7.1.2-ib-shared
2- Build the distribution:
make
3- Install the distribution:
sudo make install
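Before building Rmpi it is worth checking that the install really contains shared libraries. Rmpi's configure looks for libmpi.so and liblam.so (see the configure output later in this thread); if only the static archives libmpi.a and liblam.a are present, linking Rmpi.so fails with the "recompile with -fPIC" error. A minimal sketch, assuming the lib/ layout under the prefix from step 2; `check_shared_lam` is a hypothetical helper name.

```shell
# Sketch: check that a LAM/MPI install provides shared libraries.
# Assumption: libraries live in PREFIX/lib; the default PREFIX is the
# one used in the configure line above.
check_shared_lam() {
    libdir="${1:-/share/apps/CN/lam-7.1.2-ib-shared}/lib"
    missing=0
    for lib in libmpi liblam; do
        # Accept the plain .so or a versioned variant such as liblam.so.0
        if ls "$libdir/$lib".so* >/dev/null 2>&1; then
            echo "ok: $lib shared library found in $libdir"
        else
            echo "missing: no $lib.so in $libdir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_shared_lam /share/apps/CN/lam-7.1.2-ib-shared || echo "rebuild LAM with --enable-shared"`.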
Last edited by admin_ocs4; July 11th, 2008 at 05:48 AM.
-
July 11th, 2008 05:41 AM #4
Step 3: Rmpi installation and test
Originally posted by: mbozzore, Sun Apr 08, 2007 6:33 am
Here is the complete procedure for installing Rmpi and running some examples on your Platform OCS cluster:
1- Install the Rmpi package:
R CMD INSTALL Rmpi_0.5-3.tar.gz --configure-args=--with-mpi=/share/apps/CN/lam-7.1.2-ib-shared
I also added the following to my .bashrc:
export LD_LIBRARY_PATH=/share/apps/CN/R-2.4.1/lib64/R/lib/Rmpi/libs/:$LD_LIBRARY_PATH
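The plain `export LD_LIBRARY_PATH=DIR:$LD_LIBRARY_PATH` line has two small drawbacks: it adds a duplicate entry every time .bashrc is sourced, and if the variable starts out empty it leaves a trailing ":" (an empty entry, which glibc's loader treats as the current directory). A sketch of an idempotent alternative for .bashrc; `prepend_ld_path` is a hypothetical helper, and the directory is the one from the line above.

```shell
# Sketch: prepend a directory to LD_LIBRARY_PATH only if it is not already there.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;  # already present: leave the variable unchanged
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Same directory as in the .bashrc line above.
prepend_ld_path /share/apps/CN/R-2.4.1/lib64/R/lib/Rmpi/libs/
```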
2- Test the Rmpi package (outside of LSF HPC):
2.1- Log on to a compute node (required for IB support if the frontend does not have an HCA):
ssh compute-0-0
2.2- Boot the LAM universe:
lamboot -v hosts
The file hosts contains the following:
------------------------
compute-0-0
compute-0-1
-----------------------
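The hosts file is a LAM boot schema with one node name per line, so it can be generated from a list of nodes before booting. A minimal sketch: the helper names are hypothetical, the node names are the two from this thread, and `lamnodes` (lists the booted nodes) and `lamhalt` (shuts LAM down) are standard LAM/MPI commands.

```shell
# Sketch: write a LAM boot schema and boot the LAM universe from it.
write_lam_hosts() {
    # $1 = output file, remaining arguments = node names (one per line)
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

boot_lam() {
    # Boot verbosely from the schema file, then list the booted nodes.
    lamboot -v "$1" && lamnodes
}
```

For example: `write_lam_hosts hosts compute-0-0 compute-0-1 && boot_lam hosts`, and `lamhalt` when you are done.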
And here is the tricky part:
[mbozzore@compute-0-0 ~]$ ssh compute-0-1 "R --slave"
WARNING: ignoring environment value of R_HOME
LAM/MPI runtime environment is not operating.
Starting LAM/MPI runtime environment.
Error in dyn.load(x, as.logical(local), as.logical(now)) :
unable to load shared library '/share/apps/CN/R-2.4.1/lib64/R/lib/Rmpi/libs/Rmpi.so':
/opt/lam/gnu/lib/liblam.so.0: undefined symbol: openpty
Error in dyn.unload(x) : dynamic/shared library '/share/apps/CN/R-2.4.1/lib64/R/lib/Rmpi/libs/Rmpi.so' was not loaded
Warning message:
Rmpi cannot be loaded
Killed by signal 2.
The solution is to preload libutil, which provides openpty, by adding the following to .bashrc:
export LD_PRELOAD=/lib64/libutil-2.3.4.so
The reference for this issue is here:
http://www.lam-mpi.org/MailArchives/lam/2006/05/12378.php
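The file name libutil-2.3.4.so is tied to the glibc version on these nodes, so the preload line breaks on nodes with a different glibc. A sketch that looks up the library instead; it assumes the library lives in /lib64 and that the usual versioned name libutil.so.1 exists on glibc systems, falling back to the libutil-*.so name used above. `find_libutil` is a hypothetical helper.

```shell
# Sketch: locate libutil (provides openpty) without hard-coding the glibc version.
# Assumption: the library directory defaults to /lib64.
find_libutil() {
    dir=${1:-/lib64}
    for f in "$dir"/libutil.so.1 "$dir"/libutil-*.so; do
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}
```

In .bashrc this could be used as `lib=$(find_libutil) && export LD_PRELOAD="$lib"`, which leaves LD_PRELOAD unset if no library is found.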
Then, going through the same procedure again:
-----------------------------------
[mbozzore@compute-0-0 ~]$ ssh compute-0-1 "R --slave"
WARNING: ignoring environment value of R_HOME
3+8
[1] 11
[mbozzore@compute-0-0 ~]$ R
WARNING: ignoring environment value of R_HOME
R version 2.4.1 (2006-12-18)
Copyright (C) 2006 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
LAM/MPI runtime environment is not operating.
Starting LAM/MPI runtime environment.
> mpi.spawn.Rslaves()
1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: compute-0-0
slave1 (rank 1, comm 1) of size 2 is running on: compute-0-0
> mpi.remote.exec(mpi.get.processor.name())
$slave1
[1] "compute-0-0"
> demo("simplePI")
demo(simplePI)
---- ~~~~~~~~
Type <Return> to start :
> simple.pi <- function(n, comm = 1) {
mpi.bcast.cmd(n <- mpi.bcast(integer(1), type = 1, comm = .comm),
comm = comm)
mpi.bcast(as.integer(n), type = 1, comm = comm)
mpi.bcast.cmd(id <- mpi.comm.rank(.comm), comm = comm)
mpi.bc .... [TRUNCATED]
> simple.pi(1000000)
[1] 3.141593
-----------------------------------
And finally, using an Rmpi example available here:
Rmpi Examples
the command
mpirun -np 1 R --slave CMD BATCH task_pull.R
produces the following files:
task_pull.Rout
Rplots.ps
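When the example runs unattended, a quick check of task_pull.Rout tells you whether the script completed. A sketch, assuming the default R CMD BATCH behaviour of writing the transcript to the input file name with an .Rout suffix; R prints "Execution halted" when a batch script stops on an error. `check_batch_output` is a hypothetical helper.

```shell
# Sketch: check the transcript written by "R CMD BATCH task_pull.R".
check_batch_output() {
    rout=${1:-task_pull.Rout}
    if [ ! -s "$rout" ]; then
        echo "no output in $rout" >&2
        return 1
    fi
    if grep -q "Execution halted" "$rout"; then
        echo "R stopped with an error; see $rout" >&2
        return 2
    fi
    echo "ok: $rout"
}
```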
Last edited by admin_ocs4; July 11th, 2008 at 05:48 AM.
-
July 11th, 2008 05:50 AM #5
Final step: running Rmpi inside LSF HPC
Originally posted by: mbozzore, Sun Apr 08, 2007 6:40 am
Since any Rmpi task can be run with mpirun, the LSF HPC part is very easy. For example:
bsub -o%J.out -n 4 -a lammpi mpirun.lsf R --slave CMD BATCH task_pull.R
Below is my job output:
Sender: LSF System <lsfadmin@compute-0-1>
Subject: Job 4310: <mpirun.lsf R --slave CMD BATCH task_pull.R> Done
Job <mpirun.lsf R --slave CMD BATCH task_pull.R> was submitted from host <dr10> by user <mbozzore>.
Job was executed on host(s) <2*compute-0-1>, in queue <hpc_linux>, as user <mbozzore>.
<2*compute-0-0>
</home/mbozzore> was used as the home directory.
</home/mbozzore/compile_temp> was used as the working directory.
Started at Thu Feb 22 15:45:42 2007
Results reported at Thu Feb 22 15:45:55 2007
Your job looked like:
------------------------------------------------------------
# LSBATCH: User input
mpirun.lsf R --slave CMD BATCH task_pull.R
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 22.89 sec.
Max Memory : 2 MB
Max Swap : 13 MB
Max Processes : 1
Max Threads : 1
The output (if any) follows:
2476 /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin/TaskStarter running on n0 (o)
2477 /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin/TaskStarter running on n0 (o)
11116 /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin/TaskStarter running on n1
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
11117 /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin/TaskStarter running on n1
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
Job /opt/lsfhpc/6.2/linux2.6-glibc2.3-x86_64/bin/lammpirun_wrapper R --slave CMD BATCH task_pull.R
TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
===== ========== ================ ======================= ===================
00002 compute-0- R --slave CMD BA Done 02/22/2007 15:45:49
00003 compute-0- R --slave CMD BA Done 02/22/2007 15:45:49
00000 compute-0- R --slave CMD BA Done 02/22/2007 15:45:49
00001 compute-0- R --slave CMD BA Done 02/22/2007 15:45:49
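The same bsub command line can also be kept as a job script, with the options written as #BSUB directives and the file submitted with `bsub < file`. This is a sketch under the assumption that the embedded directives mirror the command-line options used above (-o %J.out, -n 4, -a lammpi); `write_lsf_job` and the file name rmpi_job.lsf are hypothetical.

```shell
# Sketch: generate an LSF job script equivalent to the bsub line above.
write_lsf_job() {
    # $1 = job file, $2 = number of slots, $3 = R script to run
    cat > "$1" <<EOF
#!/bin/sh
#BSUB -o %J.out
#BSUB -n $2
#BSUB -a lammpi
mpirun.lsf R --slave CMD BATCH $3
EOF
}
```

For example: `write_lsf_job rmpi_job.lsf 4 task_pull.R && bsub < rmpi_job.lsf`.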
-
July 11th, 2008 05:55 AM #6
R and Rmpi on OCS platform WITHOUT InfiniBand
Originally posted by: hazards, Thu Dec 20, 2007 3:24 pm
The install is in /share/apps/R. I ran through a series of examples from the manual and they all seemed to work fine.
Here is what happens when I try to install Rmpi:
[root@hpcc2-head1 COMPILE]# /share/apps/R/bin/R CMD INSTALL Rmpi_0.5-5.tar.gz --configure-args=--with-mpi=/share/apps/lam-7.1.4
* Installing to library '/share/apps/R/lib64/R/library'
* Installing *source* package 'Rmpi' ...
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
Try to find mpi.h ...
Found in /share/apps/lam-7.1.4/include
Try to find libmpi.so or libmpich.a
checking for main in -lmpi... yes
Try to find liblam.so ...
checking for main in -llam... yes
checking for openpty in -lutil... yes
checking for main in -lpthread... yes
configure: creating ./config.status
config.status: creating src/Makevars
** libs
gcc -std=gnu99 -I/share/apps/R/lib64/R/include -I/share/apps/R/lib64/R/include
-DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -I/share/apps/lam-7.1.4/include
-DMPI2 -DLAM -fPIC -I/usr/local/include -fpic -g -O2 -c conversion.c -o
conversion.o
gcc -std=gnu99 -I/share/apps/R/lib64/R/include -I/share/apps/R/lib64/R/include
-DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -I/share/apps/lam-7.1.4/include
-DMPI2 -DLAM -fPIC -I/usr/local/include -fpic -g -O2 -c internal.c -o
internal.o
gcc -std=gnu99 -I/share/apps/R/lib64/R/include -I/share/apps/R/lib64/R/include
-DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -I/share/apps/lam-7.1.4/include
-DMPI2 -DLAM -fPIC -I/usr/local/include -fpic -g -O2 -c RegQuery.c -o
RegQuery.o
gcc -std=gnu99 -I/share/apps/R/lib64/R/include -I/share/apps/R/lib64/R/include
-DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
-DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -I/share/apps/lam-7.1.4/include
-DMPI2 -DLAM -fPIC -I/usr/local/include -fpic -g -O2 -c Rmpi.c -o Rmpi.o
gcc -std=gnu99 -shared -L/usr/local/lib64 -o Rmpi.so conversion.o internal.o RegQuery.o Rmpi.o -L/share/apps/lam-7.1.4/lib -lmpi -llam -lutil -lpthread -fPIC
/usr/bin/ld: /share/apps/lam-7.1.4/lib/libmpi.a(cancel.o): relocation R_X86_64_32 against `lam_mpi_comm_world' can not be used when making a shared object; recompile with -fPIC
/share/apps/lam-7.1.4/lib/libmpi.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [Rmpi.so] Error 1
chmod: cannot access `/share/apps/R/lib64/R/library/Rmpi/libs/*': No such file or directory
ERROR: compilation failed for package 'Rmpi'
** Removing '/share/apps/R/lib64/R/library/Rmpi'
I chased the -fPIC error, and someone suggested recompiling LAM with --enable-shared --disable-static.
Here's what I tried in order to compile lam-7.1.4:
4534 2007-12-19 10:42:36 ./configure --prefix=/share/apps/lam-7.1.4 --with-rsh="ssh -x" --with-rpi-tcp-short=64000 --enable-shared --disable-static
4575 2007-12-19 12:41:11 ./configure --prefix=/share/apps/lam-7.1.4 --with-rsh="ssh -x" --with-rpi-tcp-short=64000 --enable-shared
4595 2007-12-19 12:56:19 ./configure --prefix=/share/apps/lam-7.1.4 --with-rsh="ssh -x" --with-rpi-tcp-short=64000
I had also tried --with-rpi-tcp-short=64KB, but the 'KB' suffix caused it to refuse to compile.
In fact, both --enable-shared and --enable-shared --disable-static failed to compile LAM, so I went back to using neither in order to get a working LAM. The trouble with this "working" LAM is that not all of the lamtests pass :-( although some do :-)
I was wondering about Open MPI, but the Rmpi author <http://www.stats.uwo.ca/faculty/yu/Rmpi/> says "Under OpenMPI, R slaves will use 100% CPU time while waiting for master's instructions. This is mainly caused by OpenMPI's blocking call implementations. Under LAM or MPICH2, R slaves use 0% CPU time while waiting.", which sounds ominous.
Starr
-
July 11th, 2008 05:56 AM #7
Rmpi / openmpi
Originally posted by: mbozzore, Tue Jan 15, 2008 4:38 am
I checked the Open MPI mailing list, and according to this post (http://www.open-mpi.org/community/li...07/12/4721.php) everything should be fine with the latest Rmpi:
--------------------------------------
On 18 December 2007 at 16:08, Randy Heiland wrote:
| The pkg in question is here: http://www.stats.uwo.ca/faculty/yu/Rmpi/
|
| The question is: has anyone on this list got OpenMPI working for
| this pkg? Any suggestions?
Yes -- I happen to maintain GNU R, a number of R packages (eg r-cran-*) and more for Debian and am also part of Debian's Open MPI maintainer group. I also use Rmpi at work.
Dr Yu and I sorted out all relevant issues a few weeks ago and the most current Rmpi (ie 0.5-5) works out of the box on Debian and Ubuntu, and is current in Debian. It should "just work" on any other recent Linux and Unix distro. If not please report back what configure reports and where it fails.
----------------------------------
I will test it.
Mehdi