Showing posts with label rockscluster. Show all posts
Showing posts with label rockscluster. Show all posts

13 March 2012

106. htop 1.0.1 and sinfo-0.0.45 on rock 5.4.3/centos 5.6

There are a number of performance monitor tools in the debian repos. ROCKS 5.4.3/Centos doesn't seem quite as well-equipped.

First out, htop:

htop:
wget http://downloads.sourceforge.net/project/htop/htop/1.0.1/htop-1.0.1.tar.gz
tar -xvf htop-1.0.1.tar.gz
cd htop-1.0.1/
./configure --prefix=/home/me/.htop
make
make install

It's as simple as that.
Add e.g.
alias htop='/home/me/.htop/bin/htop'
to your ~/.bashrc
Note: this works on Scientific Linux (boron) 5.4 as well.

sinfo:
Update 13/03/2012:
Sinfo <0.0.44 has IPv6 enabled by default.
On sinfo >=0.0.45 you can disable IPv6 using ./configure --disable-IPv6

Sinfo is probably the snazziest cluster monitoring tool that I know of. Sure, ganglia etc. are nice too, but they run as web service. Sinfo is a 'simple' curses program, but building it on CentOS was a bit of a challenge.

Be aware that sinfo versions prior to 0.045 expect ipv6 to work -- by default ROCKS disables IPv6, so use sinfo 0.0.45 and above.





First boost:
(yum install boost-devel didn't do anything for me)
cd ~/tmp
wget http://sourceforge.net/projects/boost/files/boost/1.49.0/boost_1_49_0.tar.gz/download
tar -xvf boost_1_49_0.tar.gz
cd boost_1_49_0/
./bootstrap.sh --prefix=/usr

Edit Jamroot and add
using mpi ;
The space between mpi and ; is needed.

Symlink to your mpic++, e.g. if your mpic++ is in /opt/openmpi:
sudo ln -s /opt/openmpi/bin/mpic++ /usr/bin/mpic++

The following step takes a long time:
sudo ./b2 -a install --layout=versioned --build-type=complete

These days all the libboost libs are multithread aware (or so I hear), and in debian it turns out that the -mt.so libs are just symbolic links to the 'regular' libs.
sudo ln -s /usr/lib/libboost_signals.so /usr/lib/libboost_signals-mt.so
sudo ln -s /usr/lib/libboost_date_time.so /usr/lib/libboost_date_time-mt.so
sudo ln -s /usr/lib/libboost_serialization.so /usr/lib/libboost_serialization-mt.so
sudo ln -s /usr/lib/libboost_wserialization.so /usr/lib/libboost_wserialization-mt.so
sudo ln -s /usr/lib/libboost_regex.so /usr/lib/libboost_regex-mt.so

sudo ln -s /usr/lib/libboost_signals.so.1.49.0 /usr/lib64/libboost_signals.so.1.49.0

Then asio
cd ~/tmp
wget "http://downloads.sourceforge.net/project/asio/asio/1.5.3%20%28Development%29/asio-1.5.3.tar.bz2?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fasio%2F&ts=1331441086&use_mirror=aarnet"
tar -xvf asio-1.5.3.tar.bz2
cd asio-1.5.3/
./configure
make
sudo make install

Then sinfo
cd ~/tmp
wget http://www.ant.uni-bremen.de/whomes/rinas/sinfo/download/sinfo-0.0.45.tar.gz
tar -xvf sinfo-0.0.45.tar.gz
cd sinfo-0.0.45/
./configure --disable-IPv6

The build should be fine.

Configuration:
you'll end up with
/usr/local/sbin/sinfod
/usr/local/bin/sinfo
You may want to make sure there are paths to them by adding the following to your ~/.bashrc:
export PATH=$PATH:/usr/local/bin:/usr/local/sbin
The changes take effect next time you log in to a shell, or just run
source ~/.bashrc
for immediate effect.

Also, create a file called /etc/default/sinfo with the following in it:
OPTS="--quiet --bcastaddress=192.168.1.255"

Start sinfod with
sinfod --quiet --bcastaddress=192.168.1.255

then check that it's running
ps aux | grep sinfod

If it's not running, then try
sinfod -F

If it gives something along the lines of
exception:open:address family not supported
you most likely
1) haven't enabled ipv6 for your interface and
2) didn't disable IPv6 during compilation and/or
3) used version<0.045

Check by doing ifconfig -- does it return both an ipv4 and an ipv6 address?

Enabling ipv6
Unless you know what you're doing, don't fiddle with the network interfaces on a production cluster -- network interfaces on a multinode cluster are typically highly tuned to minimise latency, so don't mess it up.

Anyway. First check your /etc/modules.conf and - if present - comment out
alias ipv6 off
options ipv6 disable=1
Edit your /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1:0
IPADDR=192.168.1.111
NETMASK=255.255.255.0
BOOTPROTO=none
MTU=1500
TYPE=Ethernet
GATEWAY=192.168.1.1
USERCTL=no
IPV6INIT=yes
PEERDNS=yes
ONPARENT=yes
IPV6ADDR=fe80::2f0:4dff:f383:b44/64
IPV6_DEFAULTGW=fe80::2f0:4dff:fe83:a48/64
I just made up the IPV6ADDR, and took the IPV6_DEFAULTGW from my gateway machine (running debian, so ipv6 enabled by default)

Assuming that your firewall is allowing traffic at port 60003 and free traffic in and out on 192.168.1.255 things should work fine.



Errors


Error (boost):
MPI auto-detection failed: unknown wrapper compiler mpic++
Please report this error to the Boost mailing list: http://www.boost.org
You will need to manually configure MPI support.
Solution:
make sure you've symlinked to your mpic++ instance in /usr/bin
e.g. if your mpic++ is in /opt/openmpi/bin/mpic++
sudo ln -s /opt/openmpi/bin/mpic++ /usr/bin/mpic++


Error (sinfo):
message.cc: In member function 'void Message::popFrontMemory(void*, size_t)':
message.cc:183: error: 'memory' was not declared in this scope
message.cc:193: error: 'boost' has not been declared
message.cc:193: error: expected primary-expression before 'char'
message.cc:193: error: expected `;' before 'char'
message.cc:196: error: 'newMemory' was not declared in this scope
message.cc:196: error: 'memory' was not declared in this scope
make[2]: *** [message.lo] Error 1
make[2]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessage'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessage'
make: *** [all-recursive] Error 1
Solution:
You need to make sure that the libs are found -- either symlink manually between your build directory and /usr/lib, or use boostrap.sh --prefix=/usr. See above for how to do it.

Error (sinfo):
udpmessagereceiver.h:14: error: 'asio' has not been declared
udpmessagereceiver.h:14: error: ISO C++ forbids declaration of 'endpoint' with no type
udpmessagereceiver.h:14: error: expected ';' before 'sender_endpoint'
udpmessagereceiver.h:16: error: 'asio' has not been declared
udpmessagereceiver.h:16: error: ISO C++ forbids declaration of 'io_service' with no type
udpmessagereceiver.h:16: error: expected ';' before '&' token
udpmessagereceiver.h:17: error: 'asio' has not been declared
udpmessagereceiver.h:17: error: ISO C++ forbids declaration of 'socket' with no type
udpmessagereceiver.h:17: error: expected ';' before 'sock'
udpmessagereceiver.h:20: error: expected ',' or '...' before '::' token
udpmessagereceiver.h:20: error: ISO C++ forbids declaration of 'asio' with no type
udpmessagereceiver.h:23: error: 'asio' has not been declared
udpmessagereceiver.h:23: error: expected `)' before '&' token
udpmessagereceiver.cc:5: error: 'asio' has not been declared
udpmessagereceiver.cc:5: error: expected `)' before '&' token
make[1]: *** [udpmessagereceiver.lo] Error 1
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/libmessageio'
make: *** [all-recursive] Error 1

Solution: you've only got boost::asio installed, not the independent asio. See above for how to compile and install asio.

Error (sinfo):

/usr/bin/ld: cannot find -lboost_signals-mt
collect2: ld returned 1 exit status
make[2]: *** [sinfod] Error 1
make[2]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/sinfod'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/state/partition1/home/me/tmp/sinfo-0.0.44/sinfod'
make: *** [all-recursive] Error 1
Solution:
You need a symlink pointing form /usr/lib/libboost_signals-mt.so to /usr/lib/libboost_signals.so
ln -s /usr/lib/libboost_signals.so /usr/lib/libboost_signals-mt.so 

Error (sinfod):
sinfod --quiet --bcastaddress=192.168.1.255 gives nothing and sinfod exits silently immediately
sinfod -F gives
exception:open:address family not supported
Here's the relevant strace output:
[..]
 socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 6
[..]
 socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = -1 EAFNOSUPPORT (Address family not supported by protocol)
futex(0x333a40d350, FUTEX_WAKE_PRIVATE, 2147483647) = 0
close(6)                                = 0
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
write(2, "Exception: ", 11)             = 11
write(2, "open: Address family not support"..., 46) = 46
write(2, "\n", 1)                       = 1
exit_group(0)                           = ?

Solution: enable ipv6 (see above)

104. Building gromacs with fftw3 and openmpi on ROCKS 5.4.3/CentOS

This guide was heavily modified on 13/03/2012 to remove the need for sudo/root privileges.

Not all flavours of linux are equal. I've always been a Debian man, but have recently become a user of a ROCKS based HPC cluster on a different continent. To make sure that I don't screw things up I'm currently trying to work out how to reliably compile common computational packages under ROCKS 5.4.3, which is CentOS based.

If you installed the bio roll from the beginning you'll have openmpi in /opt/openmpi (rocks_openmpi.x86_64 package), and fftw in /opt/rocks/lib and /opts/rocks/include (fftw.x86_64 package)

If you only installed the basic rolls, you won't have either. Now, you can either download the bio roll and install from there, or you can install the regular openmpi package and compile fftw yourself. In fact, you'll need to do the latter if you want double-precision gromacs anyway.

My goal is to avoid having to use sudo or root at all. I've rewritten this guide a couple of times, so there may be weird annoying errors that I've missed.

Installing openmpi:
If you don't have openmpi in /opt, then you can install it from the base roll
sudo yum install openmpi


fftw3:
You can skip this step IF
1. you have fftw files in /opt/rocks/lib and /opt/rocks/include
AND
2. you only want single precision

Otherwise:

wget http://www.fftw.org/fftw-3.3.1.tar.gz
tar -xvf fftw-3.3.1.tar.gz
cd fftw-3.3.1


Then use --prefix to tell make where to install the files:

Single precision fftw3 libraries:
make distclean
./configure --enable-float --enable-mpi --enable-threads --with-pic --prefix=/export/home/me/.fftwsingle
make
make install

Double-precision fftw3 libraries:
make distclean
./configure --disable-float --enable-mpi --enable-threads --with-pic --prefix=/export/home/me/.fftwdouble
make 
make install

gromacs:

First download and extract:

cd ~/tmp
wget ftp://ftp.gromacs.org/pub/gromacs/gromacs-4.5.5.tar.gz

tar -xvf gromacs-4.5.5.tar.gz
cd gromacs-4.5.5/

Before building you need to define where the openmpi libs are i.e.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib
OR
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64/openmpi/1.4-gcc/lib

We now have three permutations of possible builds:
1. We use the single precision fftw libs in /opt/rocks/lib and /opt/rocks/include
export LDFLAGS=-L/opt/rocks/lib
export CPPFLAGS=-I/opt/rocks/include
./configure --enable-mpi --enable-float --with-fft=fftw3 --program-suffix=_spmpi --prefix=/export/home/me/gromacs
make
make install

2. We use the single precision fftw libs in /export/home/me/.fftwsingle

export LDFLAGS=-L/export/home/me/.fftwsingle/lib
export CPPFLAGS=-I/export/home/me/.fftwsingle/include
./configure --enable-mpi --enable-float --with-fft=fftw3 --program-suffix=_spmpi --prefix=/export/home/me/gromacs
make
make install

3. We use the double precision fftw libs in /export/home/me/.fftwdouble

export LDFLAGS=-L/export/home/me/.fftwdouble/lib
export CPPFLAGS=-I/export/home/me/.fftwdouble/include
./configure --enable-mpi --disable-float --with-fft=fftw3 --program-suffix=_ddmpi --prefix=/export/home/me/gromacs
make
make install


Running

You will now have single and double-precision binaries, e.g.
grompp_spmpi and grompp_ddmpi

Make sure that you define/have defined LD_LIBRARY_PATH in /etc/profile or ~/.bashrc and included the paths to your mpi libs and your fftw libs, e.g.:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib:/export/home/me/.fftwsingle:/export/home/me/.fftwdouble

Actually, it doesn't seem necessary to include the fftw path.

You may also want to include your gromacs bins in your path:
export PATH=$PATH:/export/home/me/gromacs/bin

Dynamic load-balancing seems to be disabled by default, so to use multiple cores run using e.g.
mpirun -n 4 mdrun_spmpi -s inp.tpr -o out.trr etc.

DONE


Troubleshooting

Error:
checking size of off_t... configure: error: in `/export/home/me/tmp/gromacs-4.5.5':
configure: error: cannot compute sizeof (off_t)
See `config.log' for more details
config.log:
./conftest: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
Solution:
Set LD_LIBRARY_PATH to your openmpi libs e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/openmpi/lib

Error:
/usr/local/lib/libfftw3f.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make[3]: *** [libmd.la] Error 1
Solution:
Compile fftw3 using the --with-pic switch:
./configure --enable-float --enable-mpi --enable-threads --with-pic 




12 March 2012

100. Compile strace on ROCK 5.4.3

Maybe I've set things up wrong, but I can't find any strace package in the yum repos on my ROCKS 5.4.3 installation.

The compilation is very easy, but I'll show it here for those who feel nervous about compiling their own programmes:

mkdir ~/tmp
cd ~/tmp

The wget takes a while to figure out where to download from -- be patient:
wget http://sourceforge.net/projects/strace/files/latest/download?source=files
unxz strace-4.6.tar.xz
tar -xvf strace-4.6.tar
cd strace-4.6/
./configure
make
sudo make install


How to use:
While I've spent a couple of years with Debian I'm a CentOS newbie, and I keep being confused about the location of the libs -- for my compiles I need to put libs in /usr/lib, but to execute I seem to need to put symlinks in /usr/lib64. strace can help you track where a program is looking for its libs

e.g. to see what the program sinfo is up to
 strace -o sinfo.log sinfo

Here is a snippet from sinfo.log:

open("/lib64/libc.so.6", O_RDONLY)      = 3
open("/usr/local/lib/sinfo/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/opt/openmpi/lib/librt.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/librt.so.1", O_RDONLY)     = 3
open("/usr/local/lib/sinfo/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/opt/openmpi/lib/libdl.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib64/libdl.so.2", O_RDONLY)     = 3

You can see that it e.g. looks for libdl.so.2 first in /usr/local/lib/sinfo, then in /opt/openmpi/lib/ and finally finds it in /lib64