Beowulf System Administration

Introduction

This guide is intended to cover basic system administration tasks specific to the cluster. Normal Linux manuals should be consulted for general configuration.

Powering up the Cluster

The system in which the server and nodes are located contains a mains power distribution board that supplies power to the server, the nodes and the Ethernet switch unit. The Ethernet switch unit does not have its own power switch and is powered up whenever the mains supply is applied. The server and nodes are each individually switched by momentary-action push switches located on the front panel of each unit.

The system should be powered up in the following sequence

Shutting down the Cluster

The BEAM-supplied bmon program provides a simple menu-driven method of shutting down the cluster. For a user sitting at the host node console, the bmon program is started automatically. Further details regarding the bmon program can be found in its manual entry. Once the host node has shut down, the main power switch can be operated.
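Where bmon is unavailable, the shutdown it automates can be sketched by hand. Everything below is an assumption about what bmon does (the bmon manual entry is authoritative): the node names n1 to n4 are examples only, and rsh access from the host is assumed, as permitted by /etc/hosts.equiv.

```shell
# Sketch of a manual cluster shutdown: halt the slave nodes first, then the
# host. DRYRUN=echo prints each command instead of running it; set DRYRUN=
# (empty) on a live cluster, as root, to really shut it down.
DRYRUN=echo
for node in n1 n2 n3 n4; do
    $DRYRUN rsh "$node" /sbin/shutdown -h now   # halt each slave node
done
$DRYRUN /sbin/shutdown -h now                   # finally halt the host (n0)
```

Only after the host itself has halted should the main power switch be operated.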

General System Administration

Most of the system administration can be performed in the normal Linux manner, either using linuxconf or by editing the appropriate configuration file by hand; see the Linux documentation for more details. There are some changes due to the Beowulf configuration, and the differences are listed here.

Adding a User

To add a user or group to the system, log in as root and use the linuxconf application in the normal manner. The linuxconf application can be accessed from the control panel (woman-with-wand icon) or from the command line. As the system uses NIS for password and group information, it is necessary, after adding users or groups, to reconfigure the NIS maps. This is done by running the command: "/usr/lib/yp/ypinit -m". All added users will then be able to use the Beowulf environment.

Network Configuration

The Beowulf cluster has its own network configuration, which has been set up by BEAM. The host server will need further configuration to attach it to the site's network. The second Ethernet card in the host server is used for site-wide communication. The following will need to be configured:
 
Item                    Configuration
Second Ethernet card    The IP address and netmask should be set to the values given by the system administrator
Name server             The site name server should be entered in the /var/named/named.ca file
File system mounts      Appropriate file system mounts can be added to the /etc/fstab file
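For example, a line such as the following added to /etc/fstab would mount a home area exported by a site file server (the server name and paths here are purely illustrative):

```
# Hypothetical NFS mount of a site file server's home area:
fileserver:/export/home    /home    nfs    defaults    0 0
```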

System Backup

It is essential to back up the system from time to time. As the slave nodes can easily be re-installed from the host node, it is only necessary to back up the host server node. It is recommended to perform a full backup about once per month and an incremental backup about once per week, depending on usage and the data stored on the system.
To perform a full backup perform the following as root:
  1. Insert a Tape in the Tape drive
  2. cd /backup
  3. ./backall -f
To perform an incremental backup perform the following as root:
  1. Insert a Tape in the Tape drive
  2. cd /backup
  3. ./backall -i
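For reference, the distinction between the two modes can be illustrated with GNU tar's --listed-incremental option. This is only a sketch of the idea; the supplied backall script is the actual mechanism and may work quite differently.

```shell
# Full versus incremental backup, illustrated with GNU tar. The snapshot
# file records what has already been saved, so the second archive contains
# only files created or changed since the full backup.
work=$(mktemp -d)                  # scratch area standing in for real data
mkdir "$work/data"
echo "report" > "$work/data/file1"

# Full backup: the snapshot file is created on the first run.
tar --create --listed-incremental="$work/backup.snar" \
    --file="$work/full.tar" -C "$work" data

echo "new data" > "$work/data/file2"   # a file created after the full backup

# Incremental backup: only changes since the snapshot are archived.
tar --create --listed-incremental="$work/backup.snar" \
    --file="$work/incr.tar" -C "$work" data
```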

Adding New Nodes

It is easy to add extra nodes to the system. To add a node perform the following:
  1. Connect new node to the system
  2. Boot the node from the supplied Node boot/Install floppy and perform the installation as given in the section Installing/Re-installing the software onto a new node
  3. During the boot the new node will try to obtain network information from the host server. It will be necessary to add the Ethernet card address, as shown on the node's screen during boot, to the file /etc/bootptab on the server. Add a new line in this file with an appropriate node ID.
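An /etc/bootptab entry for a new node might look like the following. The hardware address (ha=) here is invented; use the address displayed on the node's screen, and choose a node ID and IP address that follow on from the existing entries:

```
# Hypothetical entry for a fifth processing node, n5:
n5:ht=ethernet:ha=00A0C9112233:ip=192.168.10.6:sm=255.255.255.0:hn:
```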
If more nodes than the BEAM maximum configuration are to be added, then the following server files will also need modification:
 
File Use
/var/named/beowulf Name server information file
/var/named/beowulf.rev Name server information file
/etc/hosts.equiv Host name equivalent file

Installing/Re-installing the software onto a new node

Ensure that the host node has had all appropriate software installed and is running.
Ensure that the host and new node are connected via the ethernet switch.
Boot the node using the supplied Node boot/Install floppy.
Log in to the node as root; the password is beam00
cd /usr/beowulf/install
./InstallNode
If the node has not had the software installed before, then the disk partition table must be set up.
If the software has been installed before, you may simply type q <CR>

Add three partitions to the system:

hda1    1 GByte              Linux native    (To be mounted as /, active partition)
hda2    1 GByte              Linux swap      (Used as swap space)
hda3    Remainder of disk    Linux native    (To be mounted as /usr)

The keystrokes to perform these actions :-

        n <CR>        (New partition)
        p <CR>        (Primary)
        1 <CR>        (Partition number)
        1 <CR>        (First Cylinder)
        +1000M<CR>    (Last cylinder or size)
 

        n <CR>        (New partition)
        p <CR>        (Primary)
        2 <CR>        (Partition number)
        129 <CR>      (First Cylinder)
        +1000M<CR>    (Last cylinder or size)

        n <CR>        (New partition)
        p <CR>        (Primary)
        3 <CR>        (Partition number)
        257 <CR>      (First Cylinder)
        <CR>          (Last cylinder: accept the default to use the remainder of the disk)
 

        Change partition 2 to be of type linux swap

        t <CR>        (type)
        2 <CR>        (partition number)
        82 <CR>       (partition type code)

        Make partition 1 the boot partition

       a <CR>         (Toggle boot flag)
       1 <CR>         (Partition number)

       w <CR>         (Write information and continue)

The software will now be installed. Once complete, reboot by pressing the reset button on the node.
Note: The node install system makes use of a single NFS-mounted root directory on the server. This means that only one node can be installed at a time.

Disaster Recovery

We recommend that, where possible, master copies of any developed software reside on the server and that a strict backup regime be enforced. Provided that data does not reside on any of the nodes, the node install/re-install procedure explained above can be used to recover any failing node. In the case of the server, a backup tape of the system is supplied.

System Configuration

The cluster occupies its own private network, with IP addresses of the form 192.168.10.x, where x is the node number, the host node being 1. The nodes are named as follows:
 
Node Type          Network IP Address    Node Name
Server             192.168.10.1          n0
Processing         192.168.10.2          n1
Processing         192.168.10.3          n2
Processing         192.168.10.4          n3
Processing         192.168.10.5          n4
...
Ethernet switch    192.168.10.250        switch

In addition, the host node has a separate network interface which is configured as appropriate for the site.
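Because the names and addresses follow a fixed pattern, host entries for any cluster size can be generated rather than typed by hand. A small sketch follows; the count of eight processing nodes is an example only, not a property of any particular installation:

```shell
# Print /etc/hosts-style lines for the host (n0) and the processing nodes.
# NODES is the number of processing nodes -- edit to match your cluster.
NODES=8
for i in $(seq 0 "$NODES"); do
    echo "192.168.10.$((i + 1)) n$i"     # node nX lives at 192.168.10.(X+1)
done
echo "192.168.10.250 switch"             # the Ethernet switch's fixed address
```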

Host Server configuration

DNS name service

The file /var/named/named.ca should be edited with the name and IP address of the site's master name server, plus any additional servers.

BOOTP service

If nodes are added or changed in the cluster, it will be necessary to add or change the entries in the /etc/bootptab file. This file relates a given Ethernet card address to the appropriate network information for that node.

User Configuration

As the system uses NIS for password and group information, it is necessary, after adding users or groups, to reconfigure the NIS maps. This is done by running the command: "/usr/lib/yp/ypinit -m".

System Configuration Files

File Usage
/usr/beowulf/nodes List of nodes in the cluster. Should contain n0 -> n?
/usr/beowulf/bin/beowulf_shell User environment variable setup for beowulf system
/etc/rc.d/init.d/beowulf Beowulf Daemon startup
/var/named/named.ca Site name server configuration
/etc/bootptab Bootp network information file for each node
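As an illustration, on a cluster with four processing nodes the /usr/beowulf/nodes file would be expected to contain one node name per line (an assumption based on the description above):

```
n0
n1
n2
n3
n4
```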

Updating System Software

Most of the system software has been installed using the RedHat RPM system. This makes it easy to update packages. As well as the standard RedHat Linux system BEAM provides a few packages for the Beowulf software. These include:
 
Package Use
kernel-2.2.5-beam1.i386.rpm Linux kernel configured for Beowulf use
kernel-headers-2.2.5-beam1.i386.rpm Linux kernel headers configured for Beowulf use
kernel-source-2.2.5-beam1.i386.rpm Linux kernel source configured for Beowulf use
beambeowulf-1.0-1.i386.rpm Beowulf programming environment including PVM, MPI and BEAM tools
ddd-doc-3.1.4-2.i386.rpm GUI Symbolic debugger
ddd-semistatic-3.1.4-2.i386.rpm GUI Symbolic debugger
lesstif-0.87.0-1.i386.rpm Motif GUI system
lesstif-devel-0.87.0-1.i386.rpm Motif GUI development system
nedit-5.02-1.i386.rpm GUI text editor

The beowulf software has been installed in /usr/beowulf to separate it from the system.

Installing a new Linux Kernel

There are a number of Linux HOWTOs on this subject; please refer to these for more information. The basic sequence of events is as follows:
  1. Copy the kernel sources to the /usr/src directory in an appropriately named kernel directory.
  2. Create a symbolic link from /usr/src/linux to this directory
  3. cd to this directory
  4. Configure the kernel with "make xconfig"
  5. Build the kernel with "make bzImage"
  6. Build the kernel modules with "make modules"
  7. Install the modules with "make modules_install"
  8. Install the kernel "cp arch/i386/boot/bzImage /boot/vmlinuz-<VERSION>"
  9. Create symbolic link from /boot/vmlinuz to /boot/vmlinuz-<VERSION>
  10. Run lilo to set up the boot loader
Note that two types of kernel are used: one for the host server and one for the slave nodes. The slave node kernel has been configured to get its network information from a BOOTP server. This kernel has been named /boot/vmlinuz-<VERSION>.nboot.
Once the server is up and running with the new kernel, the nodes can be updated by re-installing the software on each.
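The numbered steps above can be collected into a single sequence. This is a sketch only: KVER is an example version string, and DRYRUN=echo previews each command rather than executing it.

```shell
# Kernel build and install steps as one sequence. Set DRYRUN= (empty) and
# run as root in a real build tree to execute; KVER is an example value.
DRYRUN=echo
KVER=2.2.5-beam1
$DRYRUN make -C /usr/src/linux xconfig           # configure the kernel
$DRYRUN make -C /usr/src/linux bzImage           # build the kernel image
$DRYRUN make -C /usr/src/linux modules           # build the kernel modules
$DRYRUN make -C /usr/src/linux modules_install   # install the modules
$DRYRUN cp /usr/src/linux/arch/i386/boot/bzImage /boot/vmlinuz-$KVER
$DRYRUN ln -sf /boot/vmlinuz-$KVER /boot/vmlinuz
$DRYRUN lilo                                     # update the boot loader
```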