LAM (Local Area Multicomputer) is an MPI programming environment and
development system for heterogeneous computers on a network.
With LAM, a dedicated cluster or an existing network computing
infrastructure can act as one parallel computer solving one problem.
LAM features a full implementation of the Message-Passing Interface (MPI) standard.
Compliant applications are source code portable between LAM and
any other implementation of MPI.
In addition to providing a high-quality implementation of the standard,
LAM offers extensive monitoring capabilities to support debugging.
Monitoring happens on two levels.
LAM has the hooks to allow a snapshot of process and message status
to be taken at any time during an application run.
The status includes all aspects of synchronization plus datatype
map / signature, communicator group membership and message contents.
On the second level, the MPI library is instrumented to
produce a cumulative record of communication, which can be visualized either
at runtime or post-mortem.
LAM runs on each computer as a single UNIX daemon uniquely structured
as a nano-kernel and hand-threaded virtual processes.
The nano-kernel component provides a simple message-passing rendezvous
service to local processes.
Some of the in-daemon processes form a network communication subsystem,
which transfers messages to and from other LAM daemons on other nodes.
Inter-node packet exchange is via a scalable UDP-based protocol.
The network subsystem adds features like packetization and buffering
to the base synchronization.
Other in-daemon processes are servers for remote capabilities,
such as program execution and file access.
The layering is quite distinct: the nano-kernel has no connection with
the network subsystem, which has no connection with the servers.
Users can configure in or out services as necessary.
Application management and debugging are provided by a uniquely
structured extensible daemon.
The unique software engineering of LAM is transparent to
users and system administrators, who only see a conventional daemon.
System developers can de-cluster the daemon into a daemon containing
only the nano-kernel and several full client processes.
This developer's mode is still transparent to users but exposes LAM's
highly modular components to simplified individual debugging.
It also reveals LAM's evolution from Trollius, which ran natively
on scalable multicomputers and joined them to the UNIX network
through a uniform programming interface.
The network layer in LAM is a documented, primitive and abstract
layer on which to implement a more powerful communication standard
like MPI (PVM has also been implemented).
An important feature of LAM is hands-on control of the multicomputer.
There is very little that cannot be seen or changed at runtime.
In particular, unreceived messages and the synchronization state of
processes can be examined with the mpimsg and mpitask tools.
The key synchronization variables (source rank, destination rank, tag,
and communicator) are all displayed.
Within a communicator, an opaque MPI object, the user can examine the
group membership.
Within a datatype, the type map and signature are exposed.
Additionally, the contents of a message can be displayed.
Again, these capabilities are available at any moment during
an application's run.
Message queues and blocked processes can be examined at runtime.
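For example, the two tools can be invoked at any time while the
application runs (the annotations describe typical output; the exact
format varies by version):

    % mpitask      list MPI processes with the call each is blocked in,
                   plus the peer rank, tag and communicator involved
    % mpimsg       list unreceived messages with their source, destination,
                   tag, communicator and datatype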
The mpirun tool finds and loads the
program(s) which constitute the application.
A simple SPMD application can be specified on the mpirun command line
while a more complex configuration is described in a separate file,
called an application schema.
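For illustration only (the option names and schema syntax shown here are
assumptions and vary between LAM versions), an SPMD run and a
master/slave application schema might look like:

    % mpirun -np 8 myprog          run 8 copies of an SPMD program
    % mpirun myapp.schema          run the programs listed in a schema

    # myapp.schema: one master on node 0, a slave on each of nodes 1-3
    n0 master
    n1-3 slave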
A strength of this MPI implementation is the progression of
non-blocking communication requests, including those that result
from buffered sends.
This is the real challenge of implementing MPI; everything else
is mostly a straightforward wrapping of an underlying communication
subsystem.
LAM / MPI allows messages to be buffered on the source end in any state
of progression, including partially transmitted packets.
This capability leads to great portability and robustness.
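The following sketch, using only standard MPI calls (nothing
LAM-specific), shows the kind of requests the library must progress: a
buffered send and a non-blocking send that may be in any state of
transmission before they complete.

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch: a buffered send and a non-blocking send, each of which
     * creates a request the library must advance behind the scenes. */
    void send_work(double *data, int count, int dest, MPI_Comm comm)
    {
        int size;
        void *buf;
        MPI_Request req;
        MPI_Status status;

        /* attach user buffer space so MPI_Bsend can copy the message */
        MPI_Pack_size(count, MPI_DOUBLE, comm, &size);
        size += MPI_BSEND_OVERHEAD;
        buf = malloc(size);
        MPI_Buffer_attach(buf, size);

        MPI_Bsend(data, count, MPI_DOUBLE, dest, 0, comm);
        MPI_Isend(data, count, MPI_DOUBLE, dest, 1, comm, &req);

        /* the non-blocking request may be partially transmitted
         * until this completes */
        MPI_Wait(&req, &status);

        MPI_Buffer_detach(&buf, &size);
        free(buf);
    }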
The sophisticated message-advancing engine at the heart of the
MPI library uses only a handful of routines to drive the underlying
communication layer.
Runtime flags decide which implementation of these low-level routines is used,
so recompilation is not necessary.
The default implementation uses LAM's network message-passing subsystem,
including its buffer daemon.
In this "daemon" mode, LAM's extensive monitoring features are fully available.
The main purpose of daemon-based communication is development, but
depending on the application's decomposition granularity and communication
requirement, it may also be entirely adequate for production execution.
MPI communication can bypass the LAM daemon for peak performance.
The other implementation of the MPI library's low-level routines
uses the highest-performance underlying mechanism available,
bypassing the LAM daemon and connecting application processes directly.
This is the "client to client" mode (C2C).
The availability of optimal C2C implementations will continue to change
as architectures come and go.
At a minimum, LAM includes a TCP/IP implementation of C2C
that bypasses the LAM daemon.
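For example, the transport can be selected when the application is
launched; the flag names shown here are assumptions and vary between
LAM releases:

    % mpirun -lamd -np 4 myprog    daemon mode: messages pass through
                                   the LAM daemons, full monitoring
    % mpirun -c2c -np 4 myprog     client-to-client mode: processes
                                   connect directly for peak performance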
Guaranteed Envelope Resources
Applications may fail, legitimately, on some implementations but not
others due to an escape hatch in the MPI Standard called "resource
limitations."
Most resources are managed locally, and it is easy for
an implementation to provide a helpful error code and/or message when
a resource is exhausted.
Buffer space for message envelopes is often
a remote resource (as in LAM) which is difficult to manage.
An overflow may not be reported (as in some other implementations)
to the process that caused the overflow.
Moreover, interpretation of the MPI guarantee on message progress may confuse
the debugging of an application that actually died on envelope overflow.
LAM advertises and guarantees a minimum level of internal resources
for message delivery.
LAM 6.1 has a property called "Guaranteed Envelope Resources" (GER)
which serves two purposes.
First, it is a promise from the implementation to
the application that a minimum amount of envelope buffering will be
available to each process pair.
Second, it ensures that a producer
of messages that overflows this resource will be throttled or cited with
an error, as necessary.
A minimum GER is configured when LAM is built.
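To make the problem concrete, here is a minimal sketch (standard MPI
calls, arbitrary message count) of a producer outrunning its consumer.
Without a guarantee like GER, the unmatched envelopes may silently
exhaust a remote resource; LAM either throttles the sender or raises
an error once the advertised minimum is exceeded.

    #include <mpi.h>
    #include <unistd.h>

    /* Sketch: rank 0 floods rank 1 with short messages before rank 1
     * posts any receives; each unreceived message consumes envelope
     * resources at the destination. */
    void flood(int rank)
    {
        int i, token = 0;
        MPI_Status status;

        if (rank == 0) {
            for (i = 0; i < 100000; i++)
                MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            sleep(60);               /* the consumer lags far behind */
            for (i = 0; i < 100000; i++)
                MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        }
    }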
Dynamic Processes
A group of MPI processes can collectively create a group of new
processes, and an intercommunicator is established for communication
between the two groups.
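A sketch of this pattern using the MPI-2 style interface (the exact
function offered by LAM at the time may have differed, and "worker" is
a hypothetical executable name):

    #include <mpi.h>

    /* All processes in MPI_COMM_WORLD collectively spawn 4 workers. */
    void start_workers(void)
    {
        MPI_Comm workers;   /* intercommunicator to the new processes */

        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &workers, MPI_ERRCODES_IGNORE);

        /* parents and children can now communicate through 'workers' */
    }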
Dynamic Nodes and Fault Tolerance
An initial set of LAM nodes are started with the lamboot tool.
Afterwards, nodes can be added or deleted with the lamgrow
and lamshrink tools.
Thus, a resource manager can adjust the resources allocated to a
LAM session at runtime.
Both LAM nodes and MPI processes can therefore be added and
removed at runtime in a dynamic resource environment.
The unplanned removal of a node due to a machine crash or other fault
is detected by LAM and handled in much the same way as a planned
adjustment with lamshrink.
All surviving application processes are informed, via asynchronous
signal, that a node has been removed.
The MPI library reacts by invalidating all communicators that include
processes on the dead node.
Pending communication requests are marked with an error and future
attempts to use invalid communicators also raise an error.
The application can detect these errors, free the invalid communicators
and possibly create new processes.
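A minimal sketch of that recovery pattern, using standard MPI-1 error
handling rather than any LAM-specific interface:

    #include <mpi.h>

    /* Ask for error codes instead of aborting, then drop a communicator
     * that has been invalidated by a node failure. */
    void robust_send(MPI_Comm *comm, int *data, int dest)
    {
        int rc;

        MPI_Errhandler_set(*comm, MPI_ERRORS_RETURN);

        rc = MPI_Send(data, 1, MPI_INT, dest, 0, *comm);
        if (rc != MPI_SUCCESS) {
            /* the peer's node may have been removed: free the invalid
             * communicator and, if desired, create new processes */
            MPI_Comm_free(comm);
        }
    }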
Installation and Session Initiation
LAM installs anywhere and uses the shell's search path at all times
to find LAM and application executables.
A multicomputer is specified as a simple list of machine names in a file,
which LAM uses to verify access (recon), start the environment
(lamboot), and remove it (wipe).
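For instance (the machine names here are hypothetical; -v simply makes
the tools verbose):

    # file "lamhosts": one machine per line
    node1.example.edu
    node2.example.edu
    node3.example.edu

    % recon -v lamhosts        verify that each machine is reachable
    % lamboot -v lamhosts      start the LAM environment
      ... run MPI applications with mpirun ...
    % wipe -v lamhosts         remove the LAM environment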
LAM / MPI Parallel Computing
Laboratory for Scientific Computing
University of Notre Dame
The LAM Team