The latest release of the DART system includes an MPI option.
MPI stands for 'Message Passing Interface', and
is both a library and a run-time system that enables multiple copies of
a single program to run in parallel, exchange data, and combine to solve
a problem more quickly.
The latest release of DART does *NOT* require MPI to run; the default build
scripts neither need nor use MPI in any way. However, for larger models
with large state vectors and large numbers of observations, the data
assimilation step will run much faster in parallel, which requires MPI
to be installed and used. If the multiple copies (ensemble members) of your
model fit comfortably, in both time and memory, on a single processor,
you need read no further about MPI.
MPI is an open-source standard with many implementations.
If you have a large single-vendor system it probably comes with an MPI
library by default. For a Linux cluster there is generally more variation
in what might be installed; most systems use a version of MPI called MPICH.
On smaller clusters or dual-processor workstations a version of MPI
called either LAM-MPI or OpenMPI might already be installed, or can be downloaded
and installed by the end user. (Note that OpenMP is a different parallel
system; OpenMPI is a more recent effort with a confusingly similar name.)
An "MPI program" makes calls to an MPI library, and needs to be compiled
with MPI include files and libraries. Generally the MPI installation
includes a shell script called 'mpif90' which adds the flags and libraries
appropriate for each type of fortran compiler.
So compiling an MPI program usually means
simply changing the fortran compiler name to the MPI script name.
These MPI scripts are built during the MPI install process and are specific to
a particular compiler; if your system has multiple fortran compilers installed
then either there will be multiple MPI scripts built, one for each compiler
type, or there will be an environment variable or flag to the MPI script
to select which compiler to invoke. See your system documentation or find
an example of a successful MPI program compile command and copy it.
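For example, a serial compile and its MPI equivalent might look like the
following sketch (the compiler name, flags, and file names are only
illustrative; substitute whatever your system actually provides):

    # serial build with a standard Fortran compiler
    gfortran -O2 -o hello hello.f90

    # the same build as an MPI program, using the wrapper script
    # supplied by the MPI installation
    mpif90 -O2 -o hello hello.f90

    # many wrappers can report the underlying compile command they run,
    # e.g. 'mpif90 -show' (MPICH) or 'mpif90 --showme' (OpenMPI)
    mpif90 -show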
To run in parallel, only the DART 'filter' program (and possibly the companion
'wakeup_filter' program) needs to be compiled with the MPI scripts.
All other DART executables should be compiled with a standard F90 compiler
and are not MPI enabled. (Note again that 'filter' can still be built as
a single executable like previous releases of DART; using MPI and running
in parallel is simply an additional option.) To build a parallel version
of the 'filter' program, the 'mkmf_filter'
command needs to be called with the '-mpi' option to generate a Makefile
that compiles with the MPI scripts instead of the Fortran compiler.
See the quickbuild.csh script
in each $DART/models/*/work directory for the commands
that need to be edited to enable the MPI utilities. You will also need
to edit the $DART/mkmf/mkmf.template file to call the proper version of
the MPI compile script if it does not have the default name, is not
in a standard location on the system, or needs additional options set
to select between multiple Fortran compilers.
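As a sketch of the build step from a model work directory (the directory
name and the template variable names shown in the comments are typical but
not guaranteed; check your own mkmf.template):

    # from a model's work directory, e.g. $DART/models/lorenz_96/work
    csh mkmf_filter -mpi      # generate a Makefile that uses the MPI wrapper
    make                      # build the parallel 'filter' executable

    # in $DART/mkmf/mkmf.template, make sure the MPI compile/link variables
    # point at the wrapper script, e.g. (names assumed, not guaranteed):
    #   MPIFC = mpif90
    #   MPILD = mpif90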
MPI programs generally need to be started with a launcher command called
'mpirun' or 'mpiexec', and they must also work with whatever batch control
system is installed on the cluster or parallel system.
Parallel systems with multiple users generally run some sort of batch
system (LSF, PBS, POE, LoadLeveler, etc.). You submit a job request
to this system and it schedules which nodes are assigned to which jobs.
Unfortunately the details vary widely from system to system;
consult your local web pages or a knowledgeable system administrator for help here.
Generally the run scripts supplied with DART have generic sections to
deal with LSF, PBS, no batch system at all, and sequential execution, but
the details (e.g. the specific queue names and accounting charge codes)
will almost certainly have to be adjusted.
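As a minimal sketch, a PBS job script for a 16-processor 'filter' run might
look something like the following; the queue name, node counts, and the exact
launcher invocation are assumptions that must be adapted to your site:

    #!/bin/csh
    #PBS -N filter
    #PBS -q regular               # queue name is site-specific
    #PBS -l nodes=2:ppn=8         # 2 nodes x 8 processors = 16 tasks
    #PBS -l walltime=1:00:00

    cd $PBS_O_WORKDIR             # run in the directory the job was submitted from
    mpirun -np 16 ./filter        # some sites require mpiexec or a local wrapper

The script would be submitted with 'qsub'; an LSF site would use '#BSUB'
directives and 'bsub' instead.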
The data assimilation process involves running multiple copies (ensemble members)
of a user model, with an assimilation computation interspersed between
calls to the model. There are many possible execution combinations,
including:

- running both the 'filter' program and the model as single-threaded (serial) programs;
- running 'filter' as a parallel MPI program while each model advance runs as a separate, single-threaded executable; or
- running both 'filter' and the model as parallel MPI programs.

The choice of how to combine the 'filter' program and the model
has 2 parts: building the executables and then running them.
At build time, the choice of using MPI or not must be made.
At execution time, the setting of the 'async' namelist value
in the filter_nml section controls how the 'filter' program
interacts with the model.
Choices include:

- async = 2: each model advance is run as a separate, single-threaded executable started by a shell script; or
- async = 4: the model is itself a parallel MPI program, started by a shell script, and it alternates with 'filter' in using the processors.
This release of DART has the
restriction that if the model and the 'filter' program are both compiled
with MPI and are run in 'async=4' mode, they must both run on the same
number of processors; e.g. if 'filter' is run on 16 processors,
the model must be started on 16 processors as well.
Alternatively, if the user model is compiled as a single executable
(async=2),
'filter' can run in parallel on any number of processors and each
model advance can be executed independently without
the model having to know about MPI or parallelism.
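For example, a quick way to confirm which mode a run directory is configured
for is to look at the async entry in the filter_nml section of the namelist
file (input.nml in a standard DART setup; the value shown below is only
illustrative):

    # print the start of the filter namelist
    grep -A 3 'filter_nml' input.nml
    #   &filter_nml
    #      async = 2,      ! 2 = advance model as separate single-cpu executables
    #                      ! 4 = advance model as a parallel MPI program
    #      ...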
Compiling and running an MPI application can be substantially
more complicated than running a single executable. There is a suite
of small test programs to help diagnose any problems encountered in
trying to run the new version of DART. Look in
developer_tests/mpi_utilities/tests/README for
instructions and a set of tests to narrow down any difficulties.
Getting good performance from a parallel program is frequently difficult. Communication between processors takes time, the work may not divide evenly across tasks, and any part of the code that must run serially limits how much additional processors can help.
Parallel performance can be measured and expressed in several different ways. A few of the relevant definitions are:

- Speedup: the ratio of the runtime on 1 processor to the runtime on N processors.
- Parallel efficiency: the speedup divided by N; perfect scaling gives an efficiency of 1.
- Strong scaling: how the runtime for a fixed problem size decreases as processors are added.
- Weak scaling: how the runtime changes when the problem size grows in proportion to the number of processors.
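As a worked example of the arithmetic (the timings are made up purely for
illustration): if a fixed-size assimilation takes 400 seconds on 1 processor
and 50 seconds on 16 processors, the speedup is 400/50 = 8 and the
strong-scaling efficiency is 8/16 = 50%.

    # hypothetical timings: 400 s on 1 cpu, 50 s on 16 cpus
    echo "400 50 16" | awk '{printf "speedup = %.1f, efficiency = %.0f%%\n", $1/$2, 100*$1/($2*$3)}'
    # prints: speedup = 8.0, efficiency = 50%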
We measured the strong scaling efficiency of the DART 'filter' program on a variety of platforms and problem sizes. The scaling looks very good up to the largest processor counts available to us for testing. It is assumed that for MPP (Massively Parallel Processing) machines with 10,000s of processors some algorithmic changes will be required; these are described in this paper.
Many parallel machines today are a hybrid of shared- and
distributed-memory processors, meaning that a small number
of CPUs (e.g. 2-32) share some amount of physical memory
and can transfer data quickly among themselves, while communicating
with CPUs on other nodes involves slower transfers across either
some kind of hardware switch or fabric, or a network communication
card such as high-speed Ethernet.
Running as many MPI tasks per node as there are CPUs on that
shared-memory node is in general a good choice, unless the total amount
of virtual memory used by the program exceeds the physical memory.
Factors to consider here include whether each task is limited by the
operating system to 1/Nth of the physical memory, or whether one task
is free to consume more than its share. If the node starts paging
memory to disk, performance takes a huge nosedive.
Some models have large memory footprints, and it may be necessary
to run in MPI mode not necessarily because the computation is faster
in parallel, but because the dataset size is larger than the
physical memory on a node and must be divided and spread across
multiple nodes to avoid paging to disk.
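As a sketch of how the task-per-node count is typically controlled, the
relevant batch directives look something like the following; the syntax
differs between batch systems and the numbers are only placeholders:

    # PBS/Torque: 4 nodes with 8 MPI tasks on each (32 tasks total)
    #PBS -l nodes=4:ppn=8

    # LSF: 32 tasks total, at most 8 per node
    #BSUB -n 32
    #BSUB -R "span[ptile=8]"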
DART software - Copyright UCAR. This open source software is provided by UCAR, "as is", without charge, subject to all terms of use at http://www.image.ucar.edu/DAReS/DART/DART_download
Contact: Nancy Collins