Greetings.  This README file contains information about the programs
in this directory, and troubleshooting help if you are trying to get
the MPI options in DART to compile and run.


INTRODUCTION:

<TODO: insert something generic about MPI here, about how DART uses it, about
how to select it if you want it (once we decide how that works), how to avoid
it if you don't, etc.>

<TODO: this directory also needs to be converted to use mkmf like the rest of
the project, but I had a heck of a time getting these programs compiled on
the various platforms.  So I decided to keep all the options I needed to set
together in a single makefile, to make it easier to see the common flags and
the differences between the systems.  When it's more stable and I'm more
comfortable with what needs to be set, I'll integrate it into the
mkmf.template files.>


HOW TO VERIFY AN INSTALLATION:

This directory contains a small set of test programs which use the MPI
(Message Passing Interface) communications library.  They can be tricky to
get compiled and running, especially on Linux clusters, which come with a
seemingly endless number of combinations of batch queue systems, compilers,
and MPI libraries.

Examples of batch queue systems:  PBS, LSF, LoadLeveler
Examples of compiler vendors: Intel (ifort), PGI (pgf90), Absoft (f95)
Examples of MPI libraries:  mpich, LAM, OpenMPI


<TODO: the following paragraph will change when I convert to mkmf>

The Makefile has settings for many of the local NCAR Linux clusters.  If your
machine is not on the list of automatically recognized machines, you can
either edit the Makefile by hand and set DART_TARGET to the correct value at
the top, or set DART_TARGET as an environment variable
(setenv DART_TARGET xxx  in csh/tcsh, or  export DART_TARGET=xxx  in
sh/bash/ksh).

Type:  make
to compile the programs.

Type:  make check
to run the programs interactively and look for errors.

Type:  make batch
to submit the test programs to the batch queue.  This will almost certainly
not work without modifications to the Makefile unless you are using one of
the NCAR machines which has a predefined Makefile section.
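If you would rather drive the first two steps from a script, a minimal sketch
for a Bourne-style shell (sh, bash, ksh) could look like this.  The function
name run_mpi_checks is hypothetical, and the optional argument is whatever
DART_TARGET value your machine needs.  It stops at the first failing step, so
it is clear whether the build or the interactive run broke.  'make batch' is
left out on purpose, because it only queues the jobs.

```shell
# Hypothetical helper: build the test programs, then run them interactively,
# stopping at the first step that fails.
# Usage: run_mpi_checks [target]   (target is an optional DART_TARGET value)
run_mpi_checks() {
  if [ -n "$1" ]; then
    DART_TARGET=$1
    export DART_TARGET          # same effect as typing export DART_TARGET=xxx
  fi
  if ! make; then
    echo "FAILED at: make" >&2
    return 1
  fi
  if ! make check; then
    echo "FAILED at: make check" >&2
    return 1
  fi
  echo "make and make check both succeeded"
}
```

Source the file in your shell, then type, for example:  run_mpi_checks xxx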


WHAT YOU GET:

ftest_mpi.f90 is a Fortran 90 program which calls a few basic MPI library
functions.  If it compiles and runs interactively, you have cleared one of
the two large hurdles in running with MPI.  If you can also submit this
executable to the batch queue and have it run, you are done.  Go have a beer.
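When the interactive test fails, it helps to know whether the problem is at
the compile step or the run step.  The minimal sketch below does both steps
by hand, outside the Makefile.  It assumes the wrapper is called mpif90 and
the launcher is mpirun.  Both names are common, but they vary between MPI
libraries, and some batch systems supply their own launcher.  The function
name try_ftest_mpi and the FC and MPIRUN override variables are hypothetical.

```shell
# Hypothetical helper: compile ftest_mpi.f90 and run it on N processes,
# reporting whether the compile step or the run step failed.
# Usage: try_ftest_mpi [nprocs]    (default 2)
try_ftest_mpi() {
  np=${1:-2}
  fc=${FC:-mpif90}            # assumed MPI Fortran 90 wrapper
  launcher=${MPIRUN:-mpirun}  # assumed MPI launcher
  if ! "$fc" -o ftest_mpi ftest_mpi.f90; then
    echo "compile step failed" >&2
    return 1
  fi
  if ! "$launcher" -np "$np" ./ftest_mpi; then
    echo "run step failed" >&2
    return 1
  fi
  echo "ftest_mpi compiled and ran on $np processes"
}
```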

driver.f90 and commtest_mod.f90 are compiled into the program 'commtest'.
This program mimics the communication patterns of a real data assimilation
executable and can be used to time and test various options.  It requires MPI
to run, and with arrays sized like a real problem it should probably be
submitted to a batch queue if you are running on any kind of shared cluster.
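If you build commtest by hand, the order matters.  A Fortran 90 module has to
be compiled before any file that uses it, because the compiler needs the .mod
file the module produces.  The minimal sketch below assumes that driver.f90
uses the module in commtest_mod.f90 and that the wrapper is called mpif90.
The function name and the FC override are hypothetical.

```shell
# Hypothetical helper: build 'commtest' by hand with the MPI wrapper.
# The module source is compiled first so that its .mod file exists when
# driver.f90 (assumed to 'use' the module) is compiled.
build_commtest() {
  fc=${FC:-mpif90}                            # assumed MPI Fortran 90 wrapper
  "$fc" -c commtest_mod.f90 || return 1
  "$fc" -c driver.f90 || return 1
  "$fc" -o commtest driver.o commtest_mod.o   # the wrapper adds the MPI libraries
}
```

The resulting 'commtest' is then started with your MPI launcher, preferably
from a batch job on a shared cluster.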

'make check' will try to build and run both of these programs interactively.
'make batch' will submit them to the batch system to execute them.  If you
have problems, keep reading below for more help in diagnosing exactly where
things are going wrong.

<TODO: currently there are separate scripts for each executable and each
batch system.  This was mostly for my sanity in trying to get these things
working.  Now that they seem to be OK, I will try to consolidate the batch
directives into a single executable file with comments.>


TROUBLESHOOTING:

If the ftest_mpi.f90 program does not compile, here are a few things to
check.  You must be able to compile and run this simple program before
anything else is going to work.

1. Include file vs module

Some MPI installations supply a header file (a .h or .inc file, typically
mpif.h) which defines the parameters for the MPI library.  Others supply a
Fortran 90 module (named 'mpi') which contains the parameters and subroutine
prototypes.  Use one or the other, not both.  By default the code expects the
include file.  It also contains a commented-out 'use' statement, which you
can enable instead if your installation supplies the module.
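One quick way to find out which of the two your installation supports is to
compile a tiny program each way.  The minimal sketch below does that with the
MPI wrapper.  It assumes the conventional names from the MPI standard (the
header mpif.h and the module mpi) and a wrapper called mpif90.  The function
name and the FC override are hypothetical.

```shell
# Hypothetical helper: compile a tiny MPI program once with the include
# file and once with the module, and report which style this MPI supports.
probe_mpi_interface() {
  fc=${FC:-mpif90}                  # assumed MPI Fortran 90 wrapper
  tmp=$(mktemp -d) || return 1
  cat > "$tmp/inc.f90" <<'EOF'
program test_include
  implicit none
  include 'mpif.h'
  integer :: ierr
  call MPI_Init(ierr)
  call MPI_Finalize(ierr)
end program test_include
EOF
  cat > "$tmp/mod.f90" <<'EOF'
program test_module
  use mpi
  implicit none
  integer :: ierr
  call MPI_Init(ierr)
  call MPI_Finalize(ierr)
end program test_module
EOF
  for src in inc.f90 mod.f90; do
    # compile only (-c): this checks the interface, not the link step
    if (cd "$tmp" && "$fc" -c "$src") >/dev/null 2>&1; then
      result=ok
    else
      result=FAILS
    fi
    case $src in
      inc.f90) echo "include 'mpif.h': $result" ;;
      mod.f90) echo "use mpi: $result" ;;
    esac
  done
  rm -rf "$tmp"
}
```

Because the sketch only compiles and never links, it says nothing about
whether the MPI libraries themselves are found.  Finding them is the job of
the compiler wrapper.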

2. Compiler wrappers

Most MPI installations include compiler "wrapper" programs which you call
instead of the actual compiler.  They add any needed compiler flags and they
add the MPI libraries to the link lines.  But they are usually built for one
particular compiler, so if your system has multiple Fortran compilers
available you will need to find the right set of MPI wrappers.  The Fortran 90
wrapper is generally called 'mpif90'.  Try to go this route if at all
possible.  This might mean adding a new directory to your shell search path,
or loading a new module with the 'module' command.
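To check which wrapper your shell will actually pick up, and which compiler
it wraps, something like the minimal sketch below may help.  It assumes a
Bourne-style shell.  Open MPI wrappers print the underlying command with
--showme, and MPICH-derived wrappers print it with -show.  Other MPI
libraries may support neither.  An unrecognized flag may cause a compiler
error, which the sketch hides.  The function name is hypothetical.

```shell
# Hypothetical helper: report where an MPI wrapper lives on PATH and, if the
# wrapper supports it, the underlying compiler command it would run.
# Usage: show_mpi_wrapper [wrapper-name]    (default mpif90)
show_mpi_wrapper() {
  name=${1:-mpif90}
  if ! w=$(command -v "$name"); then
    echo "$name not found on PATH; check PATH or try 'module avail'" >&2
    return 1
  fi
  echo "wrapper: $w"
  # Open MPI understands --showme; MPICH-style wrappers understand -show.
  "$w" --showme 2>/dev/null || "$w" -show 2>/dev/null ||
    echo "(this wrapper did not report its underlying command)"
}
```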

3. Batch system

Most clusters have some form of batch control.  You log in to one node of the
cluster, but to execute a compute job you must run a command which adds the
job to a list of waiting jobs.  Especially for MPI jobs which expect to use
multiple processors at the same time, a batch control system ensures that
each job is started on the right number of processors and does not conflict
with other running jobs.

The batch control system knows how many nodes are available for jobs, whether
some queues have higher or lower priority, the maximum time a job can run,
the maximum number of processors a job can request, and it schedules the use
of the nodes based on the jobs in the execution queues.  The two most common
batch systems on Linux clusters are PBS and LSF.  They are complicated, but
don't despair.  This directory comes with scripts which work on many of the
local NCAR systems, but if they do not work for your system the simplest way
to proceed is to find a colleague who has a working script and copy it.
Another way is to ask your system support people for advice.  For the
independent-minded, search the web for examples.
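If you are not sure which batch system a machine runs, checking which submit
command exists gives a quick first guess: qsub for PBS, bsub for LSF, and
llsubmit for LoadLeveler.  This is only a heuristic sketch, because other
schedulers (Sun Grid Engine, for example) also provide a qsub.  The function
name is hypothetical.

```shell
# Hypothetical helper: guess the batch system from which submit command is
# on PATH. This is a heuristic, not a definitive test.
detect_batch_system() {
  if command -v bsub >/dev/null 2>&1; then
    echo "LSF (submit with: bsub < script)"
  elif command -v llsubmit >/dev/null 2>&1; then
    echo "LoadLeveler (submit with: llsubmit script)"
  elif command -v qsub >/dev/null 2>&1; then
    echo "PBS or a PBS-like system (submit with: qsub script)"
  else
    echo "no known submit command found"
  fi
}
```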


OTHER THINGS:

A few other programs are included in this directory to help diagnose
non-working setups.  To compile and run everything, type:  make everything
It will echo messages as things pass or fail.

ftest_f90.f90 is a simple Fortran 90 program without MPI.
It confirms that you have a working F90 compiler.  Try:  make ftest_f90
to compile only this program.

ftest_nml.f90 is a Fortran 90 program without MPI which opens and reads in a
namelist (ftest_nml.nml).  Some batch systems seem to leave your program with
a different idea of what the current directory is than what the 'pwd' command
says.  This program can help confirm whether you have to hardcode an entire
pathname into your Fortran open() calls or not.  Try: make ftest_nml
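One common cause is that the job starts in a directory other than the one
you submitted it from.  PBS records the submission directory in the
environment variable PBS_O_WORKDIR, and LSF records it in LS_SUBCWD.  A job
script can therefore change back to that directory before it starts the
program.  The minimal sketch below assumes one of those two variables is
set.  Check your own batch system's documentation for the variable it uses.
The function name is hypothetical.

```shell
# Hypothetical helper for a batch job script: move to the directory the job
# was submitted from, if the batch system recorded it, and report where the
# program is about to run.
cd_to_submit_dir() {
  dir=${PBS_O_WORKDIR:-${LS_SUBCWD:-}}   # PBS first, then LSF
  if [ -n "$dir" ]; then
    cd "$dir" || return 1
  fi
  echo "running in: $(pwd)"
}
```

Call it near the top of the job script, before the line that launches the
program.  Its output in the job log then shows the directory the program
actually ran in.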

ftest_f77.f is a Fortran 77 program without MPI.  It can confirm that you
have some kind of working F77 compiler on your system.  Try: make ftest_f77

ctest.c is a C language program.  The Fortran executables have no dependency
on C, but if there is any question about whether there is a working C
compiler on the system, try:  make ctest.

ctest_mpi.c is a C language program which uses the C versions of the MPI
library calls.  Again, the Fortran executables have no dependency on C, but
it is possible that the MPI libraries were compiled without the Fortran
interfaces.  If this program compiles and runs but the Fortran ones do
not, that might be a useful clue.  Try:  make ctest_mpi  (to compile), then
make run_c  (to execute).
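The comparison between the C and Fortran MPI tests can also be done directly
with the compiler wrappers.  The minimal sketch below only compiles and does
not run anything.  It assumes wrappers named mpicc and mpif90.  The function
name and the MPICC and FC overrides are hypothetical.

```shell
# Hypothetical helper: compile the C and Fortran MPI tests with the MPI
# wrappers and interpret the combination of results.
compare_c_fortran_mpi() {
  cc=${MPICC:-mpicc}     # assumed MPI C wrapper
  fc=${FC:-mpif90}       # assumed MPI Fortran 90 wrapper
  if "$cc" -o ctest_mpi ctest_mpi.c >/dev/null 2>&1; then c_ok=yes; else c_ok=no; fi
  if "$fc" -o ftest_mpi ftest_mpi.f90 >/dev/null 2>&1; then f_ok=yes; else f_ok=no; fi
  echo "C MPI test compiles: $c_ok"
  echo "Fortran MPI test compiles: $f_ok"
  if [ "$c_ok" = yes ] && [ "$f_ok" = no ]; then
    echo "hint: the MPI library may have been built without Fortran interfaces"
  fi
}
```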


Any questions, email me at:  nancy@ucar.edu

<TODO: is there a dart mail id for general questions?>

Good luck -
nancy collins

