INTERFACE / PUBLIC COMPONENTS / NAMELIST / FILES / REFERENCES / ERRORS / BUGS / PLANS / PRIVATE COMPONENTS

MODULE mpi_utilities_mod

Contact: Nancy Collins
Reviewers:  
Revision: $Revision: 1.1 $
Release Name: $Name: $
Change Date: $Date: 2006/09/15 22:13:01 $
Change history: see CVS log

OVERVIEW

This module provides subroutines which access the MPI (Message Passing Interface) parallel communications library. To compile without MPI, substitute null_mpi_utilities_mod.f90 for this file.




OTHER MODULES USED

types_mod
utilities_mod
time_manager_mod
mpi_mod



PUBLIC INTERFACE

use mpi_utilities_mod, only : initialize_mpi_utilities
 finalize_mpi_utilities
 task_sync
 task_count
 my_task_id
 transpose_array
 array_broadcast
 array_distribute
 send_to
 receive_from
 iam_task0
 broadcast_send
 broadcast_recv
 shell_execute
 sleep_seconds
 sum_across_tasks
 make_pipe
 destroy_pipe
 exit_all

NOTES

No namelist interfaces are currently defined for this module, but at some point in the future an optional namelist interface &mpi_utilities_nml may be supported. It would be read from file input.nml.




PUBLIC COMPONENTS


call initialize_mpi_utilities( )

Description

Initializes the MPI library, creates a private communicator, stores the total number of tasks and the local task number for later use, and registers this module. On some implementations of MPI (in particular some variants of MPICH) it is best to initialize MPI before any I/O is done from any of the parallel tasks, so this routine should be called as close to the process startup as possible.

It is not an error to try to initialize the MPI library more than once. It is still necessary to call this routine even if the application itself has already initialized the MPI library. This routine creates a private communicator so that internal communications are shielded from any other MPI communication done outside the DART libraries.

It is an error to call any of the other routines in this file before calling this routine.
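
As an illustration only, a minimal driver might be structured as in the sketch below; the program name and the work done between the two calls are hypothetical.

   program mpi_example
      use mpi_utilities_mod, only : initialize_mpi_utilities, finalize_mpi_utilities, &
                                    task_count, my_task_id

      integer :: total_tasks, myid

      ! Initialize the MPI library and this module before doing any I/O.
      call initialize_mpi_utilities()

      total_tasks = task_count()
      myid        = my_task_id()

      ! ... parallel work goes here ...

      ! Shut down MPI; no other routines from this module may be called afterwards.
      call finalize_mpi_utilities()

   end program mpi_example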



call finalize_mpi_utilities([callfinalize])
logical, intent(in), optional         :: callfinalize

Description

Frees the local communicator, and shuts down the MPI library unless callfinalize is specified and is .FALSE.. On some hardware platforms it is problematic to try to call print or write from the parallel tasks after finalize has been executed, so this should only be called immediately before the process is ready to exit. If the application itself is using MPI, the callfinalize argument can be used to defer closing the MPI library until the application does it itself.

It is an error to call any of the other routines in this file after calling this routine.

callfinalize    If false, do not call the MPI_Finalize() routine.
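
For example, an application which manages the MPI library itself might free the DART communicator but defer the final shutdown; this sketch assumes the application will call MPI_Finalize() on its own later.

      ! Leave the MPI library running; the application will finalize it itself.
      call finalize_mpi_utilities(callfinalize = .false.)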


var = task_count()
integer                        :: task_count

Description

Returns the total number of MPI tasks this job was started with. Note that MPI task numbers start at 0, but this is a count. So a 4-task job would return 4 here, but the actual task numbers will be from 0 to 3.

task_count     Total number of MPI tasks in this job.


var = my_task_id()
integer                        :: my_task_id

Description

Returns the MPI task number. This is one of the routines in which all tasks can make the same function call but each returns a different value. The return can be useful in creating unique filenames or otherwise distinguishing resources which are not shared amongst tasks. MPI task numbers start at 0, so valid task id numbers for a 4-task job would be 0 to 3.

my_task_id     My unique MPI task id number.
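
As an example of using the return value, each task could build a unique filename; the filename pattern here is purely illustrative.

      use mpi_utilities_mod, only : my_task_id

      character(len=128) :: fname

      ! Produces e.g. task_output.0003 on task 3; each task gets a distinct name.
      write(fname, '(A, I4.4)') 'task_output.', my_task_id()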


call task_sync()

Description

Synchronize tasks. This call does not return until all tasks have called this routine. This ensures all tasks have reached the same place in the code before proceeding.
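
A common pattern is to let every task finish its own work before a single task goes on to use the combined results; the surrounding logic in this sketch is hypothetical.

      use mpi_utilities_mod, only : task_sync, iam_task0

      ! ... each task writes its own partial output file ...

      ! Block here until every task has finished writing.
      call task_sync()

      ! Only now is it safe for one task to read and combine the per-task files.
      if (iam_task0()) then
         ! ... collect the results ...
      endif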



call send_to(dest_id, srcarray [, time])
integer, intent(in)                   :: dest_id
real(r8), dimension(:), intent(in)    :: srcarray
type(time_type), intent(in), optional :: time

Description

Use the MPI library to send a copy of an array of data from one task to another task. The sending task makes this call; the receiving task must make a corresponding call to receive_from().

If time is specified, it is also sent to the receiving task. The receiving call must match this sending call regarding this argument; if time is specified here it must also be specified in the receive; if not given here it cannot be given in the receive.

The current implementation uses MPI_Ssend() which does a synchronous send. That means this routine will not return until the receiving task has called the receive routine to accept the data. This may be subject to change; MPI has several other non-blocking options for send and receive.

dest_id     The MPI task id of the receiver.
srcarray     The data to be copied to the receiver.
time     If specified, send the time as well.

Notes

The send and receive subroutines must be used with care. These calls must be used in pairs; the sending task and the receiving task must make corresponding calls or the tasks will hang. Calling them with different array sizes will result in either a run-time error or a core dump. The optional time argument must either be given in both calls or in neither or one of the tasks will hang. (Sense a trend here?)


call receive_from(src_id, destarray [, time])
integer, intent(in)                    :: src_id
real(r8), dimension(:), intent(out)    :: destarray
type(time_type), intent(out), optional :: time

Description

Use the MPI library to receive a copy of an array of data from another task. The receiving task makes this call; the sending task must make a corresponding call to send_to(). Unpaired calls to these routines will result in the tasks hanging.

If time is specified, it is also received from the sending task. The sending call must match this receiving call regarding this argument; if time is specified here it must also be specified in the send; if not given here it cannot be given in the send.

The current implementation uses MPI_Recv() which does a synchronous receive. That means this routine will not return until the data has arrived in this task. This may be subject to change; MPI has several other non-blocking options for send and receive.

src_id     The MPI task id of the sender.
destarray     The location where the data from the sender is to be placed.
time     If specified, receive the time as well.

Notes

See the notes section of send_to().
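
A sketch of the required pairing, with arbitrary task ids and array size chosen for illustration:

      use types_mod,         only : r8
      use mpi_utilities_mod, only : my_task_id, send_to, receive_from

      real(r8) :: data(100)

      if (my_task_id() == 1) then
         data = 1.0_r8
         call send_to(0, data)           ! task 1 sends ...
      else if (my_task_id() == 0) then
         call receive_from(1, data)      ! ... and task 0 must make the matching receive
      endif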


call exit_all(exit_code)
integer, intent(in)                   :: exit_code

Description

A replacement for calling the Fortran intrinsic exit. This routine calls MPI_Abort() to kill all MPI tasks associated with this job. This ensures one task does not exit silently and leave the rest hanging. This is not the same as calling finalize_mpi_utilities() which waits for the other tasks to finish, flushes all messages, closes log files cleanly, etc. This call immediately and abruptly halts all tasks associated with this job.

Depending on the MPI implementation and job control system, the exit code may or may not be passed back to the calling job script.

exit_code     A numeric exit code.

Notes

It would generally be helpful to write out some kind of message before calling this routine, to indicate where in the code it is dying. This routine is called by the error handler in the utilities module after writing out the pending error message.
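
For example, a task detecting a fatal condition might print a message identifying where it is dying and then abort the whole job; the message text and exit code below are arbitrary.

      use types_mod,         only : r8
      use mpi_utilities_mod, only : my_task_id, exit_all

      integer :: istat
      real(r8), allocatable :: workspace(:)

      allocate(workspace(100000), stat=istat)
      if (istat /= 0) then
         write(*, *) 'fatal: unable to allocate workspace on task ', my_task_id()
         call exit_all(-1)      ! abruptly halts every task in the job, not just this one
      endif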


call transpose_array()

Description

Currently unimplemented; transposes are implemented with a series of calls to send_to() and receive_from().



call array_broadcast(array, root)
real(r8), dimension(:), intent(inout) :: array
integer, intent(in)                   :: root

Description

All tasks must make this call together, but the behavior in each task differs depending on whether it is the root or not. On the task whose ID is equal to root, the contents of the array will be sent to all other tasks. On any task whose ID is not equal to root, the array is the location where the data is to be received. Thus array is intent(in) on root, and intent(out) on all other tasks.

When this routine returns, all tasks will have the contents of the root array in their own arrays.

array     Array containing data to send to all other tasks, or the location in which to receive data.
root     Task ID which will be the data source. All others are destinations.

Notes

This is another of the routines which must be called by all tasks. The MPI call used here is synchronous, so all tasks block here until everyone has called this routine.
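
A sketch of the calling pattern, with task 0 chosen (arbitrarily) as the root and an illustrative array name:

      use types_mod,         only : r8
      use mpi_utilities_mod, only : my_task_id, array_broadcast

      real(r8) :: settings(10)

      ! Only the root fills the array, but every task makes the same call.
      if (my_task_id() == 0) settings = 42.0_r8

      call array_broadcast(settings, 0)

      ! On return, every task holds task 0's copy of settings.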


call array_distribute(srcarray, root, dstarray, dstcount, how, which)
real(r8), dimension(:), intent(in)    :: srcarray
integer, intent(in)                   :: root
real(r8), dimension(:), intent(out)   :: dstarray
integer, intent(out)                  :: dstcount
integer, intent(in)                   :: how
integer, dimension(:), intent(out)    :: which

Description

Currently unimplemented. Could be used to distribute proper subsets of an array across all tasks in a job.

srcarray     Entire data array to be used as a data source.
root     Task ID with source array.
dstarray     Destination array where subset of data is to be placed.
dstcount     Count of how many items are in the dstarray.
how     Select different algorithms for doing the distribution.
which     Integer index array of which values were assigned to this task.


var = iam_task0()
logical                        :: iam_task0

Description

Returns .TRUE. if called from the task with MPI task id 0. Returns .FALSE. in all other tasks. It is frequently the case that some code should execute only on a single task. This allows one to easily write a block surrounded by if (iam_task0()) then ... endif.

iam_task0     Convenience function to easily test and execute code blocks on task 0 only.
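
A typical block looks like the sketch below; the message is purely illustrative.

      use mpi_utilities_mod, only : iam_task0

      ! Print the message once, from task 0, instead of once per task.
      if (iam_task0()) then
         write(*, *) 'startup complete; beginning assimilation'
      endif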


call broadcast_send(from, array1, array2)
integer, intent(in)                   :: from
real(r8), dimension(:), intent(inout) :: array1
real(r8), dimension(:), intent(inout) :: array2

Description

Cover routine for array_broadcast(). This call must be matched with the companion call broadcast_recv(). This routine should only be called on the task which is the root of the broadcast; it will be the data source. All other tasks must call broadcast_recv(). This routine sends 2 data arrays because this is a common code pattern in the DART filter code. This routine ensures that from is the same as the current task ID.

In reality the data arrays here are intent(in) only but this routine will be calling array_broadcast() internally and so must be intent(inout) to match.

from     Current task ID; the root task for the data broadcast.
array1     First data array to be broadcast.
array2     Second data array to be broadcast.

Notes

This is another of the routines which must be called consistently; only one task makes this call and all other tasks call the companion broadcast_recv routine. The MPI call used here is synchronous, so all tasks block until everyone has called one of these two routines.


call broadcast_recv(from, array1, array2)
integer, intent(in)                   :: from
real(r8), dimension(:), intent(inout) :: array1
real(r8), dimension(:), intent(inout) :: array2

Description

Cover routine for array_broadcast(). This call must be matched with the companion call broadcast_send(). This routine must be called on all tasks which are not the root of the broadcast; the array arguments specify the location in which to receive data from the root. (The root task should call broadcast_send().) This routine receives 2 data arrays because this is a common code pattern in the DART filter code. This routine ensures that from is not the same as the current task ID.

In reality the data arrays here are intent(out) only but this routine will be calling array_broadcast() internally and so must be intent(inout) to match.

from     The task ID for the data broadcast source.
array1     First array location to receive data into.
array2     Second array location to receive data into.

Notes

This is another of the routines which must be called consistently; all tasks but one make this call and exactly one other task calls the companion broadcast_send routine. The MPI call used here is synchronous, so all tasks block until everyone has called one of these two routines.
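
A sketch of how the two cover routines pair up; the root task id, array names, and sizes are arbitrary choices for illustration.

      use types_mod,         only : r8
      use mpi_utilities_mod, only : my_task_id, broadcast_send, broadcast_recv

      integer, parameter :: root = 0
      real(r8) :: vals1(200), vals2(200)

      if (my_task_id() == root) then
         ! This task owns the data and is the source of the broadcast.
         call broadcast_send(root, vals1, vals2)
      else
         ! Every other task receives into the same two arrays.
         call broadcast_recv(root, vals1, vals2)
      endif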


call sum_across_tasks(addend, sum)
integer, intent(in)                   :: addend
integer, intent(out)                  :: sum

Description

All tasks call this routine, each with their own different addend. The returned value in sum is the total of the values summed across all tasks, and is the same for each task.

addend     Single input value per task to be summed up.
sum     The sum.

Notes

This is another of those calls which must be made from each task, and the calls block until this is so.
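
For example, each task might contribute a locally computed count and get back the job-wide total; the variable names are illustrative.

      use mpi_utilities_mod, only : sum_across_tasks

      integer :: my_count, total_count

      ! my_count was set differently on each task earlier; every task calls
      ! this routine and each one gets the same total back.
      call sum_across_tasks(my_count, total_count)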




NAMELIST

We adhere to the F90 standard of starting a namelist with an ampersand '&' and terminating with a slash '/'.

This module currently has no namelist entries. One expected addition would be a list of task numbers from which informational messages should be printed; the current default is that informational messages are printed only from task 0 and suppressed on all other tasks. (Warnings and errors print in all cases.)

Discussion

This namelist would be read from a file called input.nml.




FILES

Depending on the implementation of MPI, the library routines are either defined in an include file (mpif.h) or in a proper Fortran 90 module (use mpi). If it is available the module is preferred; it allows for better argument checking and optional arguments support in the MPI library calls.




REFERENCES


ERROR CODES and CONDITIONS

If MPI returns an error, the DART error handler is called with the numeric error code it received from MPI. See any of the MPI references for an up-to-date list of error codes.




KNOWN BUGS




FUTURE PLANS




PRIVATE COMPONENTS