CALLING TREE / FILE CONTENTS / OUTPUT DIRECTORY / EXPERIMENT SET-UP / HELPFUL HINTS / SPACE REQUIREMENTS

DART-CAM OVERVIEW

Contact: Kevin Raeder
Revision: $Revision: 2871 $
Source: $URL: http://subversion.ucar.edu/DAReS/DART/trunk/models/cam/doc/index.html $
Change Date: $Date: 2007-04-12 16:35:48 -0600 (Thu, 12 Apr 2007) $
Change history: try "svn log" or "svn diff"

The up-to-date overview will always be available at http://www.image.ucar.edu/DAReS/DART/cgd_cam.shtml

For the Jamaica release, the async=3 option is no longer offered; it has been replaced by async=4. This new option runs an MPI filter, which can use either single-threaded or MPI CAM. The single-threaded option runs CAM for one ensemble member on each processor (up to the lesser of the number of ensemble members and the number of processors). The MPI CAM option runs CAM for each ensemble member in succession, using all of the available processors. It is not possible (yet) to run several MPI CAMs at the same time, each using a subset of the processors.

This new option allows users to control the assimilation through a single script, except for modifications for machines on which DART-CAM hasn't been tested yet. Job_mpi.csh has a section of user-set parameters that define many aspects of the assimilation. It uses those parameters to create a series of batch scripts, one for each obs_seq.out file that will be assimilated. It submits all of those scripts to the batch scheduler, where they run in succession as resources become available. Each obs_seq batch script executes the programs laid out in the calling tree (below).
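
As an illustration of the idea only (this is not the actual job_mpi.csh code, and the script and job names below are made up), a series of obs_seq batch scripts could be chained on an LSF machine so that each starts only after the previous one finishes:

   # hypothetical csh sketch; job_mpi.csh generates and submits the real scripts
   set previous = ""
   foreach script (T21x80_Taper1_obsseq1.lsf T21x80_Taper1_obsseq2.lsf)
      if ("$previous" == "") then
         bsub -J $script:r < $script                       # first job: no dependency
      else
         bsub -J $script:r -w "done($previous)" < $script  # wait for the previous job
      endif
      set previous = $script:r
   end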

The async=2 option (non-MPI filter and non-MPI CAM) is still available.

These options have been tested for DART-CAM in batch submission environments PBS and LSF on Linux clusters and IBM AIX.

There are, no doubt, things missing from these lists, so don't struggle too long before contacting raeder'at'ncar.ucar.edu.

The sequence of operations for assimilating a single obs_seq.out file follows. The functionality of each script/program has been restricted to one "domain"; each is specific to only one of the following: the machine where the experiment is run, the model version used in the assimilation, the filter version, or the experiment being conducted using the choices for the previous three.

See the cam/model_mod documentation page for details of the model interface.


CALLING TREE

The calling tree for these scripts (and fortran executables) is:

SCRIPT (DOMAIN) -- LOCATION or PURPOSE

Resol_case_obsseq#.lsf (experiment) -- experiment central directory, where I/O and execution are organized.
   -> mpirun filter executable (filter version) -- local disc on a compute node/processor, or a work directory in the central directory.
      -> advance_model.csh (single threaded or MPI CAM) -- pre-existing work subdirectory of the central directory.
         -> trans_time executable
         -> trans_sv_pv executable
         -> run_pc.csh (model) -- modified form of the run script from CAM, now kept in DART.
            -> build-namelist (model) -- uses namelistin and the results of trans_time to make the CAM namelist.
            -> advance of CAM
         -> trans_pv_sv executable
   -> qsub auto_re2ms.csh (machine) -- Central/Experiment/obs_seq_output_directory (i.e. 01_01)
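
Reading the tree from the inside out, the work advance_model.csh does for one ensemble member amounts to the following schematic sequence (a sketch only; the real script's arguments and error handling are omitted, and $temp_dir is just a stand-in name for the pre-existing work subdirectory):

      cd $temp_dir         # pre-existing work subdirectory of the central directory
      ./trans_time         # extract the target time for the CAM namelist
      ./trans_sv_pv        # DART state vector -> caminput.nc / clminput.nc "shells"
      ./run_pc.csh ...     # build-namelist, then advance CAM to the target time
      ./trans_pv_sv        # advanced CAM state -> assim_model_state_ud#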

FILE CONTENTS

See the Jamaica release notes for diagrams and flowcharts showing DART under the various async options. The contents of some of the files which appear there are listed here.
FILE -- CONTENTS or PURPOSE
assim_model_state_ic# -- the state vectors to be used as initial conditions for the next model advance. Contains the state vector time, the target time, and the state vector.
assim_model_state_ud# -- the updated state vectors returned by the model advance. Contains the state vector time (was the target time) and the state vector for one ensemble member.
filter_ic_old#s -- the initial conditions to be used by the filter for the next assimilation of a single obs_seq.out file. There may be one of these, or one for each ensemble member, named filter_ic_old.####, where #### is a 4-digit number such as 0001.
filter_ic_new#s -- the same as filter_ic_old#s, except that it/they are produced at the end of the assimilation, for use by the next assimilation.
input.nml -- the filter namelist file, containing the namelists for all the necessary modules of the filter.
model initial file (e.g. caminput.nc) -- provides information about the model which the filter needs, such as the state vector size.
namelists -- the forecast model may need namelist(s) to define its advance.
obs_seq.final -- the innovations in observation space which result from the assimilation of all the chosen obs in obs_seq.out.
obs_seq.out -- the set of observations to be assimilated. How the observations are distributed in time defines when the model advances happen.
Posterior_Diag.nc -- the state vector in model space after each assimilation defined by the obs times in obs_seq.out.
Prior_Diag.nc -- the state vector in model space before each assimilation defined by the obs times in obs_seq.out. It results from the previous model advance.
state shells -- CAM has more fields in its initial files (caminput_#.nc) than we use in the DART state vector. It's useful to carry these fields along from advance to advance so that they don't need to spin up as much at the beginning of each advance. trans_sv_pv replaces the state vector fields in these "shells" with the contents of assim_model_state_ic and leaves the other fields alone.
True_State.nc -- the state vector in model space resulting from an execution of perfect_model_obs. These are the model forecast values from which identity obs are derived.
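
To make the table concrete, the central directory during the assimilation of one obs_seq.out file by a hypothetical 4-member ensemble might contain files like these (names follow the table above; exact counts and suffixes depend on your namelist settings):

   assim_model_state_ic1   ...   assim_model_state_ic4
   assim_model_state_ud1   ...   assim_model_state_ud4
   filter_ic_old.0001      ...   filter_ic_old.0004
   input.nml   namelistin   caminput.nc   clminput.nc
   obs_seq.out   obs_seq.final   Prior_Diag.nc   Posterior_Diag.nc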


OUTPUT DIRECTORY

Organization of the output directories created by job_mpi.csh:
DIRECTORY -- CONTENTS and PURPOSE
Central directory
   location of scripts, and pass-through point for files during execution. Typically named according to the defining characteristics of a *set* of experiments: resolution, model, obs being assimilated, unique model state variables, etc.
   Experiment
      location of subdirectories of output and some diagnostics files. Typically where the obs-space diagnostics (obs_diag) are calculated.
      Obs_seq subdirectory(s)
         Each holds the obs-space and model-space output from assimilating one obs_seq.out file. obs_diag needs to see a name made of the 2-digit month, an underscore, and the number within the series of obs_seq.out files, i.e. 01_02 for the second obs_seq.final of a January case. The script job_mpi.csh will make these directories if you use it.
         DART
            holds the filter restart files (named filter_ic[.#]) created at the end of the filter run for this obs_seq.out. They're used as the restart files when the next obs_seq.out file is assimilated.
         CAM
            holds the CAM initial file shells which carry along model fields which are not DART state vector fields (preventing the repeated re-spin-up of those variables).
         CLM
            same as CAM, but for Community Land Model initial files.

A typical pathname for a restart file in my case would be:
/scratch/cluster/raeder/T21x80/Taper1/01_03/DART/filter_ic
                        |      |      |     DART restart file directory
                        |      |      Obs_seq (Jan 3)
                        |      Experiment (reduced influence of obs above 150 hPa)
                        Central directory (resolution x num_ens_members)


You may also want to make a subdirectory within Experiment for each set of obs-space PostScript and .dat files created by obs_diag and MATLAB.
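
job_mpi.csh creates these directories for you; if you want to build the same layout by hand, a minimal csh sketch using the example pathname above would be:

   cd /scratch/cluster/raeder/T21x80                 # Central directory
   mkdir -p Taper1/01_03/DART                        # filter restart files
   mkdir -p Taper1/01_03/CAM Taper1/01_03/CLM        # CAM and CLM initial file "shells"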


EXPERIMENT SET-UP

Instructions for setting up a DART-CAM assimilation using these scripts.

  1. DART
    1. Since you have this file you've already checked out DART.
      Edit ...DART/models/cam/work/input.nml:preprocess_nml:input_files to choose the obs_def source code files to load via the preprocessor (a sketch appears after this list). The default file (obs_def_reanalysis_bufr_mod.f90) will handle observations from NCEP reanalysis BUFR files.
    2. In .../DART/mkmf link mkmf.template to the mkmf.template.xxxx which is appropriate for your computer (or make one of your own).
    3. Script DART/models/cam/work/workshop_setup.csh is recommended for compiling the package. It is set up to compile filter and wakeup_filter as MPI processes (async=4), and all other executables as single threaded. If you want to use async=2, remove the -mpi flag from the mkmf_filter and mkmf_wakeup_filter commands.
  2. CAM
    1. Put the DART modifications (.../DART/models/cam/Cam[version#]_DART_mods/*), and any other CAM modifications you have, in the CAM directory of user-provided modifications. CAM's "configure" will use these during compilation. Configure and compile CAM at the resolution/dynamical core desired. Do this in a directory where all your CAM versions will reside, here called CamCentral.
    2. Put the cam executable and config_cache.xml in a subdirectory of the standard cam executable directory (CamCentral/CAM_version/models/atm/cam/bld), which I'll call CAM_config_1-mpi (leave off the -mpi for single threaded CAM). Job_mpi.csh has a variable "CAMsrc" that should point to that location.
    3. Build a CAM namelist and call it 'namelistin' containing (among the other/default variables defined by the CAM build-namelist):
                   &camexp
                    ncdata         = 'caminput.nc'
                    caseid         = 'whatever_you_want'
                    nsrest         = 0
                    calendar       = 'GREGORIAN'
                    inithist       = 'ENDOFRUN'
                   /
                   &clmexp
                    finidat        = 'clminput.nc'
                   /
                   
      and NOT containing ...
                   >  nhtfrq         = 4368
                   >  start_ymd      = 20020901
                   >  start_tod      = 0
                   >  stop_ymd       = 20021201
                   >  stop_tod       = 0
                   
      The CAM build-namelist script will use this to make a new namelist with the correct forecast parameters, named 'namelist'. Put namelistin in CamCentral/CAM_version/models/atm/cam/bld/CAM_config_1.
    4. Confirm that you have access to all the CAM input files listed in namelistin, or suitable replacements.
    5. Put a CAM initial file in CAM_config_1 and call it caminput.nc. Put a CLM initial file there and call it clminput.nc. Only the grid information is used from these files, so their dates don't matter.
  3. Set up an experiment central directory ("Central" here) where there's enough space for output. (See "Space" below)
  4. Copy the DART namelist file (DART/models/cam/work/input.nml) into one called "input_1.nml" in Central.
  5. Copy DART/models/cam/shell_scripts/job_mpi.csh to Central
  6. EXPERIMENT
    1. Edit job_mpi.csh. It has more detailed instructions about how to:
      • define experiment output directory names
      • tell it whether CAM is MPI
      • provide directory name of CAM executable
      • define which obs_seq.out files to use. Some pre-made examples can be found here.
      • find and link to the obs_seq.out files
      • find and link to filter_ic[.#] files
      • define which CAM and CLM initial files to use. Some initial and filter_ic files are available from the DART large file site.
      • define which DART version to use
      • define resources to be requested by each of the obs_seq_#.lsf scripts
    2. Edit input_1.nml to configure the assimilation of the first obs_seq.out. Be sure that
      • filenames listed in it agree with what's required by job_mpi.csh and what is or will be available in the Central directory.
      • start_from_restart is .true. if you have an initial ensemble, .false. otherwise (see the filter_nml sketch after this list).
      • init_time_days is the first Gregorian day of your assimilation.
      • init_time_seconds is the first second of your assimilation (usually 0).
    3. Copy it to input_n.nml and edit that copy: change start_from_restart to .true., and set init_time_days = -1 and init_time_seconds = -1.
    4. If you need to make synthetic observations, get create_obs_sequence, create_fixed_network_seq, and perfect_model_obs. Otherwise, use the obs_seq.out files provided here.
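
Two fragments referenced above are sketched here, only to show which variables are involved; these are illustrations, not complete namelists, and the path in input_files is an assumption that you should check against your own checkout. For step 1.1 of the DART set-up, the obs_def choice is made in preprocess_nml:

        &preprocess_nml
         input_files = '../../../obs_def/obs_def_reanalysis_bufr_mod.f90'
        /

For steps 6.2 and 6.3, the filter_nml entries that differ between input_1.nml and input_n.nml are:

        ! input_1.nml -- the first obs_seq.out file
        &filter_nml
         start_from_restart = .false.     ! .true. if you already have an initial ensemble
         init_time_days     = ...         ! first Gregorian day of your assimilation
         init_time_seconds  = 0           ! first second of your assimilation (usually 0)
        /

        ! input_n.nml -- all later obs_seq.out files
        &filter_nml
         start_from_restart = .true.
         init_time_days     = -1
         init_time_seconds  = -1
        /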


HELPFUL HINTS

For async=2 use the ensemble size, available compute nodes, and processors/node to figure out how many nodes to request. Make this request in job_mpi.csh. For example, on a machine with 2 processors/node, and running an assimilation with a typical ensemble of 20 members, it's efficient to request 5 nodes. This will advance CAM in 2 batches of 10 (1 CAM/processor).

Each batch of restart data can be saved to a mass store using (a modified) auto_re2ms and retrieved using .../ms2restart. Execute the commands with no arguments to see instructions. They package files of each ensemble member together, and then bundle batches of ensemble members together for efficient storage in a directory named similarly to the one where they exist on the cluster.

If you're not running job_mpi.csh as a batch job, run it as 'nohup ./job_mpi.csh >& /dev/null &' to protect the job from being cut off when the window in which it was executed is closed.

Modify and use alias 'rmtemp' to remove the temporary files from the central directory where the experiment is run, before running another experiment.

alias rmtemp 'rm *_ud* *_ic[1-9]* cam_*_temp* c[al]minput_[1-9]*.nc filter_assim_region_* \
              *control filter_ic_old* obs_seq.out times'
Needless to say, be careful that you don't name files you want to keep in such a way that they'll be deleted by this.


SPACE REQUIREMENTS

Space requirements (per ensemble member) for several CAM resolutions.

Resolution   filter_ic   CAM initial   CLM initial   Diagnostic (assuming no individual ensemble members are output)
T5           0.16 Mb     0.3 Mb        0.15 Mb       1.3 Mb + obs_seq.final
T21          2.5 Mb      4.5 Mb        1.4 Mb        21 Mb  + obs_seq.final
T42          10 Mb       18 Mb         4.5 Mb        57 Mb  + obs_seq.final
T85          41 Mb       74 Mb         15 Mb         342 Mb + obs_seq.final

obs_seq.final typically ranges from 50-150 Mb, independent of model resolution.
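
For example, a 20-member T42 assimilation needs roughly 20 x (10 + 18 + 4.5) Mb, or about 650 Mb, for each saved set of restart files (filter_ic plus the CAM and CLM initial files), plus roughly 57 Mb of model-space diagnostics and a 50-150 Mb obs_seq.final for each obs_seq.out file assimilated.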