DART-CAM setup




DART-CAM Setup Overview

This document gives specific help in setting up a DART-CAM experiment for the first time. Unless you just came from there, also see the model_mod documentation on the code-level interfaces and namelist values for DART-CAM.

Usually a run of DART-CAM involves executing multiple sequential batch job steps, each step consisting of a day's worth of assimilation. Between job steps the output from the previous job step must be saved or moved to be used as input for the next job step. Files may need to be archived or postprocessed. And finally, DART is a parallel MPI program and CAM can be compiled as a serial job, an OpenMP parallel job, or an MPI parallel job. On some systems an MPI parallel job cannot start another MPI parallel job, so running both a parallel DART and parallel CAM as a single job can be hard.

The DART distribution comes with some scripts which will probably have to be customized for other users. The shell_scripts directory contains simple example scripts. The full_experiment directory contains scripts which run a full experiment with multiple days and automatic archiving.

For parallelism options, it is fastest to run a serial (single-threaded) or OpenMP CAM with a parallel DART. This is selected by compiling CAM with the proper options and then setting the async variable in the &filter_nml namelist to 2. If there is not enough memory to run CAM as a serial job, CAM can be compiled as a parallel program using MPI, and async is set to 4 instead.

For the short forecasts (6 hours) typically needed for assimilation, and on machines with at least as many processors as ensemble members, async=4 is usually much less efficient than async=2. This is because the start-up phase of an MPI CAM advance is single threaded and takes much longer than the short integration forward in time. async=4 (parallel CAM) leaves most of the processors idling for most of the time. async=2 lets all the start-up phases run at the same time on all of the processors. The drawback of async=2 is that there may not be enough memory on a single processor to handle a whole CAM.
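
The corresponding input.nml fragment is minimal. This is an excerpt only; adv_ens_command names the script filter uses to advance the ensemble, and the other &filter_nml variables keep their usual values:

   &filter_nml
      async            = 2,
      adv_ens_command  = "./advance_model.csh",
      ...
   /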

async=2 and 4 allow users to control the assimilation through a single script, 'job_mpi.csh'. job_mpi.csh has a section of user-set parameters which define many aspects of the assimilation. It uses those parameters to create a series of batch scripts, one for each obs_seq.out file that will be assimilated. Their names have the form [Experiment]_[obs_seq#].script, the parts of which are defined in job_mpi.csh. It submits all of those scripts to the batch scheduler, where they run in succession as resources become available. Each obs_seq batch script executes the programs laid out in the calling tree (below).

There are, no doubt, things missing from these lists, so don't struggle too long before contacting raeder'at'ucar.edu.

The sequence of operations for assimilating a single obs_seq.out file follows. The functionality of each operation is restricted to one "domain": each script or program is specific to the whole experiment, a filter version, the model advance, or the machine, as shown in the DOMAIN column of the calling tree below.



Calling Tree

The calling tree for the scripts and fortran executables when running under async=2 or 4 is:

SCRIPT                            DOMAIN          LOCATION OR NOTES
Exper_obsseq#.script              experiment      Experiment central directory, where I/O and execution are organized.
   -> mpirun filter               filter version  Local disk on a compute node/processor, or a work directory in the central directory.
      -> advance_model.csh        model           Pre-existing work subdirectory of the central directory.
         -> dart_to_cam           model           Translates the DART time format into the CAM time format and inserts the DART state vector into a CAM initial file.
         -> run-cam.csh           model           Modified form of the run script from CAM, now in DART.
            -> build-namelist     model           Uses namelistin and the results of dart_to_cam to make the CAM namelist(s).
            -> advance of CAM     model           Single-threaded or MPI CAM (version dependent).
         -> cam_to_dart           model           Translates the CAM fields into the DART state vector.
   -> qsub auto_re2hpss.csh       machine         Central/Experiment/obs_####. Archives restart data sets.
   -> qsub auto_diag2hpss_LSF.csh machine         Central/Experiment/obs_####. Archives diagnostic output.
      -> analyses2initial.csh     machine         Central/Experiment/obs_####/H##. Generates analyses in CAM, CLM, and CICE initial files. May also save CAM .h0. history files (but make them small!).
         -> NetCDF operators      machine         Central/Experiment/obs_####/H##. Averaging of CAM, CLM, and CICE fields.
        (-> clm_ens_avg.f90       machine         Central/Experiment/obs_####/H##. Complicated snow averaging. Not usable (yet) for CAM 4.0.1 and later.)

Experiment Set-Up

Instructions for setting up a DART-CAM assimilation in "stand-alone" mode. This section does not describe setting up assimilations within the CESM framework (available in late 2011?).


  1. DART
    1. Register and check out DART from the download site.
    2. In .../DART/mkmf link mkmf.template to the mkmf.template.xxxx which is appropriate for your computer (or make one of your own).
    3. cd to .../DART/models/cam/work and edit input.nml:preprocess_nml:input_files to choose obs_def source code files to load via the preprocessor. The default file obs_def/obs_def_reanalysis_bufr_mod.f90 will handle observations from NCEP reanalysis BUFR files. (Warning: assimilating the obs in the example GPS obs_seq.out file available from the DART Obs_sets site requires loading more than just the obs_def_gps_mod.f90.)
    4. Script DART/models/cam/work/quick_build.csh is recommended for compiling the package, as sketched below. It is set up to compile and run the preprocessor, compile filter and wakeup_filter as MPI programs (for async=2 and 4), and compile all other executables (but not CAM) as single-process programs. If you want a single-process filter, quick_build.csh accepts -nompi as an argument, which builds filter and wakeup_filter without MPI; the main script job_mpi.csh may need minor modifications to run that way. Keep in mind that all of the ensemble members' state vectors will have to fit in the memory of a single processor, so this is only suitable for low-resolution, small-ensemble testing.
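       For reference, the build sequence looks like this (the mkmf template name is illustrative; pick the one that matches your compiler and platform):

          cd DART/mkmf
          ln -sf mkmf.template.intel.linux mkmf.template   # your platform's template
          cd ../models/cam/work                            # edit input.nml first (step 3)
          ./quick_build.csh              # MPI filter/wakeup_filter + serial tools
          # ./quick_build.csh -nompi     # single-process filter, small tests only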
  2. CAM
    1. Put the DART modifications (.../DART/models/cam/Cam[version#]_DART_mods.tar), and any other CAM modifications you have, in the CAM directory of user-provided modifications. CAM's "configure" will use these during compilation, if you tell it to. Configure and compile CAM at the resolution/dynamical core desired. Do this in a directory where all your CAM versions will reside, here called CamCentral.
    2. Link the cam executable and config_cache*.xml files into a subdirectory of the CAM source code directory [CamCentral/CAM_version]/models/atm/cam/bld, which I'll call CAM_config_1 (append -mpi to the name for MPI CAM or -omp for OpenMP CAM), as sketched below. job_mpi.csh has a variable "CAMsrc" that should point to that location. (This location is necessary because run-cam.csh looks for files and programs in the parent directory of the directory which holds the CAM executable.)
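       A csh sketch of that linking step; all of the paths here are illustrative placeholders for wherever you actually built CAM:

          cd CamCentral/CAM_version/models/atm/cam/bld
          mkdir CAM_config_1-mpi && cd CAM_config_1-mpi     # '-omp' for OpenMP CAM
          ln -s /path/to/your/cam/build/cam .               # the CAM executable
          ln -s /path/to/your/cam/build/config_cache.xml .  # configure's cache file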
    3. Build a CAM namelist and call it 'namelistin' containing (among the other/default variables defined by the CAM build-namelist):
                   CAM 3.1 - 3.5
                      &camexp
                       ncdata         = 'caminput.nc'
                       caseid         = 'whatever_you_want'
                       nsrest         = 0
                       calendar       = 'GREGORIAN'
                       inithist       = 'ENDOFRUN'
                      /
                      &clmexp
                       finidat        = 'clminput.nc'
                      /
                   CAM 3.6 - 4.?
                      &seq_timemgr_inparm
                       calendar       = 'GREGORIAN'
                       restart_n      = 12
                       restart_option = 'nsteps'
                       stop_option    = 'date'
                      /
                      &cam_inparm
                       ncdata   = 'caminput.nc'
                       inithist = 'ENDOFRUN'
                       div24del2flag = 4,
                       nhtfrq = -6
                      /
                      &clm_inparm
                       finidat = 'clminput.nc'
                      /
                      &dom_inparm
                       bndtvs  = '/your/SST/file/here/resolution/cam_version/sst_HadOIBl_bc_1.9x2.5_1949_2007.nc'
                       sstcyc  = .false.
                      /
                      
      and NOT containing ...
                      >  start_ymd      = 20020901
                      >  start_tod      = 0
                      >  stop_ymd       = 20021201
                      >  stop_tod       = 0
                   
      If you want to make the .h0. history files small and focused, for archiving: add to &cam_inparm
                         empty_htapes    = .true.,
                         mfilt           =  1
                         avgflag_pertape = 'A'
                         fincl1   = 'A', 'Few', 'Useful', 'CAM' ,'Fields'
                   
       As of this writing (5/15/2011), the released versions of CAM cannot use the GREGORIAN calendar. This is true of the released versions of CAM4 (4.0.1 from CCSM4) and CAM5 (from CESM1.0). Fixes for many versions can be found in the Cam[version]_DART_mods.tar files included in the DART package; those may be portable to the version of CAM you want to use. In order to use the GREGORIAN calendar, CAM must be built with the full ESMF time manager; it must be built single-threaded if you will use async=2, or MPI for async=4. In addition, check that CAM is actually linking to the time manager you want (the CAM build scripts have their own ideas, and may not ask permission to do what they want). As a last resort, run experiments which don't cover leap years using the default calendar = 'NO_LEAP', or make obs_seq.out files which don't contain 2/29.

       The run-cam.csh script will call the CAM build-namelist script, which will use namelistin to make (a) new namelist(s) with the correct forecast parameters, named 'namelist' (< CAM 3.5) or '{atm,drv,ice,lnd,ocn}_in' (>= CAM 3.5). Put it/them into a directory where job_mpi.csh will be able to find it; something like CamCentral/CAM_version/models/atm/cam/bld/CAM_config_1.
    4. Confirm that you have access to all the input files whose names are generated by build-namelist in 'namelist' or '{atm,drv,ice,lnd,ocn}_in', or suitable replacements.
    5. Put an appropriate CAM initial file in CAM_config_1 and call it caminput.nc. Put a matching CLM initial file there and call it clminput.nc. (Matching is important; CAM checks that the CLM initial file has the expected grid parameters.) Only the grid information is used from these files, so their dates don't matter.
  3. Set up an experiment central directory ("Central" here) where there's enough space for output.
  4. Copy the DART namelist file (DART/models/cam/work/input.nml) into one called "input_1.nml" in Central.
  5. Copy DART/models/cam/shell_scripts/job_mpi.csh to Central.
  6. EXPERIMENT
    1. If you need to make synthetic observations, build create_obs_sequence, create_fixed_network_seq, and perfect_model_obs and learn how to use them; the pipeline is sketched below. Otherwise, use the obs_seq.out files provided here or similar 'real observations' files.
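       The synthetic-observation pipeline, in outline (a sketch; the first two programs are interactive, and all three are built by quick_build.csh):

          ./create_obs_sequence        # define obs types, locations, errors -> set_def.out
          ./create_fixed_network_seq   # replicate that definition over times -> obs_seq.in
          ./perfect_model_obs          # compute synthetic obs -> obs_seq.out, True_State.nc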
    2. Edit job_mpi.csh. It has more detailed instructions about how to:
      • define experiment output directory names
      • tell it whether CAM is MPI or OpenMP
      • provide the directory name of CAM executable and associated files (.../CAM_config_1 in this page)
      • define which obs_seq.out files to use. Some pre-made examples can be found here.
      • find and provide the path name of the obs_seq.out files
      • find and provide the path name of the filter_ic[.#] files. Such an ensemble can be created from CAM initial files using cam_to_dart.
      • define which CAM and CLM initial files to use. Some initial and filter_ic files are available from the NCAR Mass Store:/RAEDER/DAI/CAM_init/[Resol]_[model_version] and more from DART large file site
      • define resources to be requested by each of the Experiment_[obs_seq_#].script scripts
    3. Edit input_1.nml to configure the assimilation of the first obs_seq.out. Be sure that
      • filenames listed in it agree with what's required by job_mpi.csh and what is or will be available in the Central directory.
      • start_from_restart is .true. if you have an initial ensemble, .false. otherwise.
      • init_time_days is the first Gregorian day of your assimilation. This can be obtained from the program ...DART/time_manager/advance_time.f90; see the example below.
      • init_time_seconds is the first second of the first day of your assimilation (usually 0).
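       For example, advance_time prints the Gregorian day and second for a calendar date when given the -g option (a sketch; run it in a directory where it can read an input.nml):

          echo 2002090100 0 -g | ./advance_time   # -> "<days> <seconds>" for 00Z 1 Sept 2002;
                                                  #    use as init_time_days, init_time_seconds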
    4. Copy input_1.nml to input_n.nml and edit input_n.nml. Set start_from_restart to .true. Change init_time_days = -1, init_time_seconds = -1.
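       The relevant lines of input_n.nml then look like this (an excerpt; everything else stays as in input_1.nml):

          &filter_nml
             ...
             start_from_restart  = .true.,
             init_time_days      = -1,
             init_time_seconds   = -1,
             ...
          /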
    5. Run the experiment by executing job_mpi.csh, either through the batch queue or interactively.

CAM Initial Ensembles

Strategies for generating an initial ensemble from which DART can start.
All of these strategies require converting CAM initial file(s) into filter_ic.#### files, which is done by the same method.

  1. MINIMAL WORK; Get an ensemble of filter and CAM/CLM[/CICE] initial files from someone else (DART has a few dates for a few model cores and resolutions here). This limits the investigations you can undertake, but is the fastest and cheapest way to start assimilating.
  2. MINIMAL CAM COMPUTING; an assimilation can be started from a single CAM (+CLM[+CICE]) initial file. The single model state is randomly perturbed to make as many ensemble members as are requested in the ens_size variable of the &filter_nml namelist. Create a filter_ic file from the CAM initial file (cam_to_dart.f90). Create an obs_seq.out file which has a single observation with a large observational error variance, valid at least a week after the start date for the spin-up. This will make the ensemble advance long enough to balance the fields, without being perturbed by the assimilation of any observations.
    &filter_nml
       ...
       start_from_restart       = .false.,
       restart_in_file_name     = "filter_ic",
       ...
    /
    &model_nml
       ...
       pert_names         = 'T       ','US      ','VS      '
       pert_sd           = 1.0d0,2.0d0,2.0d0
       ...
    /
    
    Note that start_from_restart is false ("don't start from a pre-existing *ensemble*"), but a restart file (filter_ic) is still needed for filter to have something realistic to perturb. pert_names specifies which fields will be perturbed. CAM field names are used. pert_sd > 0 allows each point of the pert_names fields of each ensemble member to be randomly perturbed with a standard deviation of pert_sd. Other fields can be used, but moisture variables are tricky because of their variation with height by orders of magnitude. Regardless of which fields are specified, the spin-up period will allow the fields to come into balance with respect to the model, so the perturbations will propagate into all fields.
  3. FULL FUNCTION ENSEMBLE; In order to have, on hand, initial ensembles of any practical size, for any date of the year, we recommend the following. Scripts for doing this are available in .../DART/models/cam/make_ensemble. See the README there for more details. They are not highly documented or elegant, but offer a starting point. Make 20 successive 1-year free CAM runs (MPI CAM highly recommended, NO_LEAP calendar), saving the initial files every 5 days. Then pull together all of the, e.g., Jan 6ths (00Z) into a 20-member ensemble (numbered 1...20). Don't forget the CLM initial files, and possibly the CICE restart file(s): after CAM 3.6.57 there is only one, NetCDF, CICE initial/restart file (yea!), which is called iceinput_#.nc within DART-CAM. Repeat for each date at which there are initial files, and archive them. When you need an ensemble of, say, 60 members for June 1, retrieve the 20 members from each of May 26, May 31, and June 5, renumbering them 1,...,60. Convert each of the CAM initial files into a filter_ic.#### file with cam_to_dart.f90, as sketched below.
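     A minimal csh sketch of that conversion loop. The working file names ('caminput.nc' in, 'dart_ics' out) are assumptions here; check input.nml:cam_to_dart_nml for the names your version actually uses:

        #!/bin/csh
        # Convert caminput_1.nc ... caminput_60.nc into filter_ic.0001 ... filter_ic.0060
        set n = 1
        while ($n <= 60)
           set nnnn = `printf "%04d" $n`
           ln -sf caminput_${n}.nc caminput.nc   # cam_to_dart's assumed input name
           ./cam_to_dart                         # writes the assumed output 'dart_ics'
           mv dart_ics filter_ic.$nnnn
           @ n++
        end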

File Contents

See the Lanai release notes for diagrams and flowcharts showing DART under the various async options. The contents of some of the files which appear there are listed here. # refers to an ensemble member number.

FILE CONTENTS or PURPOSE
assim_model_state_ic# the state vectors to be used as initial conditions for the next model advance. Contains the state vector time, the target time, and the state vector.
assim_model_state_ud# the updated state vectors returned by the model advance. Contain the state vector time (was the target time) and the state vector for one ensemble member.
filter_ic_old#(s) the initial conditions to be used by filter for the next assimilation of a single obs_seq.out file. There may be one of these, or one for each ensemble member, named filter_ic_old.####, where #### is a 4-digit ensemble member number such as 0001.
filter_ic_new#(s) same as filter_ic_old#(s), except that it/they are produced at the end of the assimilation, for use by the next assimilation.
input.nml the filter namelist file, containing the namelists for all the necessary modules of the filter.
caminput.nc model initial file provides information about the model to filter, such as state vector size, etc.
caminput_#.nc CAM has more fields in its initial files than we use in the DART state vector. It's useful to carry these fields along from advance to advance so that they don't need to spin-up as much at the beginning of each advance. dart_to_cam replaces the state vector fields in these "shells" with the contents of assim_model_state_ic and leaves the other fields alone.
namelists the forecast model may need namelist(s) to define its advance.
obs_seq.final the innovations in observation space which result from the assimilation of all the chosen obs in obs_seq.out.
obs_seq.out the set of observations to be assimilated. How the observations are distributed in time defines when the model advances happen.
Posterior_Diag.nc the state vector in model space after each assimilation defined by the obs times in obs_seq.out.
Prior_Diag.nc the state vector in model space before each assimilation defined by the obs times in obs_seq.out. It results from the previous model advance.
True_State.nc the state vector in model space resulting from an execution of perfect_model_obs. These are the model forecast values from which identity obs are derived.

Output Directory

Organization of output directories:

DIRECTORY     CREATOR               CONTENTS and PURPOSE
Central       User                  Location of scripts and pass-through point for files during execution. Typically named according to the defining characteristics of a *set* of experiments: resolution, model, obs being assimilated, unique model state variables, etc.
  Experiment  job_mpi.csh           Location of subdirectories of output and some diagnostics files. Typically where the obs-space diagnostics are calculated using obs_diag.
    obs_#     Exper_obsseq#.script  Each holds the obs-space and model-space output from assimilating one obs_seq.out file. It should be named so that obs_diag sees 'obs_' plus a 4-digit number giving its place within the series of obs_seq.out files, i.e. obs_0002 for the second obs_seq.final of a series.
      DART    Exper_obsseq#.script  Holds the filter restart files (named filter_ic[.#]) created at the end of the filter run for this obs_seq.out. They are used to restart the assimilation of the next obs_seq.out file.
      CAM     Exper_obsseq#.script  Holds the CAM initial file "shells" which carry along the model fields that are not DART state vector fields (preventing the repeated re-spin-up of those variables).
      CLM     Exper_obsseq#.script  Same as CAM, but for Community Land Model initial files.
      CICE    Exper_obsseq#.script  Same as CAM, but for CICE model restart files, tarred by ensemble member.
      H##     advance_model.csh     Temporary storage for ensembles of CAM, CLM, and CICE files to be averaged for analysis archiving by auto_diag2hpss_LSF.csh.

A typical pathname for a restart file in my case would be:
/scratch/raeder/T21x80/Taper1/obs_0003/DART/filter_ic#
                |      |      |        |    restart file(s)
                |      |      |        DART restart file directory
                |      |      Obs_seq (3rd obs_seq.out of a series starting at 1)
                |      Experiment (reduced influence of obs above 150 hPa)
                Central directory (resolution x num_ens_members)

You may also want to make a subdirectory within Experiment for each set of obs_space postscript and .dat files created by obs_diag and matlab.


Helpful Hints

In the following, MPI filter uses all of the requested processors. There is flexibility in how the ensemble of CAMs uses them. The choice of async and number of processors to use will depend on the memory available on each node, as well as the number of processors available. See also models/cam/doc/html/filter_async_modes.html in the DART code tree, after registering.

For async = 2, use the ensemble size, available compute nodes, processors/node, and memory/node to figure how many nodes to request. Make this request in job_mpi.csh. For example, on a machine with 8 processors/node, and running an assimilation with an ensemble of 80 members (recommended), it's efficient to request 5 (or 10) nodes. This will advance the single-process CAMs in 2 (or 1) batches of 40 (80). That's assuming that each node has the memory to accommodate 8 CAMs.
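The arithmetic, as a quick csh sanity check (numbers from the example above):

   @ procs   = 5 * 8        # 5 nodes x 8 processors/node = 40 processors
   @ batches = 80 / $procs  # 80 single-process CAMs advance in 2 batches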

async = 4 runs the ensemble of pure-MPI CAMs one at a time and is usually a poor choice for CAM assimilations, because the start-up phase of CAM is single-process and takes significant time. All but one of the processors wait a long time while CAM sets itself up, then they all work for a short time to make the short forecast, and the cycle repeats for the next ensemble member.

async = 3 is part way between async = 2 and 4. It is part of an experimental version of the code which is not yet in the general release. CAM must be compiled with pure OpenMP parallelism. Then the MPI filter can execute multiple CAMs simultaneously on several processors each. The start-up for each is still single-process, but a smaller fraction of the processors wait. This mode can make reasonably efficient use of hundreds of processors.

Each batch of restart data can be saved to a mass store using (a modified) auto_re2hpss and retrieved using models/cam/full_experiment/hpss2restart. Execute the commands with no arguments to see instructions. Then package the files of each ensemble member together, and bundle batches of ensemble members together, for efficient storage in a directory named similarly to the one where they exist on the main computer.

Modify and use alias 'rmtemp' to remove the temporary files from the central directory where the experiment is run, after a run bombs and before running another experiment.

alias rmtemp 'rm *_ud* *_ic[0-9]* cam_*_temp* c[al]minput_[1-9]*.nc  \
              *control filter_ic_old* obs_seq.out times'

Needless to say, be careful that you don't name files you want to keep in such a way that they'll be deleted by this.


Space Requirements

Space requirements (Mb per ensemble member) for several CAM resolutions.



Resolution                 filter_ic  CAM initial  CLM initial  Diagnostic (assuming no individual ensemble members in the output and 1 day/output file)
T5                            .16         .3          .15          1.3 + obs_seq.final
T21                          2.5         4.5         1.4          21.  + obs_seq.final
T42                         10.         18.          4.5          57.  + obs_seq.final
T85                         41.         74.         15.          342.  + obs_seq.final
FV1.9x2.5 CAM4              17.         45.         10.          286.  + obs_seq.final
FV1.9x2.5 CAM5              20.        125.         58.          286.  + obs_seq.final
FV0.9x1.25 CAM5 trop_mam3   80.        500.        205.          ???  + obs_seq.final

obs_seq.final typically ranges from 50-150 Mb, independent of model resolution. Compression can meaningfully reduce the size of the NetCDF and obs_seq.final files for archiving.

Useful terms found in this web page.

TERM MEANING
{Central} The directory where filter runs and from which job_mpi.csh is submitted, if used. There should be enough disk space for all of the output.
{Experiment} The directory where the output from the assimilation of all obs_seq.out files will be put.
{obs_#} The sequence number (as defined in job_mpi.csh parameters) used to make a directory name, which will store the output of the assimilation of 1 obs_seq.out file, e.g. obs_####

Terms of Use

DART software - Copyright 2004 - 2013 UCAR.
This open source software is provided by UCAR, "as is",
without charge, subject to all terms of use at
http://www.image.ucar.edu/DAReS/DART/DART_download

Contact: Kevin Raeder
Revision: $Revision: 6597 $
Source: $URL: https://svn-dares-dart.cgd.ucar.edu/DART/releases/Lanai/models/cam/doc/cam_guidelines.html $
Change Date: $Date: 2013-11-11 08:55:33 -0700 (Mon, 11 Nov 2013) $
Change history: try "svn log" or "svn diff"