Contact: Kevin Raeder
Revision: $Revision: 2871 $
Source: $URL: http://subversion.ucar.edu/DAReS/DART/trunk/models/cam/doc/index.html $
Change Date: $Date: 2007-04-12 16:35:48 -0600 (Thu, 12 Apr 2007) $
Change history: try "svn log" or "svn diff"
The up-to-date overview will always be available at http://www.image.ucar.edu/DAReS/DART/cgd_cam.shtml
For the Jamaica release the async=3 option is no longer offered; it has been replaced by async=4. This new option runs an MPI filter, which can use either single-threaded or MPI CAM. The single-threaded option runs CAM for one ensemble member on each processor (up to the lesser of the number of ensemble members and the number of processors). The MPI CAM option runs CAM for each ensemble member in succession, using all the available processors. It's not possible (yet) to run several MPI CAMs at the same time, each using a subset of the processors.
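The async mode is chosen in the filter namelist in input.nml. As a hedged illustration (the entries below follow the usual DART filter_nml variable names; consult your input.nml for the authoritative list and values):

    &filter_nml
       async           = 4,
       adv_ens_command = "./advance_model.csh",
       ens_size        = 20,
       ...
    /

Here adv_ens_command is the script filter invokes to advance the ensemble, and the ens_size of 20 matches the typical ensemble mentioned below.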
This new option allows users to control the assimilation through a single script, apart from modifications for machines on which DART-CAM hasn't been tested yet. job_mpi.csh has a section of user-set parameters that define many aspects of the assimilation. It uses those parameters to create a series of batch scripts, one for each obs_seq.out file to be assimilated, and submits all of those scripts to the batch scheduler, where they run in succession as resources become available. Each obs_seq batch script executes the programs laid out in the calling tree (below); a sketch of the generate-and-submit pattern follows.
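A minimal csh sketch of that pattern, assuming an LSF scheduler and hypothetical file names (batch_template.csh and its OBS_SEQ_NUM placeholder are inventions for illustration); this is not the actual job_mpi.csh:

    #!/bin/csh
    # User-set parameters (illustrative values only)
    set num_obs_seq = 3          # how many obs_seq.out files to assimilate
    set exp         = Taper1     # experiment name

    set i = 1
    while ($i <= $num_obs_seq)
       # fill a batch-script template with this obs_seq's parameters
       sed -e "s/OBS_SEQ_NUM/$i/" batch_template.csh > ${exp}_obsseq${i}.lsf
       bsub < ${exp}_obsseq${i}.lsf   # runs when resources become available
       @ i ++
    end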
The async=2 option (non-MPI filter and non-MPI CAM) is still available.
These options have been tested for DART-CAM in batch submission environments PBS and LSF on Linux clusters and IBM AIX.
There are, no doubt, things missing from these lists, so don't struggle too long
before contacting raeder'at'ncar.ucar.edu.
The sequence of operations for doing an assimilation of a single obs_seq.out file follows. The functionality of each script/program has been restricted to one "domain"; each is specific to only one of the following:

- a machine where the experiment is run;
- a model version used in the assimilation;
- the filter version;
- or the experiment being conducted using the choices for the previous 3.
See the cam/model_mod page for model-specific details.
The calling tree for these scripts (and Fortran executables) is:

SCRIPT | DOMAIN | LOCATION |
---|---|---|
Resol_case_obsseq#.lsf | experiment | experiment central directory, where I/O and execution are organized |
-> mpirun filter executable | filter version | local disc on a compute node/processor, or a work directory in the central directory |
-> advance_model.csh | single-threaded or MPI CAM | pre-existing work subdirectory of the central directory |
-> trans_time executable | | |
-> trans_sv_pv executable | | |
-> run_pc.csh | model | modified form of the run script from CAM, now in DART |
-> build-namelist | model | uses namelistin and the results of trans_time to make the CAM namelist |
-> advance of CAM | | |
-> trans_pv_sv executable | | |
-> qsub auto_re2ms.csh | machine | Central/Experiment/obs_seq_output_directory (e.g. 01_01) |
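To make the middle of that tree concrete, here is a heavily simplified csh sketch of the per-member advance sequence; the real advance_model.csh handles arguments, loops over ensemble members, and error checking, all omitted here:

    #!/bin/csh
    # Simplified per-member advance (illustrative, not the shipped script)
    ./trans_time      # extract the current and target times; writes the 'times' file
    ./trans_sv_pv     # put the DART state vector into the CAM initial file "shell"
    ./run_pc.csh      # build-namelist + advance CAM to the target time
    ./trans_pv_sv     # pull the advanced CAM state back into a DART state vector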
See the Jamaica release notes for diagrams and flowcharts showing DART under the various async options. The contents of some of the files which appear there are listed here.
FILE | CONTENTS or PURPOSE |
---|---|
assim_model_state_ic# | the state vectors to be used as initial conditions for the next model advance. Contains the state vector time, the target time, and the state vector. |
assim_model_state_ud# | the updated state vectors returned by the model advance. Contains the state vector time (which was the target time) and the state vector for one ensemble member. |
filter_ic_old#s | the initial conditions to be used by the filter for the next assimilation of a single obs_seq.out file. There may be one of these, or one for each ensemble member, named filter_ic_old.####, where the #### means a 4 digit number such as 0001. |
filter_ic_new#s | same as filter_ic_old#s, except that it/they are produced at the end of the assimilation, for use by the next assimilation. |
input.nml | the filter namelist file, containing the namelists for all the necessary modules of the filter. |
model initial file | such as caminput.nc; provides information about the model that the filter needs, such as the state vector size. |
namelists | the forecast model may need namelist(s) to define its advance. |
obs_seq.final | the innovations in observation space which result from the assimilation of all the chosen obs in obs_seq.out. |
obs_seq.out | the set of observations to be assimilated. How the observations are distributed in time defines when the model advances happen. |
Posterior_Diag.nc | the state vector in model space after each assimilation defined by the obs times in obs_seq.out. |
Prior_Diag.nc | the state vector in model space before each assimilation defined by the obs times in obs_seq.out. It results from the previous model advance. |
state shells | CAM has more fields in its initial files (caminput_#.nc) than we use in the DART state vector. It's useful to carry these fields along from advance to advance so that they don't need to spin-up as much at the beginning of each advance. trans_sv_pv replaces the state vector fields in these "shells" with the contents of assim_model_state_ic and leaves the other fields alone. |
True_State.nc | the state vector in model space resulting from an execution of perfect_model_obs. These are the model forecast values from which identity obs are derived. |
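The NetCDF files above can be inspected with the standard ncdump tool (part of the NetCDF distribution); for example:

    ncdump -h Prior_Diag.nc            # header only: dimensions, variables, attributes
    ncdump -v time Posterior_Diag.nc   # values of the 'time' variable (the assimilation times)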
Organization of output directories created by job_mpi.csh
DIRECTORY | CONTENTS and PURPOSE |
---|---|
Central directory | location of scripts and pass-through point for files during execution; typically named according to the defining characteristics of a *set* of experiments: resolution, model, obs being assimilated, unique model state variables, etc. |
Experiment | location of subdirectories of output and some diagnostics files; typically where the obs-space diagnostics (obs_diag) are calculated |
Obs_seq subdirectory(s) | each holds the obs-space and model-space output from assimilating one obs_seq.out file. It should be named as obs_diag expects: the 2-digit month, an underscore, and the number within the series of obs_seq.out files, i.e. 01_02 for the second obs_seq.final of a January case. The script job_mpi.csh will make these directories if you use it. |
DART | holds the filter restart files (named filter_ic[.#]) created at the end of the filter run for this obs_seq.out; the run for the next obs_seq.out file restarts from them. |
CAM | holds the CAM initial file "shells", which carry along model fields that are not DART state vector fields (preventing the repeated re-spin-up of those variables) |
CLM | same as CAM, but for Community Land Model initial files. |
A typical pathname for a restart file in my case would be:

    /scratch/cluster/raeder/T21x80/Taper1/01_03/DART/filter_ic

where T21x80 is the Central directory (resolution x num_ens_members), Taper1 is the Experiment (reduced influence of obs above 150 hPa), 01_03 is the Obs_seq subdirectory (Jan 3), and DART is the DART restart file directory.
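A hedged csh sketch of creating that layout by hand (job_mpi.csh makes these directories for you; the path components are the example values above):

    set central = /scratch/cluster/raeder/T21x80
    mkdir -p ${central}/Taper1/01_03/DART    # filter restart files
    mkdir -p ${central}/Taper1/01_03/CAM     # CAM initial file shells
    mkdir -p ${central}/Taper1/01_03/CLM     # CLM initial files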
Instructions for setting up a DART-CAM assimilation using these scripts.
Create a CAM namelist template, namelistin (the file referenced in the calling tree above), containing at least:

    &camexp
     ncdata   = 'caminput.nc'
     caseid   = 'whatever_you_want'
     nsrest   = 0
     calendar = 'GREGORIAN'
     inithist = 'ENDOFRUN'
    /
    &clmexp
     finidat  = 'clminput.nc'
    /

and NOT containing forecast-control entries such as:

    nhtfrq    = 4368
    start_ymd = 20020901
    start_tod = 0
    stop_ymd  = 20021201
    stop_tod  = 0

The CAM build-namelist script will use this to make a new namelist with the correct forecast parameters, named 'namelist'. Put this in CamCentral/CAM_version/models/atm/cam/bld/CAM_config_1.
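For illustration only, the generated 'namelist' merges the template with forecast-control entries whose values come from the results of trans_time for the current advance; schematically (the angle-bracket values are placeholders, not literal text):

    &camexp
     ncdata    = 'caminput.nc'
     caseid    = 'whatever_you_want'
     nsrest    = 0
     calendar  = 'GREGORIAN'
     inithist  = 'ENDOFRUN'
     nhtfrq    = <set per advance>
     start_ymd = <from trans_time>
     start_tod = <from trans_time>
     stop_ymd  = <from trans_time>
     stop_tod  = <from trans_time>
    /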
For async=2, use the ensemble size, the available compute nodes, and the processors/node to figure out how many nodes to request. Make this request in job_mpi.csh. For example, on a machine with 2 processors/node, running an assimilation with a typical ensemble of 20 members, it's efficient to request 5 nodes. This will advance CAM in 2 batches of 10 (1 CAM/processor).
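The same arithmetic as a small csh sketch (values taken from the example above):

    @ procs   = 5 * 2                        # nodes * processors/node = 10
    @ batches = (20 + $procs - 1) / $procs   # ceiling(20 members / 10 procs) = 2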
Each batch of restart data can be saved to a mass store using (a modified)
auto_re2ms and retrieved using .../ms2restart. Execute the
commands with no arguments to see instructions. They package files of each ensemble
member together, and then bundle batches of ensemble members together for efficient
storage in a directory named similarly to the one where they exist on the cluster.
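A hedged sketch of that packaging pattern (the real auto_re2ms.csh differs in detail; the file names follow the conventions in the tables above):

    # bundle each member's initial files, then archive a batch of bundles
    set i = 1
    while ($i <= 10)
       tar -cf member_${i}.tar caminput_${i}.nc clminput_${i}.nc
       @ i ++
    end
    tar -cf batch_01-10.tar member_*.tar   # one archive per batch for the mass store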
If you're not running job_mpi.csh as a batch job, run it as 'nohup ./job_mpi.csh >& /dev/null &' to protect the job from being cut off when the window in which it was executed is closed.
Modify and use the alias 'rmtemp' to remove the temporary files from the central directory where the experiment is run, before running another experiment:

    alias rmtemp 'rm *_ud* *_ic[1-9]* cam_*_temp* c[al]minput_[1-9]*.nc \
                  filter_assim_region_* *control filter_ic_old* obs_seq.out times'

Needless to say, be careful that you don't name files you want to keep in such a way that they'll be deleted by this.
Space requirements (per ensemble member) for several CAM resolutions.
Resolution | filter_ic | CAM initial | CLM initial | Diagnostic (assuming no ensemble members in the diagnostic files) |
---|---|---|---|---|
T5 | 0.16 Mb | 0.3 Mb | 0.15 Mb | 1.3 Mb + obs_seq.final |
T21 | 2.5 Mb | 4.5 Mb | 1.4 Mb | 21 Mb + obs_seq.final |
T42 | 10 Mb | 18 Mb | 4.5 Mb | 57 Mb + obs_seq.final |
T85 | 41 Mb | 74 Mb | 15 Mb | 342 Mb + obs_seq.final |