DART is designed to work with many models without
modifications to the DART routines or the model source code.
DART can 'wrap around' your model in two ways.
One can be used if your model can be called as a subroutine; the
other is for models that are separate executables. Either way,
some steps are common to both paths.
Please be aware that several of the high-order models (CAM and WRF, in particular) have been used for years and their scripts have incorporated failsafe procedures and restart capabilities that have proven to be useful but make the scripts complex - more complex than need be for the initial attempts. Truly, some of the complexity is no longer required for available platforms. Then again, we're not running one instance of a highly complicated computer model, we're running N of them.
Generally, it is a good strategy to use DART to create a synthetic observation sequence with ONE observation location - and ONE observation type - for several assimilation periods. With that, it is possible to run perfect_model_obs and then filter without having to debug too much stuff at once. A separate document will address how to test your model with DART.
#1 Don't shoot the messenger. We have a lot of experience trying to write portable/reproducible code and offer these suggestions. All of these suggestions are for the standalone DART components. We are not asking you to rewrite your model. If your model is a separate executable, leaving it untouched is fine. Writing portable code for the DART components will allow us to include your model in the nightly builds and reduces the risk of us making changes that adversely affect the integration with your model. There are some routines that have to play with the core DART routines; these are the ones we are asking you to write using these few simple guidelines.
Hopefully, you have no idea how difficult it is to build each model with 'unique' compile options on N different platforms. Fortran90 provides a nice mechanism to specify the type of variable, so please do not use vendor-specific extensions (to globally autopromote 32bit reals to 64bit reals, for example). Autopromotion is a horrible thing to do, since vendors are not consistent about what happens to explicitly-typed variables. Trust me. They lie. It also defeats the generic procedure interfaces that are designed to use a single interface as a front-end to multiple 'type-specific' routines. Compilers do abide by the standard, however, so DART code declares its reals with an explicit kind, real(r4) or real(r8), depending on the use.
The r4 and r8 types are explicitly defined in DART/common/types_mod.f90 to accurately represent what we have come to expect from 32bit and 64bit floating point real variables, respectively. If you like, you can redefine r8 to be the same as r4 to shrink your memory requirement. The people who run with WRF frequently do this. Do not redefine the digits12 parameter; that one must provide 64bit precision and is used in precious few places.
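As a rough illustration of explicitly-kinded declarations, here is a minimal, self-contained sketch. The module name toy_types_mod is hypothetical and only stands in for DART's types_mod.f90; the selected_real_kind arguments are an assumption about how such kinds are typically requested, not a copy of the DART source.

```fortran
! Sketch only: toy_types_mod is a hypothetical stand-in for DART's types_mod.f90.
module toy_types_mod
   implicit none
   private
   public :: r4, r8

   ! Request kinds by required precision instead of relying on compiler
   ! flags that silently promote default reals.
   integer, parameter :: r4 = selected_real_kind(6, 30)   ! at least 6 decimal digits
   integer, parameter :: r8 = selected_real_kind(12)      ! at least 12 decimal digits
end module toy_types_mod

program kinds_demo
   use toy_types_mod, only : r4, r8
   implicit none

   real(r4) :: small_value
   real(r8) :: large_value

   ! Literal constants carry the kind suffix too, so nothing is left
   ! to the compiler's default real size.
   small_value = 1.0_r4 / 3.0_r4
   large_value = 1.0_r8 / 3.0_r8

   ! storage_size reports bits per element; precision reports decimal digits.
   write(*,*) 'r4: bits =', storage_size(small_value), ' digits =', precision(small_value)
   write(*,*) 'r8: bits =', storage_size(large_value), ' digits =', precision(large_value)
   write(*,*) 'one third as r4:', small_value
   write(*,*) 'one third as r8:', large_value
end program kinds_demo
```

Because the kind is part of every declaration and every literal, the same source should behave the same way regardless of any promotion flags a particular compiler offers.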
If your model is a separate executable, there is some flexibility in how to provide the required interfaces, and it would be wise to look at the heavily commented template script DART/models/templates/shell_scripts/advance_model.csh and then at a few higher-order models to see how they do it. Become familiar with DART/doc/html/mpi_intro.html (DART's use of MPI), DART/doc/html/filter_async_mod, and the filter namelist parameter async in filter.html.
A little explanation/motivation is warranted. If the model uses the standard layout, it is much easier to include the model in the nightly builds and testing. For this reason alone, please try to use the recommended directory layout. Simply looking at the DART/models directory should give you a pretty good idea of how things should be laid out. Copy the template directory and its contents. It is best to remove the (hidden) subversion files to keep the directory 'clean'. The point of copying this directory is to get a model_mod.f90 that works as-is and you can modify/debug the routines one at a time.
The destination directory (your model directory) should be in the
DART/models directory to keep life simple.
Moving it elsewhere will cause problems for the
work/mkmf_xxxxx configuration files.
Each model directory should have work
and shell_scripts directories, and may
have a matlab directory,
a src directory,
or anything else you may find convenient.
Now, you must change all the work/path_names_xxx file contents to reflect the location of your model_mod.f90.
We have templates, examples, and a document describing the required interfaces
in the DART code tree.
Every(?) user-visible DART program/module has a matching piece of
documentation that is distributed along with the code.
The DART code tree always has the most current documentation.
Check out time_manager_mod.f90 and utilities_mod.f90 for general-purpose routines ...
Use Fortran namelists for I/O if possible.
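A minimal sketch of namelist-driven settings follows. The namelist group model_nml, its three variables, and the file name toy_input.nml are all hypothetical and chosen only for illustration; the program writes its own input file so that it runs as-is.

```fortran
program namelist_demo
   implicit none

   ! Hypothetical run-time settings with compiled-in defaults.
   integer           :: time_step_seconds = 3600
   logical           :: verbose           = .false.
   character(len=64) :: grid_file         = 'grid.nc'

   namelist /model_nml/ time_step_seconds, verbose, grid_file

   integer :: iunit, io

   ! Write a small namelist file so the example is self-contained.
   open(newunit=iunit, file='toy_input.nml', status='replace', action='write')
   write(iunit,'(A)') '&model_nml'
   write(iunit,'(A)') '   time_step_seconds = 21600,'
   write(iunit,'(A)') '   verbose           = .true.'
   write(iunit,'(A)') '/'
   close(iunit)

   ! Read it back. Variables that are absent from the file (grid_file here)
   ! keep their default values.
   open(newunit=iunit, file='toy_input.nml', status='old', action='read')
   read(iunit, nml=model_nml, iostat=io)
   close(iunit)

   if (io /= 0) then
      write(*,*) 'error reading model_nml, iostat = ', io
      stop 1
   end if

   write(*,*) 'time_step_seconds = ', time_step_seconds
   write(*,*) 'verbose           = ', verbose
   write(*,*) 'grid_file         = ', trim(grid_file)
end program namelist_demo
```

The appeal of this pattern is that settings can change between runs without recompiling, and unspecified values fall back to documented defaults.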
Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient. This module should be written with the understanding that print statements and error terminations will be executed by multiple processors/tasks. To restrict print statements to be written once (by the master task), it is necessary to preface the print as in this example:
if (do_output()) write(*,*)'model_mod:namelist cal_NML',startDate_1,startDate_2
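The effect of guarding a print with do_output() can be sketched without MPI. In the toy below, toy_utilities_mod and set_task_id are hypothetical stand-ins, and the rule "only task 0 prints" is an assumption made for illustration; in a real model_mod you would use the do_output() that DART supplies.

```fortran
! Sketch only: a stand-in for the master-task test, with no MPI involved.
module toy_utilities_mod
   implicit none
   private
   public :: set_task_id, do_output

   integer :: my_task_id = 0

contains

   ! Hypothetical helper: pretend this process is task number 'id'.
   subroutine set_task_id(id)
      integer, intent(in) :: id
      my_task_id = id
   end subroutine set_task_id

   ! Assumed rule for this sketch: only task 0 is allowed to print.
   logical function do_output()
      do_output = (my_task_id == 0)
   end function do_output

end module toy_utilities_mod

program do_output_demo
   use toy_utilities_mod, only : set_task_id, do_output
   implicit none
   integer :: task, messages

   messages = 0

   ! Imitate four tasks that all execute the same model_mod code.
   do task = 0, 3
      call set_task_id(task)
      if (do_output()) then
         write(*,*) 'model_mod: initialization finished, reported by task ', task
         messages = messages + 1
      end if
   end do

   write(*,*) 'number of messages written: ', messages
end program do_output_demo
```

Without the guard, every task would write the same line, which quickly becomes unreadable at large task counts.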
No matter the complexity of the model, the DART software requires
a few interface routines in a model-specific Fortran90 module, the
model_mod.f90 file. The template model_mod.f90
has extended comment blocks at the head of each of these routines
that go into much more detail about what is to be provided.
You cannot change the types or number of required arguments
to any of the required interface routines. You can add
optional arguments, but you cannot go back through the DART tree to
change the gazillion calls to the mandatory routines. It is absolutely
appropriate to look at existing models to get ideas about how to
implement the interfaces. Finding a model implementation that is
functionally close to yours always helps.
As of December 2008, the table of the mandatory interfaces and programming degree-of-difficulty is:
|difficulty (subroutine callable)||difficulty (separate executable)||routine||description|
|easy||easy||get_model_size||This function returns the size of all the model variables (prognostic or diagnosed or ...) that are packed into the 1D DART state vector. That is, it returns the length of the DART state vector as a single scalar integer.|
|depends||trivial||adv_1step||For subroutine-callable models, this routine is the one to actually advance the model 1 timestep (see models/bgrid_solo/model_mod.f90 for an example). For non-subroutine-callable models, this is a NULL interface. Easy.|
|depends||depends||get_state_meta_data||This routine takes as input an integer index into the DART state vector and returns the associated location and (optionally) variable type from obs_kind/obs_kind_mod.f90. (See models/*/model_mod.f90 for examples.) This generally requires knowledge of how the model state vector is packed into the DART array, so it can be as complicated as the packing.|
|depends||hard||model_interpolate||This is one of the more difficult routines. Given a DART state vector, a location, and a desired generic 'kind' (like KIND_SURFACE_PRESSURE, KIND_TEMPERATURE, KIND_SPECIFIC_HUMIDITY, KIND_PRESSURE, ... ); return the desired scalar quantity and set the return status accordingly. This is what enables the model to use observation-specific 'forward operators' that are part of the common DART code.|
|easy||easy||get_model_time_step||This routine returns the smallest increment in time (in seconds) that the model is capable of advancing the state in a given implementation. For example, suppose the dynamical timestep of a model is 20 minutes, but there are reasons you don't want to (or cannot) restart at this interval and would like to restart AT MOST every 6 hours. For this case, get_model_time_step should return 21600, i.e. 6*60*60. This is also interpreted as the nominal assimilation period. This interface is required for all applications.|
|easy||easy||end_model||Performs any shutdown and cleanup needed. Good form would dictate that you should deallocate any storage allocated when you instantiated the model (from static_init_model, for example).|
|depends||depends||static_init_model||Called to do one-time initialization of the model. This generally includes setting the grid information, calendar, etc.|
|trivial||trivial||init_time||Returns a time that is somehow appropriate for starting up a long integration of the model IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs. If this option is not to be used in perfect_model_obs, this can be a NULL interface.|
|easy||easy||init_conditions||Companion interface to init_time. Returns a model state vector that is somehow appropriate for starting up a long integration of the model. Only needed IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs.|
|trivial-difficult||trivial-difficult||nc_write_model_atts||This routine is used to write the model-specific attributes to the netCDF files containing the prior and posterior states of the assimilation. The subroutine in the models/template/model_mod.f90 WILL WORK for new models but does not know anything about prognostic variables or geometry or ... Still, it is enough to get started without doing anything. More meaningful coordinate variables etc. are needed to supplant the default template. This can be as complicated as you like - see existing models for examples.|
|trivial-difficult||trivial-difficult||nc_write_model_vars||This routine is responsible for writing the DART state vector -or- the prognostic model variables to the output netCDF files. If the namelist parameter output_state_vector == .false. this routine is responsible for partitioning the DART state vector into the appropriate netCDF pieces (i.e. the prognostic model variables). The default routine will simply blast out the entire DART state vector into a netCDF variable called 'state'.|
|depends||trivial-difficult||pert_model_state||This routine is used to generate initial ensembles. This may be a NULL interface if you can tolerate the default perturbation strategy of adding noise to every state element or if you generate your own ensembles outside the DART framework. There are other ways of generating ensembles ... climatological distributions, bred singular vectors, voodoo ...|
|trivial||trivial||get_close_maxdist_init||This routine performs the initialization for the table-lookup routines that accelerate the distance calculations. This routine is closely tied to the type of location module used by the model and is frequently (universally?) simply a 'pass-through' routine to a routine of the same name in the location module. There is generally no coding that needs to be done, but the interface must exist in model_mod.f90.|
|trivial||trivial||get_close_obs_init||This routine performs the initialization for the get_close accelerator that depends on the particular observation. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module.|
|trivial||trivial||get_close_obs||This is the routine that takes a single location and a list of other locations, returns the indices of all the locations close to the single one along with the number of these and the distances for the close ones. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module.|
|easy||easy||ens_mean_for_model||This routine simply stores a copy of the ensemble mean of the state vector within the model_mod. The ensemble mean may be needed for some calculations (like converting model sigma levels to the units of the observation - pressure levels, for example).|
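To make the packing-related entries in the table more concrete, here is a toy sketch of three of the ideas: the state vector length, the six-hour time step, and the mapping from a state vector index back to a variable and grid point. The grid sizes, the variable list, and the routines prefixed with toy_ are hypothetical, and their argument lists are deliberately simplified; the real interfaces work with DART's own time and location types.

```fortran
! Sketch only: two 2D fields packed one after another into a 1D state vector.
module toy_model_mod
   implicit none
   private
   public :: get_model_size, toy_time_step_seconds, toy_state_meta_data
   public :: nlon, nlat, nvars, var_names

   integer, parameter :: nlon = 4, nlat = 3, nvars = 2
   character(len=11), parameter :: var_names(nvars) = &
      [character(len=11) :: 'temperature', 'pressure']

contains

   ! Length of the state vector: every variable at every grid point.
   integer function get_model_size()
      get_model_size = nlon * nlat * nvars
   end function get_model_size

   ! Shortest interval between assimilations, here six hours in seconds.
   integer function toy_time_step_seconds()
      toy_time_step_seconds = 6 * 60 * 60
   end function toy_time_step_seconds

   ! Invert the packing: longitude varies fastest, then latitude, then variable.
   subroutine toy_state_meta_data(index_in, ilon, ilat, ivar)
      integer, intent(in)  :: index_in
      integer, intent(out) :: ilon, ilat, ivar
      integer :: offset

      offset = index_in - 1
      ilon = mod(offset, nlon) + 1
      ilat = mod(offset / nlon, nlat) + 1
      ivar = offset / (nlon * nlat) + 1
   end subroutine toy_state_meta_data

end module toy_model_mod

program state_index_demo
   use toy_model_mod
   implicit none
   integer :: idx, ilon, ilat, ivar, rebuilt

   write(*,*) 'model size        = ', get_model_size()
   write(*,*) 'time step seconds = ', toy_time_step_seconds()

   ! Check that the index mapping is the exact inverse of the packing order.
   do idx = 1, get_model_size()
      call toy_state_meta_data(idx, ilon, ilat, ivar)
      rebuilt = (ivar - 1) * nlon * nlat + (ilat - 1) * nlon + ilon
      if (rebuilt /= idx) then
         write(*,*) 'index mapping is inconsistent at ', idx
         stop 1
      end if
   end do

   call toy_state_meta_data(17, ilon, ilat, ivar)
   write(*,*) 'index 17 is ', trim(var_names(ivar)), ' at lon ', ilon, ', lat ', ilat
end program state_index_demo
```

Whatever packing order you choose, the index-to-location routine and the routines that fill the state vector have to agree on it, which is why a round-trip check like the loop above is worth keeping around while debugging.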
Since this is an optional step, it will be covered in a separate document.
The normal sequence of events is that DART reads in its own restart file
(do not worry about where this comes from right now) and eventually
determines it needs to advance the model. DART needs to be able to
take its internal representation of each model state vector, the
valid time of that state, and the amount of time to advance the state -
and communicate that to the model. When the model has advanced the state
to the requested time, the output must be ingested by DART and the cycle
begins again. DART is entirely responsible for reading the observations
and there are several programs for creating and manipulating the observation sequences.
There are a couple of ways to exploit parallel architectures with DART, and these have an immediate bearing on the design of the script(s) that control how the model instances (each model copy) are advanced. Perhaps the conceptually simplest method is when each model instance is advanced by a single processor element. DART calls this async = 2. It is generally efficient to relate the ensemble size to the number of processors being used.
The alternative is to advance every model instance one after another using all available processors for each instance of the model. DART calls this async = 4, and requires an additional script. For portability reasons, DART uses the same processor set for both the assimilation and the model advances. For example, if you advance the model with 96 processors, all 96 processors will be employed to assimilate.
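The remark about relating the ensemble size to the number of processors can be illustrated with a little arithmetic. The sketch below assumes the async = 2 case and a simple even distribution in which each task advances one model instance at a time; the ensemble size and task counts are hypothetical, and DART's actual distribution of members may differ in detail.

```fortran
program advance_rounds_demo
   implicit none

   ! Hypothetical ensemble size and candidate task counts.
   integer, parameter :: ens_size = 80
   integer, parameter :: task_counts(4) = [20, 40, 48, 80]

   integer :: i, ntasks, rounds, idle_in_last_round

   do i = 1, size(task_counts)
      ntasks = task_counts(i)

      ! Number of sequential 'rounds' of model advances: ceiling(ens_size / ntasks).
      rounds = (ens_size + ntasks - 1) / ntasks

      ! Tasks with nothing left to advance during the final round.
      idle_in_last_round = rounds * ntasks - ens_size

      write(*,'(A,I3,A,I3,A,I3)') 'tasks = ', ntasks, '   rounds = ', rounds, &
                                  '   idle in last round = ', idle_in_last_round
   end do
end program advance_rounds_demo
```

Under these assumptions, 48 tasks need the same two rounds as 40 tasks but leave 16 tasks idle in the second round, so a task count that divides the ensemble size evenly wastes nothing.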
advance_model.csh is invoked in one of two ways: 1) if async = 2 then filter uses a system() call, or 2) if async = 4 then run_filter.csh makes the call. Either way there are three arguments.
Modify shell_scripts/advance_model.csh to:
During this initial phase, it may be useful to _leave_ the temporary directory intact until you verify everything is as it should be.
In addition to modifying shell_scripts/advance_model.csh as described above, you must also modify shell_scripts/run_filter.csh in the following way: THIS PART NEEDS TO BE FILLED IN
After DART has assimilated the observations and created new (posterior)
states, it is necessary to reformat those posteriors into input for the
model. Fundamentally, you are unpacking the DART state vector and putting
the pieces back into whatever portion of your model initial conditions file
is appropriate. Frequently this is done by a program called dart_to_model.
Put another way: create a routine or set of routines to modify the input to your model, communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. Frequently this is done by dart_to_model.f90. The DART array has a header that contains the 'advance-to' time as well as the 'valid' time of the DART array. The times are encoded in DART's time representation. Interpretation of these times by your model will be necessary and you will need to feed these times to your model in some automated fashion.
You will also need to create/modify a mkmf_dart_to_model and path_names_dart_to_model specific to your model.
The routines that communicate these run-time settings to your model are called in the advance_model.csh script. Every model is controlled differently, so writing detailed descriptions here is pointless.
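One common piece of this translation is turning the 'valid' and 'advance-to' times into a run length for the model. The sketch below assumes that a time can be represented as a (days, seconds) pair; the type toy_time, the function seconds_between, and the sample values are hypothetical, and a real converter would use DART's time_manager_mod routines instead.

```fortran
! Sketch only: toy_time is a hypothetical stand-in for DART's time type.
module toy_time_mod
   implicit none
   private
   public :: toy_time, seconds_between

   integer, parameter :: secs_per_day = 86400

   type toy_time
      integer :: days
      integer :: seconds
   end type toy_time

contains

   ! Length of the requested model advance, in seconds.
   integer function seconds_between(start_time, end_time)
      type(toy_time), intent(in) :: start_time, end_time

      seconds_between = (end_time%days - start_time%days) * secs_per_day &
                      + (end_time%seconds - start_time%seconds)
   end function seconds_between

end module toy_time_mod

program run_length_demo
   use toy_time_mod, only : toy_time, seconds_between
   implicit none

   type(toy_time) :: valid_time, advance_to_time
   integer :: run_seconds

   ! Hypothetical header values: state valid at 18:00, advance to 00:00 the next day.
   valid_time      = toy_time(days=148000, seconds=64800)
   advance_to_time = toy_time(days=148001, seconds=0)

   run_seconds = seconds_between(valid_time, advance_to_time)

   if (run_seconds <= 0) then
      write(*,*) 'advance-to time is not later than the valid time'
      stop 1
   end if

   ! These are the numbers a script would write into the model's own control file.
   write(*,*) 'integrate the model for ', run_seconds, ' seconds'
   write(*,*) 'which is ', run_seconds / 3600, ' hours'
end program run_length_demo
```

How the resulting run length reaches the model (a namelist edit, a command-line argument, a control file) depends entirely on the model.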
After your model has advanced its states to the target time, it is necessary to convey the new state information to DART. The preferred name for this is model_to_dart.f90. This is fundamentally the inverse of dart_to_model.f90. Rip out the bits of the state vector you want to paste into a vector for DART, prepend the valid_time in the approved DART format and you're good to go. If you pack the bits into a DART state vector, there are native DART routines to write out the state vector. This ensures that DART will be able to read what you've written, and insulates you from having to worry about any internal file format changes we might make.
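The two conversions are easiest to get right when they are tested as a round trip. The sketch below packs two hypothetical 2D fields into a 1D vector (the model_to_dart direction) and unpacks them again (the dart_to_model direction). The field names, grid sizes, and packing order are assumptions, and the sketch leaves out both the time header and the file I/O, which the native DART routines would handle.

```fortran
program state_roundtrip_demo
   implicit none

   integer, parameter :: r8 = selected_real_kind(12)
   integer, parameter :: nlon = 4, nlat = 3
   integer, parameter :: npts = nlon * nlat

   ! Hypothetical model fields and their copies after the round trip.
   real(r8) :: temperature(nlon, nlat),     surface_pressure(nlon, nlat)
   real(r8) :: temperature_out(nlon, nlat), surface_pressure_out(nlon, nlat)

   ! One long vector holding both fields back to back.
   real(r8) :: state(2 * npts)

   integer :: i, j

   ! Fill the fields with recognizable values.
   do j = 1, nlat
      do i = 1, nlon
         temperature(i, j)      = 280.0_r8 + real(i + 10 * j, r8)
         surface_pressure(i, j) = 100000.0_r8 - real(i * j, r8)
      end do
   end do

   ! model_to_dart direction: flatten each field into its slice of the vector.
   state(1:npts)          = reshape(temperature,      [npts])
   state(npts+1:2*npts)   = reshape(surface_pressure, [npts])

   ! dart_to_model direction: restore each slice to its 2D shape.
   temperature_out      = reshape(state(1:npts),        [nlon, nlat])
   surface_pressure_out = reshape(state(npts+1:2*npts), [nlon, nlat])

   ! The copies are exact, so comparing for equality is safe here.
   if (all(temperature_out == temperature) .and. &
       all(surface_pressure_out == surface_pressure)) then
      write(*,*) 'round trip preserved both fields'
   else
      write(*,*) 'round trip changed the fields'
      stop 1
   end if
end program state_roundtrip_demo
```

Running model_to_dart followed by dart_to_model on a real restart file and comparing the result with the original is the same test at full scale.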