The DART distribution includes a full set of documentation. Once you download DART, you may view the documentation offline by opening the index.html file in the top-level DART directory. If you want to explore the documentation page without downloading DART, you may view the documentation for the Manhattan release.
There are two broad classes of models supported by DART. Some are 'low-order' models, generally single-threaded, subroutine-callable, and idealized: there are no real observations of these systems. The other class of models are 'high-order' models. There are real observations of these systems. Or at least, we like to think so ...
This is the 3-variable model as described in: Lorenz, E. N.
1963. Deterministic nonperiodic flow. J. Atmos.
Sci. 20, 130-141.
The system of equations is:
X' = -sigma*X + sigma*Y Y' = -XZ + rX - Y Z' = XY -bZ
This model is based on: Lorenz E. N., 1984: Irregularity:
A fundamental property of the atmosphere. Tellus,
36A, 98-110.
The system of equations is:
X' = -Y^2 - Z^2 - aX + aF Y' = XY - bXZ - Y + G Z' = bXY + XZ - Z
Where a, b, F, and G are the model parameters.
This model provides interesting off-attractor transients that behave something like gravity waves.
This is the model we use to become familiar with new
architectures, i.e., it is the one we use 'first'. It can be called
as a subroutine or as a separate executable. We can test this model
both single-threaded and mpi-enabled.
Quoting from the Lorenz 1998 paper:
... the authors introduce a model consisting of 40 ordinary differential equations, with the dependent variables representing values of some atmospheric quantity at 40 sites spaced equally about a latitude circle. The equations contain quadratic, linear, and constant terms representing advection, dissipation, and external forcing. Numerical integration indicates that small errors (differences between solutions) tend to double in about 2 days. Localized errors tend to spread eastward as they grow, encircling the globe after about 14 days.
...
We have chosen a model with J variables, denoted by X1, ..., XJ; in most of our experiments we have let J = 40. The governing equations are:
dXj/dt = (Xj+1 - Xj-2)Xj-1 - Xj + F (1)for j = 1, ..., J. To make Eq. (1) meaningful for all values of j we define X-1 = XJ-1, X0 = XJ, and XJ+1 = X1, so that the variables form a cyclic chain, and may be looked at as values of some unspecified scalar meteorological quantity, perhaps vorticity or temperature, at J equally spaced sites extending around a latitude circle. Nothing will simulate the atmosphere's latitudinal or vertical extent.
This is the Lorenz 96 2-scale model, documented in Lorenz (1995). It also has the option of the variant on the model from Smith (2001), which is invoked by setting local_y = .true. in the namelist. The time step, coupling, forcing, number of X variables, and the number of Ys per X are all specified in the namelist. Defaults are chosen depending on whether the Lorenz or Smith option is specified in the namelist. Lorenz is the default model. Interface written by Josh Hacker. Thanks Josh!
The forced_lorenz_96 model implements the standard L96 equations except that the forcing term, F, is added to the state vector and is assigned an independent value at each gridpoint. The result is a model that is twice as big as the standard L96 model. The forcing can be allowed to vary in time or can be held fixed so that the model looks like the standard L96 but with a state vector that includes the constant forcing term. An option is also included to add random noise to the forcing terms as part of the time tendency computation which can help in assimilation performance. If the random noise option is turned off (see namelist) the time tendency of the forcing terms is 0.
The reference for these models is Lorenz, E.N., 2005: Designing
chaotic models. J. Atmos.
Sci., 62, 1574-1587.
Model II is a single-scale model, similar to Lorenz 96, but with
spatial continuity in the waves. Model III is a two-scale model. It
is fudamentally different from the Lorenz 96 two-scale model
because of the spatial continuity and the fact that both scales are
projected onto a single variable of integration. The scale
separation is achived by a spatial filter and is therefore not
perfect (i.e. there is leakage). The slow scale in model III is
model II, and thus model II is a deficient form of model III. The
basic equations are documented in Lorenz (2005) and also in the
model_mod.f90 code. The user is free to choose model II or III with
a Namelist variable.
This model is on a periodic one-dimensional domain. A wind field is modeled using Burger's Equation with an upstream semi-lagrangian differencing. This diffusive numerical scheme is stable and forcing is provided by adding in random gaussian noise to each wind grid variable independently at each timestep. An Eulerian option with centered-in-space differencing is also provided. The Eulerian differencing is both numerically unstable and subject to shock formation. However, it can sometimes be made stable in assimilation mode (see recent work by Majda and collaborators).
This is a dynamical core for B-grid dynamics using the Held-Suarez forcing. The resolution is configurable, and the entire model can be run as a subroutine. Status: supported.
The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. More people are using DART with WRF than any other model. Note: The actual WRF code is not distributed with DART. Status: supported.
Model Prediction Across Scales - atmosphere Status: active
Regional Ocean Modelling System Status: active
There are several supported versions of the Community Earth System Model (CESM) and its ancestors (CCSM4.0). Contact us for support for unreleased, developmental versions of CESM. Not all are supported because each requires modification of some subroutines and setup scripts in order to work with DART. The supported versions depend to some degree on the CESM component(s) which will be used as the assimilating model(s). See CAM-FV, POP, and CICE, below, and CESM setup guidelines. In general, later versions of CESM can build the latest component models plus any earlier versions of the component models. For example, CESM2.0 can build CAM-FV version 6, 5, 4, ... while CESM1.2.1 can build CAM-FV 5, 4, ..., but not 6. Note: the source code for CESM component models is not distributed with DART.
See CESM, above. CAM-FV has been the "work horse" atmospheric climate model for several generations of CESM releases. The Manhattan release of DART provides interfaces to CESM1.5 and CESM2.0. The setup scripts for those CESMs will currently build only the finite volume dynamical core of CAM. This works for all of the variants of CAM-FV; CAM-Chem, WACCM, WACCM-X. An interface between DART and the spectral element dy-core of CAM is available in DART Classic, and will be brought into the Manhattan release when needed. (CAM5); Some SourceMods and initial file ensembles for older and lower-resolution CAM-FVs are available in DART/CAM datasets Status: available for community use.
See CESM, above. The sea-ice component of CESM The interface of CESM-CICE to DART is through CESM1.5 which is still under development. Status: throroughly beta-tested, full support awaiting the CESM2.0 release.
See CESM, above. The Parallel Ocean Program (POP) was originally created by the Los Alamos National Laboratory and has been modified to run in the NCAR Community Earth System Model (CESM) framework. Additional modifications are necessary for data assimilation and center around the need to perform an adjustment upon restart to account for the fact that the input ocean state has been modified by the assimilation. There are interfaces for CESM1.1.1 and CESM1.2.1. Status: available for community use.
Cloud Model 1 (CM1) version 18 (CM1r18) is a non-hydrostatic
numerical model in Cartesian 3D coordinates designed for the study
of micro- to mesoscale atmospheric phenomena in idealized to
semi-idealized simulations. The CM1 model was developed and is
maintained by George Bryan at the National Center for Atmospheric
Research (NCAR) Mesoscale and Microscale Meteorology Laboratory
(MMM). The model code is freely available from the CM1 website:
http://www2.mmm.ucar.edu/people/bryan/cm1
and must be downloaded and compiled outside of DART.
This model interface and scripting support were created by Luke
Madaus. Thanks Luke!
The MIT ocean GCM version 'checkpoint59a' is the foundation of this implementation. It was modified by Ibrahim Hoteit (then of Scripps) to accomodate the interfaces needed by DART. Status: supported, and currently being ported to Manhattan.
The FMS AM2 model is GFDL's atmosphere-only code using observed sea surface temperatures, time-varying radiative forcings (including volcanos) and time-varying land cover type. This version of AM2 (also called AM2.1) uses the finite-volume dynamical core (Lin 2004). Robert Pincus (CIRES/NOAA ESRL PSD1) and Patrick Hoffman (NOAA) wrote the DART interface and are currently using the system for research. Note: the model code is not distributed with DART. Status: supported
The DART interface was originally written and supported by Tim Whitcomb. The following model description is taken from the COAMPS overview web page:
The Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS) has been developed by the Marine Meteorology Division (MMD) of the Naval Research Laboratory (NRL). The atmospheric components of COAMPS, described below, are used operationally by the U.S. Navy for short-term numerical weather prediction for various regions around the world.
Note: the model code is not distributed with DART. Status: supported
The Ikeda model is a 2D chaotic map useful for visualization data assimilation updating directly in state space. There are three parameters: a, b, and mu. The state is 2D, x = [X Y]. The equations are:
X(i+1) = 1 + mu * ( X(i) * cos( t ) - Y(i) * sin( t ) ) Y(i+1) = mu * ( X(i) * sin( t ) + Y(i) * cos( t ) ),
where
t = a - b / ( X(i)**2 + Y(i)**2 + 1 )
Note the system is time-discrete already, meaning there is no delta_t. The system stems from nonlinear optics (Ikeda 1979, Optics Communications). Interface written by Greg Lawson. Thanks Greg!
This model provides very simple models for evaluating filtering algorithms. It can provide simple linear growth around a fixed point, a random draw from a Gaussian, or combinations of the two.
The PBL model is a single column version of the WRF model. The functionality for this has been folded into the regular WRF model_mod interface so this version is no longer needed. See the WRF model_mod namelist documentation for how to use the single-column features. Status: orphaned, obsolete.
The MITgcm annulus model as configured for this application within DART is a non-hydrostatic, rigid lid, C-grid, primitive equation model utilizing a cylindrical coordinate system. For detailed information about the MITgcm, see http://mitgcm.org Status: orphaned.
Status: orphaned.
Status: orphaned.
This model is a 2-layer, isentropic, primitive equation model on a sphere. Status: orphaned.
The rose model is for the stratosphere-mesosphere and was used by Tomoko Matsuo (now at CU-Boulder) for research in the assimilation of observations of the Mesosphere Lower-Thermosphere (MLT). Note: the model code is not distributed with DART. Status: orphaned.
The code distribution was getting cluttered with datasets,
boundary conditions, intial conditions, ... large files that were
not necessarily interesting to all people who downloaded the DART
code. This is compounded by the fact subversion makes a local
(hidden) copy of the original repository contents, so the penalty
for being large is doubled. It just made sense to make all the
large files available on as 'as-needed' basis.
To keep the size of the DART distribution down we have a separate
www-site to provide some observation sequences, initial conditions,
and general datasets. It is our intent to populate this site with
some 'verification' results, i.e. assimilations that were known to
be 'good' and that should be fairly reproducible - appropriate to
test the DART installation.
Please be patient as I make time to populate this directory. (yes,
'make', all my 'found' time is taken ...)
Observation sequences can be found at http://www.image.ucar.edu/pub/DART/Obs_sets.
Useful
bits for CAM can be found at http://www.image.ucar.edu/pub/DART/CAM.
Useful bits for WRF can be found at http://www.image.ucar.edu/pub/DART/WRF.
Useful bits for MPAS_ocn can be found at http://www.image.ucar.edu/pub/DART/MPAS_OCN.
Useful bits for CICE can be found at http://www.image.ucar.edu/pub/DART/CICE.
Verification experiments will be posted to http://www.image.ucar.edu/pub/DART/VerificationData
as soon as I can get to it. These experiments will consist of
initial conditions files for testing different high-order models
like CAM, WRF, POP ... The low-order models are still distributed
with verification data in their work
directories.
The idea is to generate an ensemble that has sufficient 'spread'
to cover the range of possible solutions. Insufficient spread can
(and usually will) lead to poor assimilations. Think 'filter
divergence'.
Generating an ensemble of initial conditions can be done in lots of
ways, only a couple of which will be discussed here. The first is
to generate a single initial condition and let DART perturb it with
noise of a nature you specify to generate as many ensemble members
as you like. The second is to take some existing collection of
model states and convert them to DART initial conditions files and
then use the NCO operators
to set the proper date in the files. The hard part is then coming
up with the original collection of model state(s).
This method works well for some models, and fails miserably for others. As it stands, DART supplies a routine that can add gaussian noise to every element of a state vector. This can cause some models to be numerically unstable. You can supply your own model_mod:pert_model_copies() if you want a more sophisticated perturbation scheme.
Simply collect the filenames of all the model netCDF files - one
per ensemble member - and specify them through the input.nml &filter_nml:input_state_file_list = "restarts_in.txt"
Frequently, the initial ensemble of restart files is some
climatological collection. For CAM experiments, we usually start
with N different 'January 1' states ... from N
different years. The timestamp in those files can be ignored
through namelist control. Experience has shown that it takes less
than a week of assimilating 4x/day to achieve a steady ensemble
spread. WRF has its own method of generating an initial ensemble.
For that, it is best to go to contact someone familiar with
WRF/DART.
In general, there are 'restart files' for the low-order models
that already exist as ASCII sources for netCDF files. These files
are usually called work/filter_input.cdl and
can be converted to netCDF files by using the ncgen -o unix command.
You can generate your own ensemble by adding noise to a single
perfect_input.nc file and run filter. At present (27 March 2017), the way to
specify the input state file is to use the input_state_file_list mechanism. Simply put the name of
the file into the file referenced by input_state_file_list. In this example, filter_input_list.txt would contain exactly one line -
the string "perfect_input.nc". You will also need an observation
sequence file, and you may want to explicitly state the start/stop
times.
&filter_nml ... input_state_files = 'null', input_state_file_list = 'filter_input_list.txt' perturb_from_single_instance = .true. ens_size = [whatever you want] init_time_days = 0 init_time_seconds = 0 first_obs_days = 0 first_obs_seconds = 0 last_obs_days = 0 last_obs_seconds = 0 output_state_files = 'null', output_state_file_list = 'filter_output_list.txt' ...
In this example, the ensemble will be created with whatever file name you put in filter_output_list.txt.
Once a model is compatible with the DART facility all of the
functionality of DART is available. This includes 'perfect model'
experiments (also called Observing System Simulation Experiments -
OSSEs). Essentially, the model is run forward from a known state
and, at predefined times, an observation forward operator is
applied to the model state to harvest synthetic observations. This
model trajectory is known as the 'true state'. The synthetic
observations are then used in an assimilation experiment. The
assimilation performance can then be evaluated precisely because
the true state (of the model) is known. Since the same forward
operator is used to harvest the synthetic observations as well as
during the assimilation, the 'representativeness' error of the
assimilation system is not an issue.
The example described in this section uses low-order models, but
the logic and procedure is exactly the same for
high-order models; the complication is usually that researchers
want more sophisticated observation networks than those described
here. All you have to do is use the DART tools to create an
observation sequence file (even a REAL observation sequence file),
and use that instead of creating one by hand with create_obs_sequence and create_fixed_network_sequence. perfect_model_obs will simply ignore the actual
observation values in this case and only use the observation
metadata. Take care that the observation error values in the file
are appropriate for an OSSE - the converters usually assume some
sort of representativeness error in the observation error
specification.
There are a set of MATLAB® functions to help explore the
assimilation performance in state-space as well as in
observation-space. Exploring the results of a Lorenz
'96 OSSE.
There are four fundamental steps to running an OSSE from within DART:
There are lots of ways to define an observation sequence that
DART can use as input for a perfect model experiment. If you have
observations in DART format already, you can simply use them. If
you have observations in one of the formats already supported by
the DART converters (check
DART/observations/obs_converters/observations.html), convert it
to a DART observation sequence. You may need to use the
obs_sequence_tool to combine multiple observation sequence
files into observation sequence files for the perfect model
experiment. Any existing observation values and quality control
information will be ignored by perfect_model_obs - only the time and location
information are used. In fact, any and all existing observation and
QC values will be removed.
GENERAL COMMENT ABOUT THE INTERPLAY BETWEEN THE MODEL STOP/START
FREQUENCY AND THE IMPACT ON THE OBSERVATION FREQUENCY: There is
usually a very real difference between the dynamical timestep of
the model and when it is safe to stop and restart the model. The
assimilation window is (usually) required to be a multiple of the
safe stop/start frequency. For example, an atmospheric model may
have a dynamical timestep of a few seconds, but may be constrained
such that it is only possible to stop/restart every hour. In this
case, the assimilation window is a multiple of 3600 seconds. Trying
to get observations at a finer timescale is not possible, we only
have access to the model state when the model stops.
If you do not have an input observation sequence, it is simple to create one.
Both create_obs_sequence and create_fixed_network_seq interactively prompt you for the information they require. This can be quite tedious if you want a spatially dense set of observations. People have been known to actually write programs to generate the input to create_obs_sequence and simply pipe or redirect the information into the program. There are several examples of these in the models/bgrid_solo directory: column_rand.f90, id_set_def_stdin.f90, ps_id_stdin.f90, and ps_rand_local.f90 . Be advised that some observation types have different input requirements, so a 'one size fits all' program is a waste of time.
NOTE: only the observation kinds in the input.nml &obs_kind_nml:assimilate_these_obs_types,evaluate_these_obs will be available to the create_obs_sequence program.
DEVELOPERS TIP: You can specify 'identity' observations as input to perfect_model_obs. Identity observations are the model values AT the exact gridcell location, there is no interpolation at all. Just a straight table-lookup. This can be useful as you develop your model interfaces; you can test many of the routines and scripts without having a working model_interpolate().
More information about creating observation sequence files for OSSE's is available in the Synthetic Observations section.
perfect_model_obs reads the blueprint and an initial state and applies the appropriate forward observation operator for each and every observation in the current 'assimilation window'. If necessary, the model is advanced until the next set of observations is desired. When it has run out of observations or reached the stop time defined by the namelist control, the program stops and writes out restarts, diagnostics, observation sequences, and a log file. This is fundamentally a single deterministic forecast for 'as long as it takes' to harvest all the observations.
default filename | format | contents |
---|---|---|
perfect_input.nc | netCDF | The DART model state to start from. If the variables have a time dimension, The last timestep will be used as the starting point. |
perfect_output.nc | netCDF | The DART model state at every assimilation
timestep. This file has but one 'copy' - the truth. Dump the
metadata and the time: ncdump -v time,MemberMetadata perfect_output.nc |
obs_seq.out | ASCII or binary DART-specific linked list |
This file has the observations - the result of the forward observation operator. This observation sequence file has two 'copies' of the observation: the noisy 'copy' and the noise-free 'copy'. The noisy copy is designated as the 'observation', the noise-free copy is the truth. The observation-space diagnostic program obs_diag has special options for using the true copy instead of the observation copy. See the obs_diag.html for details. |
dart_log.out | ASCII | The run-time output of perfect_model_obs . |
Each model may define the assimilation window differently, but conceptually, all the observations plus or minus half the assimilation window are considered to be simultaneous and a single model state provides the basis for all those observations. For example: if the blueprint requires temperature observations every 30 seconds, the initial model time is noon (12:00) and the assimilation window is 1 hour; all the observations from 11:30 to 12:30 will use the same state as input for the forward observation operator. The fact that you have a blueprint for observations every 30 seconds means a lot of those observations may have the same value (if they are in the same location).
perfect_model_obs uses the input.nml for its control. A subset of the namelists and variables of particular interest for perfect_model_obs are summarized here. Each namelist is fully described by the corresponding module document.
&perfect_model_obs_nml <--- link to the full namelist description! ... read_input_state_from_file = .true. some models can start from preset ICs single_file_in = .true some models have nested domains ... input_state_files = 'perfect_input.nc' list of files ... for each domain write_output_state_to_file = .true. single_file_out = .true. output_state_files = 'perfect_output.nc' the time-evolution of the true state async = 0 totally depends on the model adv_ens_command = './advance_ens.csh' depends on the model obs_seq_in_file_name = 'obs_seq.in' obs_seq_out_file_name = 'obs_seq.out' init_time_days = -1 negative means use the time in ... init_time_seconds = -1 the 'restart_in_file_name' file first_obs_days = -1 negative means start at the first time in ... first_obs_seconds = -1 the 'obs_seq_in_file_name' file. last_obs_days = -1 negative means to stop with the last ... last_obs_seconds = -1 observation in the file. / &obs_sequence_nml write_binary_obs_sequence = .false. .false. will create ASCII - easy to check. / &obs_kind_nml ... assimilate_these_obs_types = 'RADIOSONDE_TEMPERATURE', ... list all the synthetic observation ... types you want / &model_nml ... time_step_days = 0, some models call this 'assimilation_period_days' time_step_seconds = 3600 some models call this 'assimilation_period_seconds' use whatever value you want / &utilities_nml ... termlevel = 1 your choice logfilename = 'dart_log.out' your choice /
Since perfect_model_obs generally requires advancing the model, and the model may use MPI or require special ancillary files or forcing files or ..., it is not possible to provide a single example that will cover all possibilities. The subroutine-callable models (i.e. the low-order models) can run perfect_model_obs very simply:
This step is done with the program
filter, which also uses input.nml for
input and run-time control. A successful assimilation will depend
on many things: an approprite initial ensemble, monitoring and
perhaps correcting the ensemble spread, localization, etc. It is
simply not possible to design a one-size-fits-all system that will
work for all cases. It is critically important to
analyze the results of the assimilation and explore ways of making
the assimilation more effective. The DART tutorial
and the DART_LAB
exercises are an invaluable resource to learn and understand how to
determine the effectiveness of, and improve upon, an assimilation
experiment. The concepts learned with the low-order models are
directly applicable to the most complicated models.
It is important to remember that if filter 'terminates normally', it does not
necessarily mean the assimilation was effective!
The Manhattan release of DART allows for a very high degree of
customization when it comes to output. To stay focused on the
concepts, I will restrict the examples to models that have
single_file_in = .true., single_file_out = .true., and. stages_to_write = 'preassim', 'output'.
filter generally produces at least two
state-space output diagnostic files (preassim.nc and filter_output.nc)
which contains values of the ensemble mean, ensemble spread,
perhaps the inflation values, and (optionally) ensemble members for
the duration of the experiment. filter
also creates an observation sequence file that contains the input
observation information as well as the prior and posterior ensemble
mean estimates of that observation, the prior and posterior
ensemble spread for that observation, and (optionally), the actual
prior and posterior ensemble estimates of that observation. Rather
than replicate the observation metadata for each of these, the
single metadata is shared for all these 'copies' of the
observation. See An overview
of the observation sequence for more detail. filter also produces a run-time log file that can
greatly aid in determining what went wrong if the program
terminates abnormally.
A very short description of some of the most important namelist
variables is presented here. Basically, I am only discussing the
settings necessary to get filter to run. I
can guarantee these settings WILL NOT generate the BEST
assimilation. Again, see the module documentation for a full
description of each namelist.
&filter_nml <--- link to the full namelist description! async = 0 ens_size = 40 something ≥ 20, please num_output_state_members = 40 # of FULL DART model states to put in state-space output files num_output_obs_members = 40 # of ensemble member estimates of observation to save obs_sequence_in_name = 'obs_seq.out' output from perfect_model_obs obs_sequence_out_name = 'obs_seq.final' init_time_days = -1 the time in the restart file is correct init_time_seconds = -1 first_obs_days = -1 same interpretation as with perfect_model_obs first_obs_seconds = -1 last_obs_days = -1 same interpretation as with perfect_model_obs last_obs_seconds = -1 outlier_threshold = 3.0 Observation rejection criterion! single_file_in = .true. input_state_file_list = 'filter_input_list.txt' file containing the list of input files - 1 per domain stages_to_write = 'preassim', 'output' single_file_out = .true. output_state_file_list = 'filter_output_list.txt' file containing the list of (desired) output files - 1 per domain write_all_stages_at_end = .false. inf_flavor = 0, 0 0 is 'do not inflate' ... / &assim_tools_nml filter_kind = 1 1 is EAKF, 2 is EnKF ... cutoff = 0.2 this is your localization - units depend on type of 'location_mod' / &obs_kind_nml assimilate_these_obs_types = 'RAW_STATE_VARIABLE' Again, use a list ... appropriate for your model / &model_nml assimilation_perior_days = 0 the assimilation interval is up to you assimilation_perior_seconds = 3600 /
Once the namelist is set, execute filter to integrate the ensemble forward with the final ensemble state written to the files in filter_output_list.txt. For the low-order models and bgrid_solo (i.e. the models that can be run with single_file_in = .true. and single_file_out = .true.) the default filenames will be preassim.nc and filter_output.nc and will contain values for 40 ensemble members once a day.
mpirun ./filter -OR- mpirun.lsf ./filter -OR- ./filter -OR- however YOU run filter on your system!
All the concepts of spread, rmse, rank histograms that were taught in the DART tutorial and in DART_LAB should be applied now. Try the techniques described in the Did my experiment work? section. The 'big three' state-space diagnostics are repeated here because they are so important. The first two require the perfect_output.nc .
plot_bins.m | plots the rank histograms for a set of state variables. This requires you to have all or most of the ensemble members available in the preassim.nc or filter_output.nc files. |
plot_total_err.m | plots the evolution of the error (un-normalized) and ensemble spread of all state variables. |
plot_ens_mean_time_series.m | plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available). plot_ens_time_series.m is actually a better choice if you can afford to write all/most of the ensemble members to the preassim.nc and filter_output.nc files. |
DART is designed to work with many models without modifications to the DART routines or
the model source code. DART can 'wrap around' your model in two
ways. One can be used if your model can be called as a subroutine,
the other is for models that are separate executables. Either way,
there are some steps that are common to both paths.
Please be aware that several of the high-order models (CAM and WRF,
in particular) have been used for years and their scripts have
incorporated failsafe procedures and restart capabilities that have
proven to be useful but make the scripts complex - more complex
than need be for the initial attempts. Truly, some of the
complexity is no longer required for available platforms. Then
again, we're not running one instance of a highly complicated
computer model, we're running N of them.
NEW The DART Manhattan release provides native
netCDF read/write support. Consequently, there is no need for
translation routines that we have traditionally been calling
model_to_dart or dart_to_model. If, however, your model does not use
netCDF for I/O, these programs must be written. We have a lot of
experience writing these converters - you should not be afraid to
ask for advice or for code to start from.
NEW Manhattan provides a program to help test
the required interfaces:
assimilation_code/programs/model_mod_check/model_mod_check.f90.
Many models start with this and modify it to suit their needs. Be
aware that some of the model-specific model_mod_check.f90 programs use deprecated features.
Focus on the ones for Manhattan-compliant models.
#1 Don't shoot the messenger. We have a lot of experience trying to write portable/reproducible code and offer these suggestions. All of these suggestions are for the standalone DART components. We are not asking you to rewrite your model. If your model is a separate executable, leaving it untouched is fine. Writing portable code for the DART components will allow us to include your model in the nightly builds and reduces the risk of us making changes that adversely affect the integration with your model. There are some routines that have to play with the core DART routines, these are the ones we are asking you to write using these few simple guidelines.
Hopefully, you have no idea how difficult it is to build each model with 'unique' compile options on N different platforms. Fortran90 provides a nice mechanism to specify the type of variable, please do not use vendor-specific extensions. (To globally autopromote 32bit reals to 64bit reals, for example. That is a horrible thing to do, since vendors are not consistent about what happens to explicitly-typed variables. Trust me. They lie. It also defeats the generic procedure interfaces that are designed to use a single interface as a front-end to multiple 'type-specific' routines.) Compilers do abide by the standard, however, so DART code looks like:
character(len=8) :: crdate integer, dimension(8) :: values ... real(r4) :: a,b real(r8) :: bob integer :: istatus, itype ... real(r8), intent(in) :: x(:) type(location_type), intent(in) :: location integer, intent(in) :: itype integer, intent(out) :: istatus real(r8), intent(out) :: obs_val
depending on the use. The r4 and r8 types are explicitly defined in assimilation_code/modules/utilities/types_mod.f90 to accurately represent what we have come to expect from 32bit and 64bit floating point real variables, respectively. If you like, you can redefine r8 to be the same as r4 to shrink your memory requirement. The people who run with WRF frequently do this. Do not redefine the digits12 parameter, that one must provide 64bit precision, and is used in precious few places.
If your model is a separate executable, it would be wise to look at the heavily commented template script models/template/shell_scripts/advance_model.csh and then a few higher-order models to see how they do it. Become familiar with DART's use of MPI, the options for parallelism, and the filter namelist parameter async.
A little explanation/motivation is warranted. If the model uses the standard layout, it is much easier to include the model in the nightly builds and testing. For this reason alone, please try to use the recommended directory layout. Simply looking at the DART/models directory should give you a pretty good idea of how things should be laid out. Copy the template directory and its contents. It is best to remove the (hidden) subversion files to keep the directory 'clean'. The point of copying this directory is to get a model_mod.f90 that works as-is and you can modify/debug the routines one at a time.
~/DART/models % cp -r template mymodel ~/DART/models % find mymodel -name .svn -print mymodel/.svn mymodel/shell_scripts/.svn mymodel/work/.svn ~/DART/models % rm -rf `find mymodel -name .svn -print` ~/DART/models % find mymodel -name .svn -print ~/DART/models %
The destination directory (your model directory) should be in
the DART/models directory to keep life
simple. Moving them around will cause problems for the work/mkmf_xxxxx configuration
files. Each model directory should have a work and shell_scripts
directories, and may have a matlab directory, a src
directory, or anything else you may find convenient.
Now, you must change all the work/path_names_xxx file
contents to reflect the location of your model_mod.f90.
We have templates, examples, and a document describing the
required interfaces in the DART code tree - DART/models/template/model_mod.html.
Every(?) user-visible DART program/module is intended to have a
matching piece of documentation that is distributed along with the
code. The DART code tree always has the most current
documentation.
Check out time_manager_mod.f90 and utilities_mod.f90 for general-purpose routines ...
Use Fortran namelists for I/O if possible.
Modify the model_mod.f90 file to return
specifics about your model. This module MUST contain all the
required interfaces (no surprise) but it can also contain many more
interfaces as is convenient. This module should be written with the
understanding that print statements and error terminations will be
executed by multiple processors/tasks. To restrict print statements
to be written once (by the master task), it is necessary to preface
the print as in this example:
if (do_output()) write(*,*)'model_mod:namelist
cal_NML',startDate_1,startDate_2
No matter the complexity of the model, the DART software
requires a few interface routines in a model-specific Fortran90
module model_mod.f90 file. The models/template/model_mod.f90 file has extended comment
blocks at the heads of each of these routines that go into much
more detail for what is to be provided. You cannot change
the types or number of required arguments to any of the required
interface routines. You can add optional arguments, but
you cannot go back throught the DART tree to change the gazillion
calls to the mandatory routines. It is absolutely appropriate to
look at existing models to get ideas about how to implement the
interfaces. Finding a model implementation that is functionally
close to yours always helps.
The table of the mandatory interfaces and expected programming
degree-of-difficulty is:
subroutine callable | separate executable | routine | description |
---|---|---|---|
easy | easy | get_model_size | This function returns the size of all the model variables (prognostic or diagnosed or ...) that are packed into the 1D DART state vector. That is, it returns the length of the DART state vector as a single scalar integer. |
depends | trivial | adv_1step | For subroutine-callable models, this routine is the one to actually advance the model 1 timestep (see models/bgrid_solo/model_mod.f90 for an example). For non-subroutine-callable models, this is a NULL interface. Easy. |
depends | depends | get_state_meta_data | This routine takes as input an integer into the DART state vector and returns the associated location and (optionally) variable type from obs_kind/obs_kind_mod.f90. (See models/*/model_mod.f90 for examples.) Since DART uses netCDF and is responsible for the storage order, this is generally pretty easy. |
easy | easy | shortest_time_between_assimilations | This routine returns the smallest increment in time (in seconds) that the model is capable of advancing the state in a given implementation. For example, the dynamical timestep of a model is 20 minutes, but there are reasons you don't want to (or cannot) restart at this interval and would like to restart AT MOST every 6 hours. For this case, shortest_time_between_assimilations should return 21600, i.e. 6*60*60. This is also interpreted as the nominal assimilation period. This interface is required for all applications. |
easy | easy | end_model | Performs any shutdown and cleanup needed. Good form would dictate that you should deallocate any storage allocated when you instantiated the model (from static_init_model, for example). |
depends | depends | static_init_model | Called to do one-time initialization of the model. This generally includes setting the grid information, calendar, etc. |
trivial | trivial | init_time | Returns a time that is somehow appropriate for starting up a long integration of the model IFF the &perfect_model_obs_nml namelist parameter read_input_state_from_file = .false. If this option is not to be used in perfect_model_obs, this can be a routine that simply throws a fatal error. |
easy | easy | init_conditions | Companion interface to init_time. Returns a model state vector that is somehow appropriate for starting up a long integration of the model. Only needed IFF the &perfect_model_obs_nml namelist parameter read_input_state_from_file = .false. |
trivial - difficult | trivial - difficult | nc_write_model_atts | This routine is used to write the model-specific attributes to netCDF files created by DART. If you are simply updating existing (template) netCDF files, this routine is very easy. The subroutine in the models/template/model_mod.f90 WILL WORK for new models but does not know anything about prognostic variables or geometry or ... Still, it is enough to get started without doing anything. More meaningful coordinate variables etc. are needed to supplant the default template. This can be as complicated as you like - see existing models for examples. |
trivial - medium | trivial - medium | nc_write_model_vars | This routine is currently unused but anticipated for future enhancements. The default routine in default_model_mod should be referenced. |
trivial | trivial | get_close_obs | This is the routine that takes a single observation location and a list of other observation locations, returns the indices of all observation locations close to the single observation along with the number and the distances for the close ones. This is generally a 'pass-through' routine to a routine of the same name in the location module. |
trivial | trivial | get_close_state | This is the routine that takes a single observation location and a list of state locations, returns the indices of all the state locations close to the observation as well as the number and the distances for the close ones. This is generally a 'pass-through' routine to a routine of the same name in the location module. |
depends | hard | model_interpolate | This is one of the more difficult routines. Given a DART state vector, a location, and a desired generic 'quantity' (like QTY_SURFACE_PRESSURE, QTY_TEMPERATURE, QTY_SPECIFIC_HUMIDITY, QTY_PRESSURE, ... ); return the desired scalar quantity and set the return status accordingly. This is what enables the model to use observation-specific 'forward operators' that are part of the common DART code. |
depends | trivial - difficult | convert_vertical_obs | If needed, the difficulty lies in the complexity of the model vertical coordinate system. |
depends | trivial - difficult | convert_vertical_state | If needed, the difficulty lies in the complexity of the model vertical coordinate system. |
depends | trivial - difficult | pert_model_copies | This routine is used to generate initial ensembles. This may be a NULL interface if you can tolerate the default perturbation strategy of adding noise to every state element or if you generate your own ensembles outside the DART framework. There are other ways of generating ensembles ... climatological distributions, bred singular vectors, voodoo ... |
easy | easy | read_model_time | This routine simply stores a copy of the ensemble mean of the state vector within the model_mod. The ensemble mean may be needed for some calculations (like converting model sigma levels to the units of the observation - pressure levels, for example). |
easy | easy | write_model_time | This routine simply stores a copy of the ensemble mean of the state vector within the model_mod. The ensemble mean may be needed for some calculations (like converting model sigma levels to the units of the observation - pressure levels, for example). |
The normal sequence of events is that DART reads in its own
restart file (do not worry about where this comes from right now)
and eventually determines it needs to advance the model. DART needs
to be able to take its internal representation of each model state
vector, the valid time of that state, and the amount of time to
advance the state - and communicate that to the model. When the
model has advanced the state to the requested time, the output must
be ingested by DART and the cycle begins again. DART is entirely
responsible for reading the observations and there are several
programs for creating and manipulating the observation sequence
files.
There are a couple of ways to exploit parallel architectures with
DART, and these have an immediate bearing on the design of the
script(s) that control how the model instances (each model
copy) are advanced. Perhaps the conceptually simplest
method is when each model instance is advanced by a single
processor element. DART calls this async = 2. It is generally efficient to
relate the ensemble size to the number of processors being
used.
The alternative is to advance every model instance one after
another using all available processors for each instance of the
model. DART calls this async = 4,
and requires an additional script. For portability reasons, DART
uses the same processor set for both the assimilation and the model
advances. For example, if you advance the model with 96 processors,
all 96 processors will be employed to assimilate. If your model
requires 2000 processors, all 2000 will be employed for the
assimilation. Some people exploit the queueing systems on their
large machines to allow for the explicit customization of how many
tasks are used for each model advance and for an assimilation.
advance_model.csh is invoked in one of two
ways: 1) if async = 2 then
filter uses a system() call, or 2) if async = 4 then run_filter.csh makes the call. Either way there are
three arguments.
Modify shell_scripts/advance_model.csh to:
During this initial phase, it may be useful to _leave_ the temporary directory intact until you verify everything is as it should be.
In addition to modifying shell_scripts/advance_model.csh as described above, you must also modify shell_scripts/run_filter.csh in the following way: THIS PART NEEDS TO BE FILLED IN
Generally testing when you add a new model to DART includes:
Checking the converter programs.
Checking the model advance control.
Starting with one observation in a known location, with a known
value and error specification.
Performing a 'perfect model' experiment for a few timesteps.
Looking at what the assimilation did with:
work % ncdiff filter_output.nc preassim.nc Innov.nc work % ncview Innov.nc
Only needed for state-space diagnostics.
Define a structure with required elements.
Examples exist in the diagnostics/matlab/private directory.
Modify the code to your heart's content.
'svn revert' cures all ...
Please let us know if you have suggestions or code to contribute to DART. We're a small group, but we are willing to listen and will make every endeavor to incorporate improvements to the code. Email us at dart@ucar.edu.
The DART system includes several different types of Kalman filter algorithms, selectable at runtime by a namelist setting. See the documentation for the assimilation tools module for descriptions of the various filter types and how to select them.
Many DART programs have namelists to specify run-time control.
Some programs use one or more modules - each module may have its
own namelist. As a consequence, we find it convenient to have one
file (called input.nml) specifying all the
namelists.
For a complete list of namelists, see the following document:
master
list of namelists
An example namelist for each program is automatically built when
the makefile is generated by mkmf_xxxxx.
The example namelist is named input.nml.xxxxx_default where xxxxx is the name of the program. The example namelists
have default values, which may not be appropriate for your use. The
default input.nml in each work directory generally has better values. As usual,
the documentation for each module is the best place to get
information about the namelist settings.