DART documentation:
The DART distribution includes a full set of documentation.
Once you download DART, you may view the documentation offline by
opening the index.html file in the top-level
DART directory.
If you want to view the main documentation page without downloading
DART, you may view the documentation for the Lanai release at
https://svn-dares-dart.cgd.ucar.edu/DART/releases/Lanai/index.html
An incomplete list of DART-supported models:
There are two broad classes of models supported by DART. Some are 'low-order' models,
generally single-threaded, subroutine-callable, and idealized: there are no real observations
of these systems.
The other class consists of 'high-order' models. There are real observations of
these systems. Or at least, we like to think so ...
The 'low-order' models supported in DART.
ikeda
The Ikeda model is a 2D chaotic map useful for visualizing
data assimilation updating directly in state space.
There are three parameters: a, b, and mu. The state is 2D, x = [X Y].
The equations are:
X(i+1) = 1 + mu * ( X(i) * cos( t ) - Y(i) * sin( t ) )
Y(i+1) = mu * ( X(i) * sin( t ) + Y(i) * cos( t ) ),
where
t = a - b / ( X(i)**2 + Y(i)**2 + 1 )
Note the system is already time-discrete, meaning there is no delta_t. The
system stems from nonlinear optics (Ikeda 1979, Optics Communications).
Interface written by Greg Lawson. Thanks Greg!
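As a quick illustration (this is not the DART model_mod code; r8 comes from DART's
types_mod, the parameter values are only plausible examples, and x and y are assumed
to hold the current state), a single step of the map can be written as:

real(r8) :: x, y, t, xnew, ynew
real(r8) :: a = 0.40_r8, b = 6.00_r8, mu = 0.83_r8   ! example values - see the model namelist

! one step of the Ikeda map, advancing (x, y) to (xnew, ynew)
t    = a - b / ( x**2 + y**2 + 1.0_r8 )
xnew = 1.0_r8 + mu * ( x * cos(t) - y * sin(t) )
ynew =          mu * ( x * sin(t) + y * cos(t) )
x = xnew
y = ynew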
lorenz_63
This is the 3variable model as described in:
Lorenz, E. N. 1963. Deterministic nonperiodic flow.
J. Atmos. Sci. 20, 130-141.
The system of equations is:
X' = -sigma*X + sigma*Y
Y' = -XZ + rX - Y
Z' = XY - bZ
lorenz_84
This model is based on:
Lorenz E. N., 1984: Irregularity: A fundamental property of the atmosphere.
Tellus, 36A, 98-110.
The system of equations is:
X' = -Y^2 - Z^2 - aX + aF
Y' = XY - bXZ - Y + G
Z' = bXY + XZ - Z
Where a, b, F, and G are the model parameters.
9var
This model provides interesting off-attractor transients
that behave something like gravity waves.
lorenz_96
This is the model we use to become familiar with new architectures, i.e.,
it is the one we use 'first'. It can be called as a subroutine or as a separate
executable. We can test this model both single-threaded and MPI-enabled.
Quoting from the Lorenz 1998 paper:
... the authors introduce a model consisting of 40 ordinary differential
equations, with the dependent variables representing values of some
atmospheric quantity at 40 sites spaced equally about a latitude circle. The
equations contain quadratic, linear, and constant terms representing
advection, dissipation, and external forcing. Numerical integration
indicates that small errors (differences between solutions) tend to double
in about 2 days. Localized errors tend to spread eastward as they grow,
encircling the globe after about 14 days.
...
We have chosen a model with J variables, denoted by X_{1}, ..., X_{J};
in most of our experiments we have let J = 40. The governing equations are:
dX_{j}/dt = (X_{j+1} - X_{j-2})X_{j-1} - X_{j} + F (1)
for j = 1, ..., J. To make Eq. (1) meaningful for all
values of j we define X_{-1} = X_{J-1},
X_{0} = X_{J}, and X_{J+1} = X_{1}, so that the
variables form a cyclic chain, and may be looked at as values of some
unspecified scalar meteorological quantity, perhaps vorticity or
temperature, at J equally spaced sites extending around a latitude circle.
Nothing will simulate the atmosphere's latitudinal or vertical extent.
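As a concrete illustration of the cyclic indexing in Eq. (1) (a sketch only, not the
actual models/lorenz_96/model_mod.f90 code; r8 is DART's real kind from types_mod):

subroutine l96_tendency(x, forcing, dxdt)
   real(r8), intent(in)  :: x(:)       ! state, length J
   real(r8), intent(in)  :: forcing    ! the constant F
   real(r8), intent(out) :: dxdt(:)    ! time tendency, length J
   integer :: j, jp1, jm1, jm2, n
   n = size(x)
   do j = 1, n
      jp1 = modulo(j,     n) + 1       ! j+1, wrapping J+1 to 1
      jm1 = modulo(j - 2, n) + 1       ! j-1, wrapping 0   to J
      jm2 = modulo(j - 3, n) + 1       ! j-2, wrapping -1  to J-1
      dxdt(j) = ( x(jp1) - x(jm2) ) * x(jm1) - x(j) + forcing
   end do
end subroutine l96_tendency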
forced_lorenz_96
The forced_lorenz_96 model implements the standard L96 equations except
that the forcing term, F, is added to the state vector and is
assigned an independent value at each gridpoint. The result is a
model that is twice as big as the standard L96 model. The forcing
can be allowed to vary in time or can be held fixed so that the
model looks like the standard L96 but with a state vector that
includes the constant forcing term. An option is also included to
add random noise to the forcing terms as part of the time
tendency computation which can help in assimilation performance.
If the random noise option is turned off (see namelist) the
time tendency of the forcing terms is 0.
lorenz_96_2scale
This is the Lorenz 96 two-scale model, documented in Lorenz (1995).
It also has the option of the variant on the model from Smith (2001),
which is invoked by setting local_y = .true. in the
namelist. The time step, coupling, forcing, number of X variables,
and the number of Ys per X are all specified in the namelist.
Defaults are chosen depending on whether the Lorenz or Smith option is
specified in the namelist. Lorenz is the default model.
Interface written by Josh Hacker. Thanks Josh!
lorenz_04
The reference for these models is Lorenz, E.N., 2005: Designing
chaotic models. J. Atmos. Sci., 62, 1574-1587.
Model II is a single-scale model, similar to Lorenz 96, but with
spatial continuity in the waves. Model III is a two-scale
model. It is fundamentally different from the Lorenz 96 two-scale
model because of the spatial continuity and the fact that both scales
are projected onto a single variable of integration. The scale
separation is achieved by a spatial filter and is therefore not perfect
(i.e. there is leakage). The slow scale in model III is model II,
and thus model II is a deficient form of model III. The basic
equations are documented in Lorenz (2005) and also in the model_mod.f90
code. The user is free to choose model II or III with a Namelist
variable.
simple_advection
This model is on a periodic one-dimensional domain. A wind field is
modeled using Burgers' equation with upstream semi-Lagrangian
differencing. This diffusive numerical scheme is stable, and forcing
is provided by adding random Gaussian noise to each wind grid
variable independently at each timestep.
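For reference (not spelled out in the original text), the equation being advanced,
before the random forcing is added, is the inviscid Burgers' equation,
du/dt + u * du/dx = 0,
with the upstream semi-Lagrangian differencing supplying the numerical damping.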
An Eulerian option with centered-in-space differencing is also
provided. The Eulerian differencing is both numerically unstable and
subject to shock formation. However, it can sometimes be made stable in
assimilation mode (see recent work by Majda and collaborators).
The 'high-order' models supported in DART.
In roughly the order they were supported by DART.
bgrid_solo
This is a dynamical core for B-grid dynamics using the
Held-Suarez forcing. The resolution is configurable, and the
entire model can be run as a subroutine.
Status: supported.
pe2lyr
This model is a 2-layer, isentropic, primitive equation model on a sphere.
Status: orphaned.
wrf
The Weather Research and Forecasting (WRF) Model
is a next-generation mesoscale numerical weather prediction
system designed to serve both operational forecasting and
atmospheric research needs. More people are using DART with
WRF than any other model. Note: The actual WRF code
is not distributed with DART.
Status: supported.
There are many DART-supported versions of CAM.
The frozen version of the Community Climate System Model
(CCSM4.0) uses the Community Atmosphere Model
(CAM4).
The Community Earth System Model (CESM 1.0)
uses the Community Atmosphere Model (CAM5),
the latest in a series of global atmosphere models developed at NCAR
for the weather and climate research communities.
Status: both are supported (as are some earlier releases).
PBL_1d
The PBL model is a single column version of the WRF model. In this instance,
the necessary portions of the WRF code are distributed with DART.
Status: supported - but looking to be adopted.
MITgcm_annulus
The MITgcm annulus model as configured for this application
within DART is a non-hydrostatic, rigid-lid, C-grid, primitive
equation model utilizing a cylindrical coordinate system. For
detailed information about the MITgcm, see http://mitgcm.org
Status: orphaned - and looking to be adopted.
rose
The rose model is for the stratosphere-mesosphere and
was used by Tomoko Matsuo (now at CU-Boulder and NOAA)
for research in the assimilation of observations of the
Mesosphere-Lower Thermosphere (MLT).
Note: the model code is not distributed with DART.
Status: orphaned
MITgcm_ocean
The MIT ocean GCM version 'checkpoint59a'
is the foundation of this implementation. It was modified by Ibrahim Hoteit
(then of Scripps) to accommodate the interfaces needed by DART.
Status: supported - but looking to be adopted.
am2
The FMS AM2
model is GFDL's atmosphereonly code using observed sea surface temperatures,
time-varying radiative forcings (including volcanoes) and time-varying land cover type.
This version of AM2 (also called AM2.1) uses the finite-volume dynamical core (Lin 2004).
Robert Pincus (CIRES/NOAA ESRL PSD1) and Patrick Hoffman (NOAA) wrote the DART interface
and are currently using the system for research.
Note: the model code is not distributed with DART.
Status: supported
coamps
The DART interface was originally written and supported by Tim Whitcomb.
The following model description is taken from the
COAMPS overview web page:
The Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS) has been
developed by the Marine Meteorology Division (MMD) of the Naval Research
Laboratory (NRL). The atmospheric components of COAMPS, described below, are
used operationally by the U.S. Navy for short-term numerical weather
prediction for various regions around the world.
Note: the model code is not distributed with DART.
Status: supported
POP
The Parallel Ocean Program (POP) comes in two variants. Los Alamos National Laboratory
provides POP Version 2.0 which
has been modified to run in the
NCAR Community Climate System Model (CCSM)
framework. As of November 2009, the
CCSM-POP
version is being run. The LANL-POP version is nearly supported -
and some extensions useful for data assimilation in general
have been proposed to LANL, who have agreed in principle to implement the changes.
Fundamentally, the change is an additional restart option in which the first timestep
after an assimilation is an Eulerian timestep (similar to a cold start).
Note: the source code for POP is not distributed with DART.
Status: actively being developed
Downloadable datasets for DART.
The code distribution was getting 'cluttered' with datasets,
boundary conditions, initial conditions, ... large files that were not
necessarily interesting to all people who downloaded the DART code.
Worse, subversion makes a local hidden copy of the original repository
contents, so the penalty for being large is doubled.
It just made sense to make all the large files available on
an 'as-needed' basis.
To keep the size of the DART distribution down we have a separate
web site to provide some observation sequences, initial conditions,
and general datasets.
It is our intent to populate this site with some 'verification' results,
i.e. assimilations that were known to be 'good' and that should be fairly
reproducible - appropriate to test the DART installation.
Please be patient as I make time to populate this directory.
(yes, 'make', all my 'found' time is taken ...)
Observation sequences can be found at
http://www.image.ucar.edu/pub/DART/Obs_sets.
Verification experiments will be posted to
http://www.image.ucar.edu/pub/DART/VerificationData as soon as
I can get to it. These experiments will consist of initial conditions files
for testing different high-order models like CAM, WRF, POP ...
The low-order models are already distributed with verification data in
their work directories.
Useful bits for CAM can be found at
http://www.image.ucar.edu/pub/DART/CAM.
Useful bits for WRF can be found at
http://www.image.ucar.edu/pub/DART/WRF.
Useful bits for MPAS_ocn can be found at
http://www.image.ucar.edu/pub/DART/MPAS_OCN.
Creating initial conditions for DART
The idea is to generate an ensemble that has sufficient 'spread' to cover the range
of possible solutions. Insufficient spread can (and usually will) lead to poor
assimilations. Think 'filter divergence'.
Generating an ensemble of initial conditions can be done in lots of ways,
only a couple of which will be discussed here.
The first is to generate a single initial condition and let DART perturb it with
noise of a nature you specify to generate as many ensemble members as you like.
The second is to take some existing collection of model states and convert them to
DART initial conditions files and then use the restart_file_tool
to set the proper date in the files. The hard part is then coming up with the
original collection of model state(s).
Adding noise to a single model state
This method works well for some models, and fails miserably for others.
As it stands, DART supplies a routine that can add Gaussian noise to every
element of a state vector. This can cause some models to be numerically
unstable. You can supply your own
model_mod:pert_model_state()
if you want a more sophisticated perturbation scheme.
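As an example of what such a routine might look like (a sketch only - it assumes the
Lanai-era pert_model_state argument list from models/template/model_mod.f90 and DART's
random_seq_mod; the 1% perturbation size is arbitrary, and what is numerically 'safe'
is entirely model-dependent):

! near the top of model_mod.f90:
! use random_seq_mod, only : random_seq_type, init_random_seq, random_gaussian

subroutine pert_model_state(state, pert_state, interf_provided)
   real(r8), intent(in)  :: state(:)
   real(r8), intent(out) :: pert_state(:)
   logical,  intent(out) :: interf_provided
   type(random_seq_type), save :: r
   logical,               save :: seeded = .false.
   integer :: i
   if (.not. seeded) then
      call init_random_seq(r)
      seeded = .true.
   endif
   interf_provided = .true.    ! tell filter to use this routine instead of the default
   do i = 1, size(state)
      ! perturb each element with noise proportional to its magnitude
      pert_state(i) = random_gaussian(r, state(i), 0.01_r8 * abs(state(i)))
   end do
end subroutine pert_model_state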
Using a collection of model states.
The important thing to remember is that the high-order models all come
with routines to convert a single model restart file (or the equivalent) to
a DART initial conditions file.
CAM has cam_to_dart,
WRF has wrf_to_dart,
POP has pop_to_dart, etc. DART has the ability to read
a single file that contains initial conditions for all the ensemble members,
or a series of restart files - one for each ensemble member. Simply collect your
ensemble of restart files from your model and convert each of them to a DART
initial conditions file of the form filter_ics.####
where #### represents a 4-digit ensemble member counter.
That is, for a 50member ensemble, they should be named:
filter_ics.0001 ... filter_ics.0050
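A sketch of the renaming step in plain Bourne shell ('dart_restart.*' stands in for
whatever filenames your converter produced):

member=1
for file in dart_restart.* ; do
   newname=`printf "filter_ics.%04d" $member`
   mv $file $newname
   member=`expr $member + 1`
done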
Frequently, the initial ensemble of restart files is some climatological
collection. For CAM experiments, we usually start with N
different 'January 1' states ... from N different years.
The DART utility program restart_file_tool
is then run on each of these initial conditions files to set a consistent
date for all of the initial conditions.
Experience has shown that it takes less than a week of assimilating
4x/day to achieve a steady ensemble spread. WRF has its own method of
generating an initial ensemble. For that, it is best to contact
someone familiar with WRF/DART.
Initial conditions for the low-order models.
In general, there are 'restart files' for the low-order models that already
exist as work/filter_ics. If you need more ensemble members
than are supplied by these files, you can generate your own by adding noise
to a single perfect_ics file. Simply specify
&filter_nml
   start_from_restart   = .FALSE.,
   restart_in_file_name = "perfect_ics",
   ens_size             = [whatever you want]
/
'Perfect Model' observation experiments (also known as
Observing System Simulation Experiments - OSSEs)
Once a model is compatible with the DART facility, all of the
functionality of DART is available. This includes 'perfect model'
experiments (also called Observing System Simulation Experiments - OSSEs).
Essentially, the model is run forward from a known state and, at predefined times,
an observation forward operator is applied to the model state to harvest
synthetic observations. This model trajectory is known as the 'true state'.
The synthetic observations are then used in an
assimilation experiment. The assimilation performance can then be evaluated
precisely because the true state (of the model) is known. Since the same
forward operator is used to harvest the synthetic observations as well as
during the assimilation, the 'representativeness' error of the assimilation
system is not an issue.
There is a set of Matlab® functions to help explore the assimilation
performance in state-space as well as in observation-space.
An OSSE is explored in depth in our
Lorenz '96 example.
Perfect Model Experiment Overview
There are four fundamental steps to running an OSSE from within DART
(a command-line summary follows the list):
1. Create a blueprint of what, where, and
when you want observations. Essentially, define the metadata of the
observations without actually specifying the observation values.
The default filename for the blueprint is obs_seq.in .
For simple cases, this is just running
create_obs_sequence and
create_fixed_network_seq; more in-depth solutions are presented below.
2. Harvest the synthetic observations from the
true model state by running
perfect_model_obs
to advance the model from a known initial condition and apply the
forward observation operator based on the observation 'blueprint'.
The observation will have noise added to it based on a draw from
a random normal distribution with the variance specified in the
observation blueprint. The noise-free 'truth' and the noisy 'observation'
are recorded in the output observation sequence file.
The entire time history of the true state of the model is recorded in
True_State.nc .
The default filename for the 'observations' is
obs_seq.out .
3. Assimilate the synthetic observations with
filter
in the usual way.
The prior/forecast states are preserved in Prior_Diag.nc
and the posterior/analysis states are preserved in
Posterior_Diag.nc .
The default filename for the file with the observations and (optionally) the
ensemble estimates of the observations is obs_seq.final .
4. Check to make sure the assimilation was effective!
Ensemble DA is not a black box! YOU must check to make sure you are making
effective use of the information in the observations!
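For the subroutine-callable low-order models, the command-line summary of these four
steps is simply (run from the model's work directory, with input.nml set up as
described in the rest of this section):

./create_obs_sequence        # define the observations (set_def.out)
./create_fixed_network_seq   # replicate them in time (obs_seq.in)
./perfect_model_obs          # harvest the synthetic observations (obs_seq.out, True_State.nc)
./filter                     # assimilate them (obs_seq.final, Prior_Diag.nc, Posterior_Diag.nc)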
1. Defining the observation metadata - the 'blueprint'.
There are lots of ways to define an observation sequence that DART can use as
input for a perfect model experiment. If you have observations in DART format
already, you can simply use them. If you have observations in one of the
formats already supported by the DART converters (check
DART/observations/observations.html),
convert them to a DART observation sequence. You may need to use the
obs_sequence_tool to combine multiple observation sequence files into a single
observation sequence file for the perfect model experiment.
Any existing observation values and quality control information will be ignored by
perfect_model_obs - only the time and location information are used.
In fact, any and all existing observation and QC values will be removed.
GENERAL COMMENT ABOUT THE INTERPLAY BETWEEN THE MODEL STOP/START FREQUENCY AND THE
IMPACT ON THE OBSERVATION FREQUENCY: There is usually a very real difference between
the dynamical timestep of the model and when it is safe to stop and restart the model.
The assimilation window is (usually) required to be a multiple of the safe stop/start
frequency. For example, an atmospheric model may have a dynamical timestep of a few
seconds, but may be constrained such that it is only possible to stop/restart every
hour. In this case, the assimilation window is a multiple of 3600 seconds. Trying to get
observations at a finer timescale is not possible; we only have access to the model
state when the model stops.
If you do not have an input observation sequence, it is simple to create one.
 Run create_obs_sequence to
generate the blueprint for the types of observations and
observation error variances for whatever locations are desired.
 Run create_fixed_network_seq to define
the temporal distribution of the desired observations.
Both create_obs_sequence and
create_fixed_network_seq interactively prompt you for the
information they require. This can be quite tedious if you want a spatially dense set
of observations. People have been known to actually write programs to generate the input to
create_obs_sequence and simply pipe or redirect the information
into the program. There are several examples of these in the
models/bgrid_solo directory:
column_rand.f90, id_set_def_stdin.f90, ps_id_stdin.f90,
and ps_rand_local.f90 . Be advised that some observation
types have different input requirements, so a 'one size fits all' program is a
waste of time.
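For example (a sketch - my_obs_generator is a hypothetical program you would write;
the lines it emits must match the interactive prompts, which are shown later in this
document):

./my_obs_generator > obs_seq_answers.txt        # write out all of the interactive answers
./create_obs_sequence < obs_seq_answers.txt     # feed them to create_obs_sequence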
NOTE: only the observation kinds listed in the input.nml
&obs_kind_nml:assimilate_these_obs_types,evaluate_these_obs_types
will be available to the create_obs_sequence program.
DEVELOPERS TIP: You can specify 'identity' observations as input
to perfect_model_obs. Identity observations are the model
values AT the exact gridcell location; there is no interpolation at all -
just a straight table lookup. (An identity observation is requested by entering the
negative of the state-vector index, e.g. -3 observes the third element of the state
vector.) This can be useful as you develop your model
interfaces; you can test many of the routines and scripts without having a
working model_interpolate().
More information about creating observation sequence files for OSSE's
is available in
the observation sequence discussion section.
2. Generating the true state and harvesting the observation values -
perfect_model_obs
perfect_model_obs
reads the blueprint and an initial state
and applies the appropriate forward observation operator for each and every observation
in the current 'assimilation window'. If necessary, the model is advanced until the
next set of observations is desired. When it has run out of observations or reached
the stop time defined by the namelist control, the program stops and writes out
a restart file, a diagnostic file, the observation sequence file, and a log file.
This is fundamentally a single deterministic forecast for 'as long as it takes' to
harvest all the observations.
The output files are:

perfect_restart (ASCII or binary)
The DART model state at the end of the forecast.
If the forecast needs to be lengthened, use this as the input.
The format of the file is controlled by input.nml
&assim_model_nml:write_binary_restart_files.
The first record is the valid time of the model;
the rest is the model state at that time.

True_State.nc (netCDF)
The DART model state at every assimilation timestep.
This file has but one 'copy' - the truth. Dump the copy metadata
and the time:
ncdump -v time,CopyMetaData True_State.nc

obs_seq.out (ASCII or binary, DART-specific linked list)
This file has the observations - the result of the forward
observation operator. This observation sequence file has two 'copies'
of the observation: the noisy 'copy' and the noise-free 'copy'.
The noisy copy is designated as the 'observation', the noise-free copy
is the truth. The observation-space diagnostic program
obs_diag has special options for using the true
copy instead of the observation copy. See
obs_diag.html for details.

dart_log.out (ASCII)
The run-time output of perfect_model_obs.
Each model may define the assimilation window differently,
but conceptually, all the observations plus or minus half the assimilation window are
considered to be simultaneous and a single model state provides the basis for all
those observations. For example: if the blueprint requires temperature observations
every 30 seconds, the initial model time is noon (12:00) and the assimilation window
is 1 hour; all the observations from 11:30 to 12:30 will use the same state as input
for the forward observation operator. The fact that you have a blueprint for observations
every 30 seconds means a lot of those observations may have the same value
(if they are in the same location).
perfect_model_obs uses the input.nml
for its control. A subset of the namelists and variables of particular interest for
perfect_model_obs are summarized here.
Each namelist is fully described by the corresponding module document.
&perfect_model_obs_nml   <-- link to the full namelist description!
...
start_from_restart = .true. usually, but not always
output_restart = .true. sure, why not
init_time_days = -1 negative means use the time in ...
init_time_seconds = -1 the 'restart_in_file_name' file
first_obs_days = -1 negative means start at the first time in ...
first_obs_seconds = -1 the 'obs_seq_in_file_name' file.
last_obs_days = -1 negative means to stop with the last ...
last_obs_seconds = -1 observation in the file.
restart_in_file_name = "perfect_ics"
restart_out_file_name = "perfect_restart"
obs_seq_in_file_name = "obs_seq.in"
obs_seq_out_file_name = "obs_seq.out"
output_interval = 1
async = 0 totally depends on the model
adv_ens_command = "./advance_ens.csh" depends on the model
/
&obs_sequence_nml
write_binary_obs_sequence = .false. .false. will create ASCII - easy to check.
/
&obs_kind_nml
...
assimilate_these_obs_types = 'RADIOSONDE_TEMPERATURE',
... list all the synthetic observation
... types you want
/
&assim_model_nml
...
write_binary_restart_files = .true. your choice
/
&model_nml
...
time_step_days = 0, some models call this 'assimilation_period_days'
time_step_seconds = 3600 some models call this 'assimilation_period_seconds'
use whatever value you want
/
&utilities_nml
...
termlevel = 1 your choice
logfilename = 'dart_log.out' your choice
/
Executing perfect_model_obs
Since perfect_model_obs generally requires advancing the model,
and the model may use MPI or require special ancillary files or forcing files or ...,
it is not possible to provide a single example that will cover all possibilities.
The subroutine-callable models (i.e. the low-order models)
can run perfect_model_obs very simply:
./perfect_model_obs
3. Performing the assimilation experiment - filter
This step is done with the program
filter,
which also uses input.nml for input and runtime control.
A successful assimilation will depend on many things: an appropriate initial ensemble,
monitoring and perhaps correcting the ensemble spread, localization, etc. It is
simply not possible to design a one-size-fits-all system that will work for all cases.
It is critically important to analyze the results of the assimilation
and explore ways of making the assimilation more effective.
The DART tutorial and
the DART_LAB exercises
are an invaluable resource to learn and understand how to determine the effectiveness of,
and improve upon, an assimilation experiment. The concepts learned with the loworder models
are directly applicable to the most complicated models.
It is important to remember that if filter
'terminates normally', it does not necessarily mean the assimilation was effective!
filter
produces two state-space output diagnostic files
(Prior_Diag.nc and Posterior_Diag.nc)
which contain values of the ensemble mean, ensemble spread, perhaps the
inflation values, and (optionally) ensemble members for the duration of
the experiment. filter also creates an observation
sequence file that contains the input observation information as well as the
prior and posterior ensemble mean estimates of that observation,
the prior and posterior ensemble spread for that observation,
and (optionally), the actual prior and posterior ensemble estimates of that observation.
Rather than replicate the observation metadata for each of these, the single
metadata is shared for all these 'copies' of the observation. See
An overview of the observation sequence
for more detail. filter also produces a runtime
log file that can greatly aid in determining what went wrong if the program
terminates abnormally.
A very short description of some of the most important namelist variables
is presented here. Basically, I am only discussing the settings necessary to
get filter to run. I can guarantee these settings WILL NOT generate the BEST
assimilation. Again, see the module documentation for a full description of
each namelist.
&filter_nml   <-- link to the full namelist description!
async = 0
adv_ens_command = "./advance_model.csh"
ens_size = 40 something ≥ 20, please
start_from_restart = .false. .false. requires reading available input files
output_restart = .true.
obs_sequence_in_name = "obs_seq.out" whatever you called the output from perfect_model_obs
obs_sequence_out_name = "obs_seq.final"
restart_in_file_name = "filter_ics" the file (or base file name) of your ensemble
restart_out_file_name = "filter_restart"
init_time_days = -1 the time in the restart file is correct
init_time_seconds = -1
first_obs_days = -1 same interpretation as with perfect_model_obs
first_obs_seconds = -1
last_obs_days = -1 same interpretation as with perfect_model_obs
last_obs_seconds = -1
num_output_state_members = 10 # of FULL DART model states to put in statespace output files
num_output_obs_members = 40 # of ensemble member 'copies' of observation to save
output_interval = 1
num_groups = 1
input_qc_threshold = 4.0
outlier_threshold = 3.0 Observation rejection criterion!
output_forward_op_errors = .false.
output_timestamps = .false.
output_inflation = .true.
inf_flavor = 0, 0 0 is 'do not inflate'
inf_start_from_restart = .false., .false.
inf_output_restart = .false., .false.
inf_deterministic = .true., .true.
inf_in_file_name = 'not_initialized', 'not_initialized'
inf_out_file_name = 'not_initialized', 'not_initialized'
inf_diag_file_name = 'not_initialized', 'not_initialized'
inf_initial = 1.0, 1.0
inf_sd_initial = 0.6, 0.0
inf_damping = 0.9, 0.0
inf_lower_bound = 1.0, 1.0
inf_upper_bound = 1000000.0, 1000000.0
inf_sd_lower_bound = 0.6, 0.0
/
&ensemble_manager_nml
single_restart_file_in = .false. .false. means each ensemble member is in a separate file
single_restart_file_out = .false.
perturbation_amplitude = 0.2 not used if 'single_restart_file_in' is .false.
/
&assim_tools_nml
filter_kind = 1 1 is EAKF, 2 is EnKF ...
cutoff = 0.2 this is your localization - units depend on type of 'location_mod'
/
&obs_kind_nml
assimilate_these_obs_types = 'RAW_STATE_VARIABLE' Again, use a list ...
/
&model_nml
assimilation_period_days = 0 the assimilation interval is up to you
assimilation_period_seconds = 3600
/
Because num_output_state_members is nonzero,
the state vector is output at every time for which there are
observations (once a day here).
Posterior_Diag.nc and Prior_Diag.nc
then contain values for 20 ensemble members once a day.
Once the namelist is set, execute filter to
integrate the ensemble forward for 24,000 steps with the final ensemble
state written to the filter_restart.
Copy the perfect_model_obs restart file
perfect_restart (the 'true state') to
perfect_ics, and the
filter restart file
filter_restart to
filter_ics so that future assimilation experiments
can be initialized from these spun-up states.
mpirun ./filter OR
mpirun.lsf ./filter OR
./filter OR
however YOU run filter on your system!
4. ASSESS THE PERFORMANCE!
All the concepts of spread, rmse, rank histograms that were taught
in the DART tutorial and in DART_LAB should be applied now.
Try the techniques described in the
Did my experiment work? section.
The 'big three' state-space diagnostics are repeated here because they are so important.
The first two require the True_State.nc .
plot_bins.m
plots the rank histograms for a set of state variables.
This requires you to have all or most of the ensemble members available in the
Prior_Diag.nc or Posterior_Diag.nc files.

plot_total_err.m
plots the evolution of the error (unnormalized) and ensemble
spread of all state variables.

plot_ens_mean_time_series.m
plots the evolution of a set of state variables - just the ensemble mean
(and Truth, if available).
plot_ens_time_series.m is actually
a better choice if you can afford to write all/most of the ensemble members
to the Prior_Diag.nc and Posterior_Diag.nc
files.
DON'T FORGET ABOUT THE OBSERVATION-SPACE DIAGNOSTICS!
An overview of the DART 'preprocess' program
First and foremost, check out the
DART/preprocess/preprocess.html document
for detailed information.
The preprocess program actually builds source code to
be used by all the remaining modules. It is imperative to
actually run preprocess before building
any executables. This is how the same code can assimilate state
vector 'observations' for the Lorenz_63 model and real radar reflectivities
for WRF without needing to specify a set of radar operators for the
Lorenz_63 model!
preprocess combines multiple 'obs_def' modules into one
obs_def_mod.f90 that is then used by the rest of DART.
Additionally, a new obs_kind_mod.f90 is built that will
provide support for associating the specific observation TYPES
with corresponding (generic) observation KINDS.
More on that later.
The list of source codes is contained in the
&preprocess_nml namelist and they ultimately
determine what observations and operators are supported.
If you want to add another 'obs_def' module, you must rerun
preprocess and recompile the rest of your project.
preprocess is designed to abort if
the files it is supposed to build already exist. For this reason, it is necessary
to remove a couple files (if they exist) before you run the preprocessor.
It is just a good habit to develop.
\rm -f ../../../obs_def/obs_def_mod.f90
\rm -f ../../../obs_kind/obs_kind_mod.f90
./preprocess
ls -l ../../../obs_def/obs_def_mod.f90
ls -l ../../../obs_kind/obs_kind_mod.f90
For example, with a namelist that looks like:
&preprocess_nml
input_obs_kind_mod_file = '../../../obs_kind/DEFAULT_obs_kind_mod.F90',
output_obs_kind_mod_file = '../../../obs_kind/obs_kind_mod.f90',
input_obs_def_mod_file = '../../../obs_def/DEFAULT_obs_def_mod.F90',
output_obs_def_mod_file = '../../../obs_def/obs_def_mod.f90',
input_files = '../../../obs_def/obs_def_gps_mod.f90',
'../../../obs_def/obs_def_QuikSCAT_mod.f90',
'../../../obs_def/obs_def_GWD_mod.f90',
'../../../obs_def/obs_def_altimeter_mod.f90',
'../../../obs_def/obs_def_reanalysis_bufr_mod.f90'
/
preprocess will combine
DEFAULT_obs_def_mod.F90,
obs_def_gps_mod.f90,
obs_def_QuikSCAT_mod.f90,
obs_def_GWD_mod.f90,
obs_def_altimeter_mod.f90, and
obs_def_reanalysis_bufr_mod.f90, into
obs_def_mod.f90 -
which can be used by the rest of the project.
Building and Running 'preprocess'
preprocess is an executable, so it should come as
no surprise that it must be built in the normal DART fashion. The
DART/mkmf/mkmf.template must be correct for your
environment, and the input.nml must have your desired
preprocess_nml set correctly. Given that ...
csh mkmf_preprocess
make
./preprocess
will build and run preprocess.
The first command generates an appropriate Makefile and
the input.nml.preprocess_default file.
The second command results in the compilation of a series of
Fortran90 modules which ultimately produces an executable file:
preprocess.
The third command actually runs preprocess - which builds the new
obs_kind_mod.f90 and
obs_def_mod.f90 source code files. The rest of DART
may now be built.
A little background for 'preprocess'
IMPORTANT:
Since each 'observation kind' may require different amounts of metadata to
be read or written, any routine to read or write an observation sequence
must be compiled with support for those particular observations.
The supported observations are listed in the
input.nml&obs_kind_nml block.
This is the whole point of the 'preprocess' process ...
An overview of the observation sequence
Observation sequences are complicated; there's just no better
way to describe it. Trying to automatically accommodate a myriad
of observation file formats, structures, and metadata is simply not
an easy task. For this reason, DART has its own format for observations
and a set of programs to convert observations from their original formats
to DART's format. There are definitely some things to know ...
An obs_seq.in file actually contains
no observation quantities. It may be best thought of as a
perfectly laid-out notebook - just waiting for an observer to fill
in the actual observation quantities. All the rows and columns are
ready, labelled, and repeated for every observation time and platform.
This file is generally the start of a "perfect model"
experiment. Essentially, one instance of the model is run through
perfect_model_obs  which applies the
appropriate forward operators to the model state and 'writes them down'
in our notebook. The completed notebook is renamed
obs_seq.out.
An obs_seq.out file contains a linked list
of observations - potentially (and usually) observations from different
platforms and of different quantities - each with their own error
characteristics and metadata. These files arise from running
perfect_model_obs OR from any number
of converter programs. The creation of observation sequences from real
observations is not automatic and an email to the DART team asking
for advice for your specific types of observations is perfectly within
reason.
There is something called an obs_seq.final
file - which contains everything in the obs_seq.out
file as well as a few additional 'copies' of the observation.
Remember, DART is an ensemble algorithm. Each ensemble member must
compute its own estimate of the observation for the algorithm.
The obs_seq.final file may
contain each of these estimates (namelist controlled). Minimally, the
mean and spread of the ensemble estimates are recorded in the
obs_seq.final file. The best method of determining
the performance of your 'real world' experiment is to compare in
observation-space since we can never know the
model state that perfectly represents the real world.
IMPORTANT:
Since each 'observation kind' may require different amounts of metadata to
be read or written, any routine to read or write an observation sequence
must be compiled with support for those particular observations.
The supported observations are listed in the
input.nml&obs_kind_nml block.
This is the whole point of the 'preprocess' process ...
observation sequence file structure
There are extensible parts of the observation sequence
file; for example, the number of observation kinds contained in the
file, whether the locations have 1 or more components, how
many quality control values are available for each
observation, where those quality control values come from,
how many 'copies' of each observation there
are ... et cetera. (The original web page shows full-size example images of an
obs_seq.out and an obs_seq.final file, from entirely separate experiments, that
illustrate the flexibility of the file format.)
Creating observations and sequences.
It is strongly encouraged that you use a single observation to test
a new model implementation.
Experience has shown that starting 'simple' is the fastest way to good results.
Starting with a single observation will exercise a sufficient portion
of the procedure and provide insight into where to spend more effort. Starting with
a single synthetic observation will allow you to focus on the more interesting
parts of the DART scheme without getting bogged down in the world of
observation data formats.
Creating a synthetic observation sequence.
There are several steps to create an observation sequence file, which
follows directly from the modular nature of the DART programming
philosophy.
 Decide what observations you want to investigate and edit the
input.nml&obs_kind_nml block.
 Build and run preprocess to create code that supports
the observations you want.
 Build and run create_obs_sequence to define the specifics about the observation you want.
 Build and run create_fixed_network_seq to replicate those specifics through time.
 Build and run perfect_model_obs to create an observation consistent with the model state and specified error distribution at the requested times and locations.
Example: generating observations for the Lorenz '63 model.
1) There are no 'real' observations for the Lorenz '63 model, so the appropriate
namelist settings are:
&obs_kind_nml
assimilate_these_obs_types = 'RAW_STATE_VARIABLE' /
&preprocess_nml
input_obs_def_mod_file = '../../../obs_def/DEFAULT_obs_def_mod.F90',
output_obs_def_mod_file = '../../../obs_def/obs_def_mod.f90',
input_obs_kind_mod_file = '../../../obs_kind/DEFAULT_obs_kind_mod.F90',
output_obs_kind_mod_file = '../../../obs_kind/obs_kind_mod.f90',
input_files = '../../../obs_def/obs_def_1d_state_mod.f90' /
2) Run preprocess in the normal fashion.
3) create_obs_sequence
creates an observation set definition
(typically named set_def.out),
the time-independent part of an observation sequence. It may help
to think of it as trying to define what sorts of observations will
be taken at one 'reading' ... you walk out to the box and take
temperature, humidity, and wind observations all at the same time
and place, for example. You can think of it as one page in an observer's notebook;
it contains only the location, type,
and observational error characteristics
(normally just the diagonal observational error variance)
for a related set of observations. There are no actual observation values,
nor are there any times associated with the definition. The program
is interactive and queries the user for the information it needs.
Begin by creating a minimal observation set definition
in which each of the 3 state variables of L63 is directly observed
with an observational error variance of 1.0 for each observation.
To do this, use the following input sequence
(the text including and after # is a comment and does
not need to be entered):
The following is a screenshot
(much of the verbose logging has been left off for clarity),
the user input looks like this.
[unixprompt]$ ./create_obs_sequence
Starting program create_obs_sequence
Initializing the utilities module.
Trying to log to unit 10
Trying to open file dart_log.out
Registering module :
$url: http://squish/DART/trunk/utilities/utilities_mod.f90 $
{ ... }
Input upper bound on number of observations in sequence
4
Input number of copies of data (0 for just a definition)
0
Input number of quality control values per field (0 or greater)
0
input a -1 if there are no more obs
0
Registering module :
$url: http://squish/DART/trunk/obs_def/DEFAULT_obs_def_mod.F90 $
{ ... }
initialize_module obs_kind_nml values are
 ASSIMILATE_THESE_OBS_TYPES 
RAW_STATE_VARIABLE
 EVALUATE_THESE_OBS_TYPES 

Input -1 * state variable index for identity observations
OR input the name of the observation kind from table below:
OR input the integer index, BUT see documentation...
1 RAW_STATE_VARIABLE
1
input time in days and seconds
0 0
Input error variance for this observation definition
1.0
input a -1 if there are no more obs
0
{ this gets repeated ... until you tell it to stop ... }
input a -1 if there are no more obs
-1
Input filename for sequence ( set_def.out usually works well)
set_def.out
write_obs_seq opening formatted file set_def.out
write_obs_seq closed file set_def.out
Rest assured that if you requested to assimilate more realistic
observation types, you will be queried for appropriate information by
create_obs_sequence.
Below is a table that explains all of the input you should need to supply
for observations of the L63 model state.
4            # upper bound on num of observations in sequence
0            # number of copies of data (0 for just a definition)
0            # number of quality control values per field (0 or greater)
0            # -1 to exit/end observation definitions

1            # observe state variable 1
0 0          # time - days, seconds
1.0          # observational variance
0            # -1 to exit/end observation definitions

2            # observe state variable 2
0 0          # time - days, seconds
1.0          # observational variance
0            # -1 to exit/end observation definitions

3            # observe state variable 3
0 0          # time - days, seconds
1.0          # observational variance
-1           # -1 to exit/end observation definitions

set_def.out  # output file name
4) create_fixed_network_seq takes
the observation set definition and repeats it in time, essentially
making multiple pages in our notebook.
Again, the program is interactive and queries the user for information.
You should be able to simply follow the prompts.
The table below represents the input needed for the L63 example:
set_def.out  # input observation set definition file
1            # regularly spaced observation interval in time
1000         # 1000 observation times
0, 43200     # first observation after 12 hours (0 days, 12 * 3600 seconds)
0, 43200     # observations every 12 hours
obs_seq.in   # output file for observation sequence definition
5) perfect_model_obs advances the model from the state
defined by the initial conditions file specified in the
input.nml and 'applies the forward operator' to harvest
observations to fill in the observation sequence specified in
obs_seq.in. The observation sequence finally has
values for the observations and is saved in a file generally named
obs_seq.out.
perfect_model_obs is namelist-driven, as opposed to
the previous two (whose input is a lot harder to specify in a namelist).
Take a look at (and modify if you like) the
input.nml&perfect_model_obs_nml
section of the namelist.
The End. Not only should you have an observation sequence file
(usually obs_seq.out), you also have a file
containing the exact evolution of the model consistent with
those observations - the True_State.nc file.
Real Observations - Converting to a DART-compatible format.
Real observations come in a mind-boggling diversity of formats.
We have converters for some formats in the
DART/observations directory. Many of the
formats require their own libraries (like HDF), and require intimate
knowledge of the data format to extract the portions required for the
DART observation sequence file.
Please feel free to browse the converters and their companion documentation.
Feel free to donate converters for formats we don't already support!
We like that kind of stuff.
The DART framework enforces a clean separation between observations
and the models used for assimilation. The same observations can be
used in any model which understands how to generate a value for the
requested type of observation from the model's state-space values
(i.e. the forward observation operator must exist - DART provides
many for the most common state variables).
In many cases, the original datasets are in a standard scientific
format like netCDF, HDF, or BUFR, and library routines for those
formats can be used to read in the original observation data.
The DART software distribution includes Fortran subroutines and
functions to help create a sequence of observations in memory,
and then a call to the DART observation sequence write routine
will create an entire obs_seq
file in the correct format.
In many cases, a single, self-contained program can convert
directly from the observation location, time, value, and error
into the DART format. In other cases, especially those linking with
a complicated external library (e.g. BUFR), there is a two-step process
with two programs and an ASCII intermediate file. We are currently
leaning towards single-step conversions but either
approach can be used for new programs.
The DART system comes with several types of location modules for
computing distances appropriately. The two most commonly used
are for data in a 1D system and for data in a 3D spherical coordinate
system. All the programs in the DART/observations
directory assume the
location/threed_sphere/location_mod.f90
3D sphere location module is being used.
With the myriad of observation file formats (HDF, GRIB,
BUFR, netCDF, ...) we simply have not had the time nor need to
support all of them.
The converters are a work in progress. There are currently about
10 other observation sources and types which we are in the process
of collecting information and conversion programs for and which will
eventually be added to this directory. In the meantime, if you have
converters for data or interest in something that is not in the
repository, please email the DART group.
Your best bet is to contact our group at dart@ucar.edu with
a specific request and we can steer you to the most similar process.
Manipulating observation sequences.
First and foremost, check out the
DART/obs_sequence/obs_sequence_tool.html
document for detailed information and examples.
obs_sequence_tool is the primary tool for manipulating
observation sequence files. Observation sequence files are linked lists of
observations organized by time. That is to say, the observations may appear
in any order in the file, but traversing the linked list will result in
observations ordered by time. There are tools for querying the linked list to
extract the 'keys' to the list items for a particular date range, for example.
obs_sequence_tool
can be used to combine observation sequences, convert from ASCII to
binary or viceversa, extract a subset of observations, etc.
For testing, it is terribly useful to extract a small number of
observations (like ONE) from an existing observation sequence file.
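As an illustration (the variable names follow the Lanai-era &obs_sequence_tool_nml;
check obs_sequence_tool.html for the authoritative list), extracting just the first
day of observations from an existing file for a low-order model might look like:

&obs_sequence_tool_nml
   filename_seq      = 'obs_seq.out',
   filename_out      = 'obs_seq.subset',
   first_obs_days    = 0,
   first_obs_seconds = 0,
   last_obs_days     = 1,
   last_obs_seconds  = 0,
/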
The difference between observation TYPE and observation KIND.
Broadly speaking, observation TYPES are specific instances of a
generic observation KIND. The distinction is useful for several reasons,
not the least of which is to evaluate observation platforms.
Zonal wind observations from QuikSCAT vs. radiosondes, for example.
They are both observations of zonal winds (what we call KIND_U_WIND_COMPONENT),
but they are different observation TYPES: QKSWND_U_WIND_COMPONENT
and RADIOSONDE_U_WIND_COMPONENT, respectively.
The forward observation operators are implemented based on observation KIND.
When requested, the model generates a KIND_U_WIND_COMPONENT;
it doesn't need to know that it will be compared to a QuikSCAT value
or a radiosonde value.
However, it is usually scientifically very interesting to
be able to compare the assimilation of one TYPE of observation vs. another.
One observation sequence file can have lots of types of observations; DART
has the capability to assimilate (or evaluate) any combination of
observation types without getting bogged down in dataset management. The
same observation sequence can be used for experiments that include/exclude
certain observation types - ensuring that you are performing the experiment
you THINK you are performing ...
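For example, a single input.nml can assimilate one TYPE while only evaluating
(computing, but not assimilating, the forward operator for) another - a sketch using
the TYPE names mentioned above:

&obs_kind_nml
   assimilate_these_obs_types = 'RADIOSONDE_U_WIND_COMPONENT',
   evaluate_these_obs_types   = 'QKSWND_U_WIND_COMPONENT'
/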
Adding support for a new observation TYPE.
DART/obs_def/obs_def_mod.html
is the source for detailed information.
The Data Assimilation Research Testbed  DART Tutorial
DART comes with an extensive set of tutorial materials,
working models of several different levels of complexity, and data to
be assimilated. It has been used in several multiday workshops and
can be used as the basis to teach a section on Data Assimilation.
Download the DART software distribution and look in the
tutorial
subdirectory for the PDF and FrameMaker
source for each of the 22 tutorial sections.
The most recent versions of the tutorial are always provided below.
Browsing the tutorial is worth the effort.
Taking the tutorial is FAR better!
 Section 1 [pdf] Filtering For a One Variable System.
 Section 2 [pdf] The DART Directory Tree.
 Section 3 [pdf] DART Runtime Control and Documentation.
 Section 4 [pdf] How should observations of a state variable impact an unobserved state variable? Multivariate assimilation.
 Section 5 [pdf] Comprehensive Filtering Theory: Non-Identity Observations and the Joint Phase Space.
 Section 6 [pdf] Other Updates for An Observed Variable.
 Section 7 [pdf] Some Additional Low-Order Models.
 Section 8 [pdf] Dealing with Sampling Error.
 Section 9 [pdf] More on Dealing with Error; Inflation.
 Section 10 [pdf] Regression and Nonlinear Effects.
 Section 11 [pdf] Creating DART Executables.
 Section 12 [pdf] Adaptive Inflation.
 Section 13 [pdf] Hierarchical Group Filters and Localization.
 Section 14 [pdf] DART Observation Quality Control.
 Section 15 [pdf] DART Experiments: Control and Design.
 Section 16 [pdf] Diagnostic Output.
 Section 17 [pdf] Creating Observation Sequences.
 Section 18 [pdf] Lost in Phase Space: The Challenge of Not Knowing the Truth.
 Section 19 [pdf] DART-Compliant Models and Making Models Compliant.
 Section 20 [pdf] Model Parameter Estimation.
 Section 21 [pdf] Observation Types and Observing System Design.
 Section 22 [pdf] Parallel Algorithm Implementation.
 Carbon Tutorial [pdf] A Simple 1D Advection Model.
Adding a model to DART  Overview
DART is designed to work with many models without
modifications to the DART routines or the model source code.
DART can 'wrap around' your model in two ways.
One can be used if your model can be called as a subroutine, the
other is for models that are separate executables. Either way,
there are some steps that are common to both paths.
Please be aware that several of the highorder models (CAM and WRF, in
particular) have been used for years and their scripts have incorporated
failsafe procedures and restart capabilities that have proven to be useful
but make the scripts complex - more complex than need be for the initial
attempts. Truly, some of the complexity is no longer required for available
platforms. Then again, we're not running one instance of a highly complicated
computer model, we're running N of them.
The basic steps to include your model in DART
 Copy the template directory and files to your own DART model
directory.
 Modify the model_mod.f90 file to return
specifics about your model. This module MUST contain all the
required interfaces (no surprise) but it can also contain many
more interfaces as is convenient.
 [optional step] Modify the Matlab® routines to know about the
specifics of the netCDF files produced by your model
(sensible defaults, for the most part).
If your model is not subroutinecallable, there is extra work
to be done. There are several examples to raid, but it helps to
know which existing model has a strategy that is most similar to
yours. More on that later.
 Modify shell_scripts/advance_model.csh to:
collect all the input files needed to advance the model into a
clean, temporary directory, convert the state vector file into
input to your model, run your model, and convert your model
output to the expected format for another assimilation by DART.
We have examples - some that use the following support routines.
 Create a routine or set of routines to take a
DART array and create input files for your model.
This is frequently done by a program called
dart_to_model.f90, but you can do
it any way you like. It is strongly suggested that you
use the DART read mechanism for the restart file -
namely assim_model_mod.f90:aread_state_restart()
 Modify the input to your model communicating the runtime
settings necessary to integrate your model from one time
to another arbitrary time in the future. It may be
convenient to do this in dart_to_model.f90.
 Run the model (you may need to watch the MPI syntax)
 Create a routine or set of routines to take your model
output files and create a DART restart file. This is
frequently done by a program called
model_to_dart.f90.
It is strongly suggested that you use the DART write
mechanism for the restart file - namely
assim_model_mod.f90:awrite_state_restart()
If a single instance of your model needs to advance using all the MPI
tasks, there is one more script that needs to work -
run_filter.csh.
 Modify shell_scripts/run_filter.csh to:
do everything under the sun and then
some
Test ...
Generally, it is a good strategy to use DART to create a synthetic
observation sequence with ONE observation location - and ONE
observation type - for several assimilation periods. With that,
it is possible to run perfect_model_obs and
then filter without having to debug too much
stuff at once. A separate document will address how to test your model
with DART.
Programming style
#1 Don't shoot the messenger. We have a lot of experience trying to write
portable/reproducible code and offer these suggestions. All of these
suggestions are for the standalone DART components. We are not asking you
to rewrite your model. If your model is a separate executable, leaving
it untouched is fine.
Writing portable code for the DART components will allow us to
include your model in the nightly builds and reduces the risk of us
making changes that adversely affect the integration with your model.
There are some routines
that have to play with the core DART routines; these
are the ones we are asking you to write using these few simple guidelines.
 Use explicit typing; do not throw the 'autopromote' flag on
your compiler.
 Use the intent() attribute.
 Use use xxx_mod, only : bob, sally
statements for routines from other modules (see the short sketch after this list).
This really helps us track down things and ensures you're using what you think
you're using.
 Use Fortran namelists for I/O if possible.
 Check out the existing parameters/routines in
common/types_mod.f90,
utilities/utilities_mod.f90, and
time_manager/time_manager_mod.f90.
You are free to use these and are encouraged to do so.
No point reinventing the wheel and these routines have been
tested extensively.
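A short sketch of these guidelines in practice (the module names are real DART modules,
but the variables and values are purely illustrative):

use types_mod,        only : r8
use time_manager_mod, only : time_type, set_time

implicit none
private

real(r8)        :: forcing    = 8.0_r8   ! overwritten by the model_nml namelist
integer         :: model_size = 40
type(time_type) :: time_step

namelist /model_nml/ forcing, model_size
...
time_step = set_time(3600, 0)            ! set_time(seconds, days)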
Hopefully, you have no idea how difficult it is to build each model
with 'unique' compile options on N different platforms.
Fortran90 provides a nice mechanism to specify the type of variable,
please do not use vendorspecific extensions.
(To globally auto-promote 32-bit reals to 64-bit reals, for example.
That is a horrible thing to do, since vendors are not consistent about
what happens to explicitly-typed variables. Trust me. They lie.
It also defeats the generic procedure interfaces that are designed
to use a single interface as a front-end to multiple 'type-specific'
routines.) Compilers do abide by the standard, however,
so DART code looks like:
character(len=8) :: crdate
integer, dimension(8) :: values
...
real(r4) :: a,b
real(r8) :: bob
integer :: istatus, itype
...
real(r8), intent(in) :: x(:)
type(location_type), intent(in) :: location
integer, intent(in) :: itype
integer, intent(out) :: istatus
real(r8), intent(out) :: obs_val
depending on the use. The r4 and
r8 types are explicitly defined in
DART/common/types_mod.f90 to accurately represent
what we have come to expect from 32-bit and 64-bit floating point real
variables, respectively. If you like, you can redefine
r8 to be the same as r4
to shrink your memory requirement. The people who run with WRF
frequently do this. Do not redefine the digits12
parameter; that one must provide 64-bit precision, and it is used in
precious few places.
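For reference, kind parameters of this sort are typically created with the standard selected_real_kind intrinsic. The sketch below conveys the idea only; the actual precision/range arguments live in DART/common/types_mod.f90.
! illustrative only - see DART/common/types_mod.f90 for the real definitions
integer, parameter :: r4       = selected_real_kind(6)   ! ~ 32-bit real
integer, parameter :: r8       = selected_real_kind(12)  ! ~ 64-bit real
integer, parameter :: digits12 = selected_real_kind(12)  ! must remain 64-bit

real(r8) :: bob       ! the 'default' DART precision
real(r4) :: a, b      ! explicitly 32-bit, independent of compiler flags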
Adding a model to DART  Specifics
If your model is a separate executable, there is some flexibility
to provide the required interfaces and it would be wise to look at
the heavily commented template script
DART/models/templates/shell_scripts/advance_model.csh
and then a few higherorder models to see how they do it.
Become familiar with DART/doc/html/mpi_intro.html (DART's use of MPI),
DART/doc/html/filter_async_mod
the filter
namelist parameter async in
filter.html.
1. Copying the template directory
A little explanation/motivation is warranted. If the model uses the
standard layout, it is much easier to include the model in the nightly
builds and testing. For this reason alone, please try to use the
recommended directory layout. Simply looking at the
DART/models directory should give you a pretty
good idea of how things should be laid out. Copy the
template directory and its contents.
It is best to remove the (hidden) subversion files to keep the
directory 'clean'. The point of copying this directory is to get
a model_mod.f90 that works asis and you can
modify/debug the routines one at a time.
~/DART/models % cp -r template mymodel
~/DART/models % find mymodel -name .svn -print
mymodel/.svn
mymodel/shell_scripts/.svn
mymodel/work/.svn
~/DART/models % rm -rf `find mymodel -name .svn -print`
~/DART/models % find mymodel -name .svn -print
~/DART/models %
The destination directory (your model directory) should be in the
DART/models directory to keep life simple.
Moving them around will cause problems for the
work/mkmf_xxxxx configuration files.
Each model directory should have work
and shell_scripts directories, and may
have a matlab directory,
a src directory,
or anything else you may find convenient.
Now, you must change all the work/path_names_xxx
file contents to reflect the location of your model_mod.f90 (for example,
each path_names file will contain a line like models/mymodel/model_mod.f90).
2. model_mod.f90
We have templates, examples, and a document describing the required interfaces
in the DART code tree 
DART/models/model_mod.html.
Every(?) uservisible DART program/module has a matching piece of
documentation that is distributed along with the code.
The DART code tree always has the most current documentation.
Check out time_manager_mod.f90 and
utilities_mod.f90 for generalpurpose routines ...
Use Fortran namelists for I/O if possible.
Modify the model_mod.f90 file to return specifics
about your model. This module MUST contain all the required interfaces
(no surprise) but it can also contain many more interfaces as is convenient.
This module should be written with the understanding that print statements
and error terminations will be executed by multiple processors/tasks. To
restrict print statements to be written once (by the master task), it is
necessary to preface the print as in this example:
if (do_output()) write(*,*)'model_mod:namelist cal_NML',startDate_1,startDate_2
Required Interfaces in model_mod.f90
No matter the complexity of the model, the DART software requires
a few interface routines in a modelspecific Fortran90 module
model_mod.f90 file. The
models/template/model_mod.f90 file
has extended comment blocks at the heads of each of these routines
that go into much more detail for what is to be provided.
You cannot change the types or number of required arguments
to any of the required interface routines. You can add
optional arguments, but you cannot go back through the DART tree to
change the gazillion calls to the mandatory routines. It is absolutely
appropriate to look at existing models to get ideas about how to
implement the interfaces. Finding a model implementation that is
functionally close to yours always helps.
As of December 2008, the mandatory interfaces and their programming
degree-of-difficulty are listed below. Each routine carries a difficulty
pair in the form [subroutine-callable model / model run as a separate executable].

get_model_size [easy / easy]
This function returns the size of all the model variables
(prognostic or diagnosed or ...) that are packed into the 1D
DART state vector. That is, it returns the length of the DART
state vector as a single scalar integer.

adv_1step [depends / trivial]
For subroutine-callable models, this routine is the one to
actually advance the model 1 timestep
(see models/bgrid_solo/model_mod.f90 for
an example). For non-subroutine-callable
models, this is a NULL interface. Easy.

get_state_meta_data [depends / depends]
This routine takes as input an integer index into the DART state vector
and returns the associated location and (optionally)
variable type from obs_kind/obs_kind_mod.f90.
(See models/*/model_mod.f90 for
examples.) This generally requires knowledge of how the model
state vector is packed into the DART array, so it can be as
complicated as the packing.

model_interpolate [depends / hard]
This is one of the more difficult routines. Given a DART state
vector, a location, and a desired generic 'kind' (like
KIND_SURFACE_PRESSURE, KIND_TEMPERATURE, KIND_SPECIFIC_HUMIDITY,
KIND_PRESSURE, ...), return the desired scalar quantity and set
the return status accordingly. This is what enables the model to use
observation-specific 'forward operators' that are part of the
common DART code.

get_model_time_step [easy / easy]
This routine returns the smallest increment in time (in seconds) that
the model is capable of advancing the state in a given implementation.
For example, the dynamical timestep of a model may be 20 minutes, but
there are reasons you don't want to (or cannot) restart at this
interval and would like to restart AT MOST every 6 hours.
For this case, get_model_time_step
should return 21600, i.e. 6*60*60.
This is also interpreted as the nominal assimilation period.
This interface is required for all applications.

end_model [easy / easy]
Performs any shutdown and cleanup needed. Good form would dictate
that you should deallocate any storage allocated when you
instantiated the model (from
static_init_model, for example).

static_init_model [depends / depends]
Called to do one-time initialization of the model. This generally
includes setting the grid information, calendar, etc.

init_time [trivial / trivial]
Returns a time that is somehow appropriate for starting up a
long integration of the model IFF the namelist
parameter start_from_restart = .false.
for the program perfect_model_obs.
If this option is not to be used in
perfect_model_obs,
this can be a NULL interface.

init_conditions [easy / easy]
Companion interface to init_time.
Returns a model state vector that is somehow appropriate for
starting up a long integration of the model. Only needed
IFF the namelist parameter
start_from_restart = .false.
for the program perfect_model_obs.

nc_write_model_atts [trivial-difficult / trivial-difficult]
This routine is used to write the model-specific attributes to
the netCDF files containing the prior and posterior states of
the assimilation. The subroutine in the
models/template/model_mod.f90 WILL WORK for
new models but does not know anything about prognostic variables
or geometry or ... Still, it is enough to get started without
doing anything. More meaningful coordinate variables etc. are
needed to supplant the default template. This can be as
complicated as you like - see existing models for examples.

nc_write_model_vars [trivial-difficult / trivial-difficult]
This routine is responsible for writing the DART state vector
or the prognostic model variables to the output netCDF files.
If the namelist parameter
output_state_vector == .false., this
routine is responsible for partitioning the DART state vector
into the appropriate netCDF pieces (i.e. the prognostic model
variables). The default routine will simply blast out the entire
DART state vector into a netCDF variable called 'state'.

pert_model_state [depends / trivial-difficult]
This routine is used to generate initial ensembles. This may
be a NULL interface if you can tolerate the default perturbation
strategy of adding noise to every state element or if you
generate your own ensembles outside the DART framework. There are
other ways of generating ensembles ... climatological
distributions, bred singular vectors, voodoo ...

get_close_maxdist_init [trivial / trivial]
This routine performs the initialization for the table-lookup
routines that accelerate the distance calculations. It
is closely tied to the type of location module used by the model
and is frequently (universally?) simply a 'pass-through' routine
to a routine of the same name in the location module. There is
generally no coding that needs to be done, but the interface must
exist in model_mod.f90.

get_close_obs_init [trivial / trivial]
This routine performs the initialization for the get_close accelerator that depends on the
particular observation. Again, this is generally a 'pass-through'
routine to a routine of the same name in the location module.

get_close_obs [trivial / trivial]
This is the routine that takes a single location and a list of
other locations, and returns the indices of all the locations close to
the single one, along with the number of close locations and their
distances. Again, this is generally a 'pass-through'
routine to a routine of the same name in the location module.

ens_mean_for_model [easy / easy]
This routine simply stores a copy of the ensemble mean of the
state vector within the model_mod.
The ensemble mean may be needed for some calculations (like
converting model sigma levels to the units of the observation -
pressure levels, for example).
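To make the shape of these interfaces concrete, here is a minimal, purely illustrative sketch of two of the simpler routines for a hypothetical model whose entire state is a fixed-length 1D array. The hard-coded size and the empty initialization are assumptions for this sketch only; the authoritative argument lists and comment blocks are in models/template/model_mod.f90.
function get_model_size()
   ! return the length of the DART state vector as a single scalar integer
   integer :: get_model_size
   get_model_size = 40   ! hypothetical fixed-size 1D state
end function get_model_size

subroutine static_init_model()
   ! one-time initialization: read the model namelist, set up the grid,
   ! calendar, and any module storage (nothing to do for this sketch)
end subroutine static_init_model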
3. providing Matlab® support
Since this is an optional step, it will be covered in a separate document.
If your model is subroutinecallable  you're done!
4. advance_model.csh
The Big Picture for models advanced as separate executables.
The normal sequence of events is that DART reads in its own restart file
(do not worry about where this comes from right now) and eventually
determines it needs to advance the model. DART needs to be able to
take its internal representation of each model state vector, the
valid time of that state, and the amount of time to advance the state 
and communicate that to the model. When the model has advanced the state
to the requested time, the output must be ingested by DART and the cycle
begins again. DART is entirely responsible for reading the observations
and there are several programs for creating and manipulating the observation
sequence files.
There are a couple of ways to exploit parallel architectures with DART,
and these have an immediate bearing on the design of the script(s) that
control how the model instances (each model copy) are advanced.
Perhaps the conceptually simplest method is when each model instance is
advanced by a single processor element. DART calls this
async = 2. It is generally efficient to
relate the ensemble size to the number of processors being used.
The alternative is to advance every model instance one after another
using all available processors for each instance of the model.
DART calls this async = 4,
and requires an additional script.
For portability reasons, DART uses the same processor set
for both the assimilation and the model advances. For example, if you
advance the model with 96 processors, all 96 processors will be employed to
assimilate.
advance_model.csh is invoked in one of two ways:
1) if async = 2 then
filter uses a
system() call, or
2) if async = 4 then
run_filter.csh makes the call. Either way
there are three arguments.
 the process number of the caller  could be the master task ID (zero) or
(especially if async = 2) a process
id that gets related to the copy. When multiple copies are being
advanced simultaneously, each of the advances happens in its own
runtime directory.
 the number of state copies belonging to that process
 the name of the (ASCII) filter_control_file for that process.
The filter_control file contains the following information
(one per line): the ensemble member, the name of the input file
(containing the DART state vector), and the name of the output file
from the model containing the new DART state vector. For example,
1
assim_model_state_ic.0001
assim_model_state_ud.0001
2
assim_model_state_ic.0002
assim_model_state_ud.0002
...
async = 2 ... advancing many copies at the same time
Modify shell_scripts/advance_model.csh to:
 Collect all the input files needed to advance the model into a
clean, temporary directory.
 Determine how many tasks you have and how many ensemble members
you have, then determine how many 'batches' of ensemble members must
be run to advance all of them. With 20 tasks and 80 ensemble members,
you will need to loop 4 times, for example.
 Loop over the following three steps - each pass through the loop
advances one ensemble member:
 convert the DART state vector file into input for your model,
 run your model, and
 convert your model output to the file with the expected format for another
assimilation by DART.
During this initial phase, it may be useful to _leave_ the temporary
directory intact until you verify everything is as it should be.
async = 4 ... advancing each copy one at a time
In addition to modifying shell_scripts/advance_model.csh as
described above, you must also modify shell_scripts/run_filter.csh
in the following way:
THIS PART NEEDS TO BE FILLED IN
5. Converting DART output to input for your model.
After DART has assimilated the observations and created new (posterior)
states, it is necessary to reformat those posteriors into input for the
model. Fundamentally, you are unpacking the DART state vector and putting
the pieces back into whatever portion of your model initial conditions file
is appropriate. Frequently this is done by a program called
dart_to_model.f90.
Put another way: create a routine or set of routines to modify the input
to your model, communicating the run-time settings necessary to integrate
your model from one time to another arbitrary time in the
future. Frequently this is done by
dart_to_model.f90. The DART array has
a header that contains the 'advance-to' time as
well as the 'valid' time of the DART array. The
times are encoded in DART's time representation.
Interpretation of these times by your model will be necessary
and you will need to feed these times to your model in some
automated fashion.
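A minimal sketch of the read side is below. The routine names open_restart_read(), aread_state_restart(), and close_restart() are assumed from assim_model_mod.f90, and the file name is illustrative; verify the exact names and argument lists before relying on them.
! illustrative core of a dart_to_model converter (assumes types_mod,
! time_manager_mod, model_mod, and assim_model_mod are 'use'd)
integer               :: iunit
type(time_type)       :: model_time, adv_to_time
real(r8), allocatable :: statevector(:)

allocate(statevector(get_model_size()))

iunit = open_restart_read('dart_restart')
! the last (optional) argument returns the 'advance-to' time from the header
call aread_state_restart(model_time, statevector, iunit, adv_to_time)
call close_restart(iunit)

! ... unpack statevector into the model's own initial-condition file, and
! ... hand model_time / adv_to_time to the model's run control.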
You will also need to create/modify a
mkmf_dart_to_model and
path_names_dart_to_model specific to your model.
6. Modify the start/stop control of your model run.
Create a routine or set of routines to modify the input to
your model communicating the runtime settings necessary
to integrate your model from one time to another arbitrary
time in the future. These routines are called in the
advance_model.csh script. Every model is
controlled differently, so writing detailed descriptions here is
pointless.
7. Convert your model output to DART input.
After your model has advanced its states to the target time, it is
necessary to convey the new state information to DART.
The preferred name for this is model_to_dart.f90.
This is fundamentally the inverse of dart_to_model.f90.
Rip out the bits of the state vector you want to paste into a vector for DART,
prepend the valid_time in the approved DART format and you're good to go.
If you pack the bits into a DART state vector, there are native DART routines
to write out the state vector. This ensures that DART will be able to read what
you've written, and insulates you from having to worry about any internal file
format changes we might make.
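A corresponding sketch of the write side uses the awrite_state_restart() routine mentioned earlier; the open/close routine names and the output file name are assumptions to be checked against assim_model_mod.f90.
! illustrative core of a model_to_dart converter
! (statevector and model_time filled from the model's own output files)
iunit = open_restart_write('dart_ics')
call awrite_state_restart(model_time, statevector, iunit)
call close_restart(iunit)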
Adding Matlab® support for your own model  under construction.
Only needed for statespace diagnostics.
Define a structure with required elements.
Testing Strategies  under construction
Generally testing when you add a new model to DART includes:
Checking the converter programs.
Checking the model advance control.
Starting with one observation in a known location, with a known value
and error specification.
Performing a 'perfect model' experiment for a few timesteps.
Looking at what the assimilation did with:
ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
ncview Innov.nc
Did my experiment work?
If filter ran and produced output files, now what?
What output and diagnostic files are produced
from executable "perfect_model_obs" 
obs_seq.out 
the observations at some predefined times and locations 
True_State.nc 
a netCDF file containing the model trajectory 
perfect_restart 
the final state of the model  in ASCII 

from executable "filter" 
Prior_Diag.nc 
the model states right before assimilation 
Posterior_Diag.nc 
the model states immediately after assimilation 
obs_seq.final 
the model estimates of the observations (an integral
part of the data assimilation process) 
filter_restart 
the ensemble of final model states 

from both 
dart_log.out 
the 'important' runtime output (each run of filter appends
to this file; remove it or start at the bottom to see the
latest values) 
dart_log.nml 
the input parameters used for an experiment 
First questions to ask
After filter executes without error and produces an obs_seq.final file,
a Prior_Diag.nc file, and a Posterior_Diag.nc file,
the first questions to ask are whether any observations were successfully assimilated,
and whether the model state output from filter is different
from the input.
One way to check if the output model state data was changed by the assimilation is
to use the 'ncdiff' tool to difference the Prior and Posterior diagnostic files:
ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
ncview Innov.nc
Look at the first copy, which is the ensemble mean.
If all values in that copy are 0, then the assimilation changed nothing in the state.
Debugging hints:
You may need to rerun filter multiple times to diagnose this. The fastest
way to get to the answer is to make filter very fast to run. You can do this by:
 make an obs_seq file with only 1 or just a few observations in it, and
 configure a run so filter does a single assimilation and exits, without
having to advance the ensemble of models or do other work.
To make an obs file with a single observation, use one of these methods:
 run ./create_obs_sequence to make a new, short, obs_seq file
 Use the obs_sequence_tool to cut your existing obs_seq.out file down
to just a few obs by selecting only a subset of the types and setting a very
short time window (just a second or two where you know there are obs).
To make the filter program only do an assimilation:
 Edit the input.nml and, in the &filter_nml namelist,
set init_time_days and init_time_seconds to match the
observation time in the truncated obs_seq file (see the fragment below).
This overrides any times in the restart/initial condition files and
ensures that filter will only assimilate and not try to advance the model.
 Make sure the truncated obs_seq file contains only 1 obs,
a few obs at the same time, or obs close enough together in time
to fit into a single assimilation window.
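For example, the relevant fragment of input.nml might look like the following; the day and second values here are placeholders that must be replaced by the time of the observation in your truncated obs_seq file:
&filter_nml
   ...
   init_time_days    = 148880
   init_time_seconds = 0
   ...
/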
If there are no changes in the model state after assimilation, then
examine the obs_seq.final file. If this file is in binary format, change
this namelist item so the output observation sequence file will be written
in ascii:
&obs_sequence_nml
write_binary_obs_sequence = .false.
/
and rerun filter to regenerate an obs_seq.final file in ascii.
See the diagrams on
this page for help in understanding how to read the different fields in
an obs_seq.final file.
If all the prior and posterior mean
values are -888888.0 (the DART "missing data" value), those
observations were not assimilated. If it is not already set,
change &filter_nml :: num_output_obs_members to be the same
as your ensemble size. This will give you all the forward
operator values for all the ensemble members, so you can determine whether all
ensemble members are failing in the same way, or if only a few are
problematic.
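In input.nml this is a one-line change; the value 20 below is a placeholder for your actual ensemble size:
&filter_nml
   ...
   num_output_obs_members = 20
   ...
/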
For the failing observations, check the 'DART QC' for the reason why.
(See
here for how to locate the different values in
an obs_seq.final file. The 'DART QC' field is usually the second of
the 2 "quality control" copies.)
If in the obs_seq.final file the prior and posterior values are not -888888.0 but
are identical, your obs are being assimilated but are having no impact.
The most common reasons assimilated obs have no impact on the model state include:
 Zero spread in ensemble members
Your initial ensemble members must have different values for each
state item. If all members have identical values, the observations
cannot make a change. To diagnose this condition, look at the Prior_Diag.nc
file at copy=2 (the second copy), which is the ensemble spread. If all the values
are 0, this is your problem. The fix is to set &filter_nml :: start_from_restart = .false.
(which still requires a single filter initial condition file); the
filter code will then add random Gaussian perturbations to each state vector item to
generate an initial ensemble with spread. The magnitude of the Gaussian noise
added is controlled by &ensemble_manager :: perturbation_amplitude
(see the namelist fragments after this list). It is
also possible to write your own perturbation routine in your model_mod.f90 code.
 Cutoff value too small
Check the localization (cutoff) radius in the &assim_tools_nml namelist,
the 'cutoff' item. Set it to a very large number (e.g. 100000) and rerun
(see the namelist fragments after this list).
If there is now an impact, the cutoff was restricting which items in the
state vector could be updated, so your obs had no impact before.
Cutoff values are dependent on the location type being used.
The cutoff is specified in radians for the threed_sphere locations module
(what most large models use), or in simple distance if using a
low-order model (lorenz, ikeda, etc.).
 Obs error values too large (less likely)
If the observation error is very large, it will have no impact
on the model state. This is less likely a cause than other possibilities.
 No correlation (unlikely)
If there is no correlation between the distribution of the forward observation
values and the state vector values, the increments will be very tiny. However,
there are generally still tiny increments applied, so this is also a low-likelihood case.
 Errors in forward operator location computations, or model get_close_obs()
If there is an error in the model_mod.f90 code in either get_state_meta_data(),
model_interpolate(), or the vertical conversion code in get_close_obs(), it is possible
for the forward operators to appear to be working correctly, but the distances
computed for the separation between the obs and the state vector values can be
incorrect if the wrong locations are being passed back from get_state_meta_data().
This can result in the increments being applied in the wrong locations or not at all.
This is usually one of the things to test carefully when developing a new model interface.
 Incorrect vertical conversion
If the model uses 3D coordinates and needs to convert between
pressure, height, and/or model level, and that conversion is done incorrectly, the state
vector locations can appear to be too high or too low to be impacted by an observation.
Also some models have a height limit built into their model_mod code to avoid trying to
assimilate observations at the model top, if the model top is prescribed. The observations
cannot make meaningful changes to the model state there and trying to assimilate them
can lead to problems with the inflation. If the code in the model_mod is excluding
observations incorrectly, or you are testing with observations at the model top, this
can result in no impact on the model state.
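For reference, the namelist fragments involved in the two most common fixes above (generating initial ensemble spread, and enlarging the localization radius) look something like the following. The group names follow the usual _nml convention and the values are purely illustrative; check the documentation for each module for the authoritative item names and defaults.
&filter_nml
   start_from_restart     = .false.
   ...
/
&ensemble_manager_nml
   perturbation_amplitude = 0.2
   ...
/
&assim_tools_nml
   cutoff                 = 100000.0
   ...
/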
How is the output different from the input?
If you compute the difference between the prior and posterior diagnostic
files by this process:
ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
ncview Innov.nc
and you see a difference, is it correct?
If you run with a single observation, you should be able to easily see the
impact - generally it is a mostly spherical or circular ring around the
observation location; depending on your vertical localization,
you may or may not see an impact in the vertical.
Using &location_nml::horiz_dist_only=.true. is usually a good idea
for a full 3d model to start out, and then add vertical localization
once you believe the horizontal impact makes sense.
Without any vertical localization, the observation should have an
impact along the entire vertical column. (For lower order models
this doesn't apply.)
If you change the cutoff distances you should be able to watch
the change in impact on the state and make sure that it's something reasonable.
Now you can use the observation space diagnostics,
and the state space diagnostics to get more information about
what the impact was, and whether it's doing the right thing or not.
Was the Assimilation Effective?
If your filter run finished, and the Posterior is now different from
the Prior, now what?
This section presumes that you have debugged your model/DART interfaces or are using
a model that came with the DART distribution - that is, a working implementation.
There is no single metric that can declare success.
The DART Tutorial
has the best explanation of what to look for,
what to change to improve the next experiment, etc.
DART has an extensive set of diagnostics
implemented in Matlab®. To use them, make sure you have read the
Configuring Matlab® to work with DART
section.
The Observations are the Key.
My own (Tim's) personal view is that the first thing to check is to see
how many observations are getting rejected by the assimilation in addition
to the RMSE and spread of the ensemble. A natural part of the DART framework is
that these metrics are calculated automatically and are readily
available in the obs_seq.final files. Checking
the temporal evolution of the RMSE and observation rejection characteristics
is a firstorder metric for determining the health of the assimilation system.
 Use obs_diag.f90 to process the collection of
obs_seq.final files for regions and times of interest.
Edit the input.nml:obs_diag_nml namelist to reflect
the details of your experiment, and then run obs_diag
to create a netCDF file obs_diag_output.nc
that contains the summaries.
 Make sure the spread of the ensemble does not collapse. Use
plot_evolution.m with copystring = 'spread';
to explore
obs_diag_output.nc. It is normal (and desirable!) for the spread
to decrease somewhat from the initial value, but it should not decrease to a small value.
Insufficient spread leads to filter divergence and a large observation rejection rate.
plot_evolution.m automatically plots the number of observations
available and the number of observations successfully assimilated.
 Make sure the RMSE of the ensemble does not collapse. Use
plot_evolution.m with copystring = 'rmse';
to explore
obs_diag_output.nc. It is important to interpret the RMSE in light
of the number of observations successfully assimilated. It is possible to have a very low
RMSE if the assimilation system rejects all of the observations that are in disagreement!
A low RMSE is desirable and is a much stronger result if most/all of the observations
are being assimilated successfully. Also  the RMSE of the prior is a
much stronger result. Any method can overfit the observations (match them perfectly) 
what is important is that the forecast is a good forecast!
 Make sure the RMSE of the ensemble does not continually increase. Use
plot_evolution.m with copystring = 'rmse';
to explore
obs_diag_output.nc. It is natural for the RMSE to vary in time
in response to the changing number and location of the observations, the phenomenon
being modeled, etc. ... but it should not generally increase as the system evolves.
Once the system has 'burned in', the RMSE should be relatively stable.
 Check to make sure the observations that are getting rejected are getting rejected
for the right reasons. Run
obs_seq_to_netcdf to convert the obs_seq.final
files into netCDF files that can be explored with
link_obs.m or plot_obs_netcdf.m.
Both of these tools will allow you to isolate the rejected observations and determine if
the observations are being rejected because they are in disagreement with the ensemble
(DART QC == 7) or
they were rejected because of a namelist setting (DART QC == 5),
or incoming QC value (DART QC == 6),
or were they simply outside the domain (DART generally only interpolates, it does not extrapolate)
or ...
 Check that the ensemble spread captures the 'truth' as determined by the observations.
Use obs_diag to create the data for a rank histogram.
Plot the histograms with ncview or
plot_rank_histogram.m
Generally speaking, the observationspace diagnostics provide the first and best metrics
for the assimilation system. We always have observations, we rarely have the 'truth' in
a full statespace representation. Personally, I rarely see value when comparing to some
other gridded product  as it surely has its own set of deficiencies or assumptions.
Observations aren't perfect  but they are still best.
NOTE: Combining observations from multiple sources can lead to the
same observation type being defined with different vertical coordinate systems.
While this does not pose a problem for assimilation, it does pose a problem
for the diagnostics. The current obs_diag.f90
cannot (vertically) bin the same observation type if it exploits two different vertical
coordinate systems. If you get a WARNING from obs_diag:CheckVertical()
you should know that the observation triggering the warning uses a different vertical coordinate
system than the first observation encountered of that same type.
If the termlevel is set to be 1, this is a fatal error, not a warning.
The following example is one where observation 7113 defined the vertical coordinate system to
be VERTISSURFACE (i.e. -1) and observation 7150 was created using VERTISPRESSURE.
The messages take the form:
WARNING FROM:
routine: CheckVertical
message: obs 7150 type 42 changing from -1 to pressure - def by obs 7113
StateSpace Diagnostics
If you're reading this, it means you have an assimilation that is not rejecting
too many observations, the ensemble spread is relatively constant and is not too small,
and presumably the RMSE is stable if not decreasing. Now it is time to assess the
effect the assimilation has on your model - the whole point of data assimilation.
Every DART assimilation produces two netCDF files: Prior_Diag.nc
and Posterior_Diag.nc. These provide quick access to the
individual statespace variables that constitute the DART state vector.
If filter was run with
input.nml:model_nml:output_state_vector = .false.
(normally the default for larger models), the DART state vector will be
reassembled into prognostic variables. These variables can be easily explored
in standard netCDF fashion. It is very common and trivially easy to explore the
increments:
ncdump -v CopyMetaData Prior_Diag.nc                (to check which copy you want to explore)
ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc     (ncdiff comes from NCO, not DART)
ncview Innov.nc                                     (ncview is another 'third-party' tool)
See DART statespace diagnostics for more.
DART observationspace diagnostics.
The DART QC table is an important piece of information.
QC value - meaning
0,1 == both Prior and Posterior are good.
  0 - Assimilated O.K.
  1 - Evaluated O.K., not assimilated because namelist specified evaluate only.
2,3 == Prior OK, but Posterior failed.
  2 - Assimilated O.K. BUT posterior forward operator failed.
  3 - Evaluated O.K. BUT posterior forward operator failed.
4 or higher == both Prior and Posterior have failed.
  4 - Prior forward operator failed.
  5 - Not used because of namelist control.
  6 - Rejected because of incoming data QC higher than namelist control.
  7 - Rejected because of outlier threshold test.
  8 and above - reserved for future use.
It is required to postprocess the obs_seq.final file(s) with
obs_diag
to generate a netCDF file containing accumulated diagnostics for specified regions, etc.
Since the experiment information (assimilation interval, assimilating model, etc.)
is not recorded in the obs_seq.final file, the
obs_diag_nml
namelist has a section that allows specification of the necessary quantities.
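A sketch of such a fragment is below; the bin settings echo the global attributes shown in the example netCDF dump later in this section, and the exact item names should be verified against the obs_diag documentation:
&obs_diag_nml
   obs_sequence_name = 'obs_seq.final'
   first_bin_center  = 2008, 8, 15, 12, 0, 0
   last_bin_center   = 2008, 8, 20, 12, 0, 0
   bin_separation    = 0, 0, 1, 0, 0, 0
   bin_width         = 0, 0, 1, 0, 0, 0
   time_to_skip      = 0, 0, 0, 0, 0, 0
   ...
/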
The following quantities are normally diagnosed:
Nposs       - the number of observations for each assimilation period;
Nused       - the number of observations successfully assimilated each assimilation period;
NbigQC      - the number of observations that had an undesirable (original) QC value;
NbadIZ      - the number of observations that had an undesirable Innovation Z;
NbadUV      - the number of velocity observations that had a matching component that was not assimilated;
NbadLV      - the number of observations that were above or below the highest or lowest model level, respectively;
rmse        - the rmse of the ensemble;
bias        - the bias of the ensemble (forecast - observation);
spread      - the spread of the ensemble;
totalspread - the pooled spread of the observation (knowing its observational error) and the ensemble;
NbadDARTQC  - the number of observations that had a DART QC value (> 1 for a prior, > 3 for a posterior);
observation - the mean of the observation values;
ens_mean    - the ensemble mean of the model estimates of the observation values;
N_DARTqc_0  - the number of observations that had a DART QC value of 0;
N_DARTqc_1  - the number of observations that had a DART QC value of 1;
N_DARTqc_2  - the number of observations that had a DART QC value of 2;
N_DARTqc_3  - the number of observations that had a DART QC value of 3;
N_DARTqc_4  - the number of observations that had a DART QC value of 4;
N_DARTqc_5  - the number of observations that had a DART QC value of 5;
N_DARTqc_6  - the number of observations that had a DART QC value of 6;
N_DARTqc_7  - the number of observations that had a DART QC value of 7.
The observationspace functions are in the DART/diagnostics/matlab
directory. Once you have processed the obs_seq.final files into
a single obs_diag_output.nc, you can use that as input to your own
plotting routines or use the following DART Matlab® routines:
plot_evolution.m 
plots the temporal evolution of any of the quantities above for each
variable for specified levels. The number of observations possible and used
are plotted on the same axis. 

fname = 'POP11/obs_diag_output.nc'; % netcdf file produced by 'obs_diag'
copystring = 'rmse'; % 'copy' string == quantity of interest
plotdat = plot_evolution(fname,copystring); %  OR 
plotdat = plot_evolution(fname,copystring,'RADIOSONDE_TEMPERATURE');


plot_profile.m 
plots the spatial and temporal average of any specified quantity as a
function of height. The number of observations possible and used
are plotted on the same axis. 

fname = 'POP11/obs_diag_output.nc'; % netcdf file produced by 'obs_diag'
copystring = 'rmse'; % 'copy' string == quantity of interest
plotdat = plot_profile(fname,copystring);


plot_rmse_xxx_evolution.m 
same as plot_evolution.m
but will overlay rmse on the same axis. 
plot_rmse_xxx_profile.m 
same as plot_profile.m with an overlay of rmse. 
plot_bias_xxx_profile.m 
same as plot_profile.m with an overlay of bias. 

two_experiments_evolution.m 
same as plot_evolution.m
but will overlay multiple (more than two, actually) experiments
(i.e. multiple obs_diag_output.nc files) on the same axis.
A separate figure is created for each region in the obs_diag_output.nc file. 

files = {'POP12/obs_diag_output.nc','POP11/obs_diag_output.nc'};
titles = {'CAM4','CAM3.6.71'};
varnames = {'ACARS_TEMPERATURE'};
qtty = 'rmse';
prpo = 'prior';
levelind = 5;
two_experiments_evolution(files, titles,{'ACARS_TEMPERATURE'}, qtty, prpo, levelind)


two_experiments_profile.m 
same as plot_profile.m
but will overlay multiple (more than two, actually) experiments
(i.e. multiple obs_diag_output.nc files) on the same axis.
If the obs_diag_output.nc file was created with multiple regions,
there are multiple axes on a single figure. 

files = {'POP12/obs_diag_output.nc','POP11/obs_diag_output.nc'};
titles = {'CAM4','CAM3.6.71'};
varnames = {'ACARS_TEMPERATURE'};
qtty = 'rmse';
prpo = 'prior';
two_experiments_profile(files, titles, varnames, qtty, prpo)


plot_rank_histogram.m 
will create rank histograms for any variable that has that information present in
obs_diag_output.nc. 

fname = 'obs_diag_output.nc'; % netcdf file produced by 'obs_diag'
timeindex = 3; % plot the histogram for the third timestep
plotdat = plot_rank_histogram(fname, timeindex, 'RADIOSONDE_TEMPERATURE');


plot_observation_locations.m 
This routine requires processing the observations through obs_diag
with an input option to create a separate XYZ file of the observation locations. This
has been deprecated in favor of plot_obs_netcdf,
but still works just fine.

You may also convert observation sequence files to netCDF by using
obs_seq_to_netcdf.
All of the following routines will work on observation sequences files AFTER an assimilation
(i.e. obs_seq.final files that have been converted to netCDF), and some of them
will work on obs_seq.outtype files that have been converted.
read_obs_netcdf.m 
reads a particular variable and copy from a netCDFformat
observation sequence file and returns a single structure with useful bits for plotting/exploring. 

fname = 'obs_sequence_001.nc';
ObsTypeString = 'RADIOSONDE_U_WIND_COMPONENT'; % or 'ALL' ...
region = [0 360 -90 90 -Inf Inf];
CopyString = 'NCEP BUFR observation';
QCString = 'DART quality control';
verbose = 1; % anything > 0 == 'true'
obs = read_obs_netcdf(fname, ObsTypeString, region, CopyString, QCString, verbose);


plot_obs_netcdf.m 
creates a 3D scatterplot of the observation locations,
colorcoded to the observation values. A second axis will also
plot the QC values if desired. 

fname = 'POP11/obs_epoch_011.nc';
region = [0 360 -90 90 -Inf Inf];
ObsTypeString = 'AIRCRAFT_U_WIND_COMPONENT';
CopyString = 'NCEP BUFR observation';
QCString = 'DART quality control';
maxgoodQC = 2;
verbose = 1; % > 0 means 'print summary to command window'
twoup = 1; % > 0 means 'use same Figure for QC plot'
bob = plot_obs_netcdf(fname, ObsTypeString, region, CopyString, ...
QCString, maxgoodQC, verbose, twoup);


plot_obs_netcdf_diffs.m 
creates a 3D scatterplot of the difference between two
'copies' of an observation. 

fname = 'POP11/obs_epoch_011.nc';
region = [0 360 -90 90 -Inf Inf];
ObsTypeString = 'AIRCRAFT_U_WIND_COMPONENT';
CopyString1 = 'NCEP BUFR observation';
CopyString2 = 'prior ensemble mean';
QCString = 'DART quality control';
maxQC = 2;
verbose = 1; % > 0 means 'print summary to command window'
twoup = 0; % > 0 means 'use same Figure for QC plot'
bob = plot_obs_netcdf_diffs(fname, ObsTypeString, region, CopyString1, CopyString2, ...
QCString, maxQC, verbose, twoup);


plot_wind_vectors.m 
creates a 2D 'quiver' plot of a wind field. 

fname = 'obs_epoch_001.nc';
platform = 'SAT'; % usually 'RADIOSONDE', 'SAT', 'METAR', ...
CopyString = 'NCEP BUFR observation';
QCString = 'DART quality control';
region = [210 310 12 65 -Inf Inf];
scalefactor = 5; % reference arrow magnitude
bob = plot_wind_vectors(fname, platform, CopyString, QCString, ...
'region', region, 'scalefactor', scalefactor);


link_obs.m 
creates multiple figures that have linked attributes. Select all the observations
with DART QC == 4 in one window, and those same observations are
highlighted in all the other windows (for example).




ObsTimeCheck.m 
is an example of a trivial little script to wrap around plot_obs_netcdf.m
that allows you to explore the spatial distribution of your observation sequences.
Since obs_seq_to_netcdf doesn't know anything about assimilation windows,
the idea is to create separate netCDF files for each assimilation window and then
explore a sequence of windows. Since ObsTimeCheck.m is in the subversion
repository, you should feel free to edit it/modify it to your heart's desire. If there are no
observations of that type in a particular assimilation window, the Matlab® Command window
will have a comment to that effect.




Understanding what's in obs_diag_output.nc
After you create obs_diag_output.nc with
obs_diag
it is important to understand what is contained in obs_diag_output.nc.
Remember, this is just a dump of the header of the file!
[work]$ ncdump -v CopyMetaData obs_diag_output.nc
netcdf obs_diag_output {
dimensions:
copy = 21 ;
obstypes = 78 ;
region = 1 ;
surface = 1 ;
undef = 1 ;
mlevel = 11 ;
mlevel_edges = 12 ;
plevel = 11 ;
plevel_edges = 12 ;
hlevel = 11 ;
hlevel_edges = 12 ;
time = UNLIMITED ; // (6 currently)
bounds = 2 ;
stringlength = 32 ;
rank_bins = 97 ;
variables:
int copy(copy) ;
copy:explanation = "see CopyMetaData" ;
int obstypes(obstypes) ;
obstypes:explanation = "see ObservationTypes" ;
int region(region) ;
region:long_name = "model region" ;
region:units = "nondimensional" ;
region:valid_range = 1, 1 ;
int mlevel(mlevel) ;
mlevel:long_name = "model level" ;
mlevel:units = "model level" ;
mlevel:axis = "Z" ;
mlevel:valid_range = 1, 11 ;
float mlevel_edges(mlevel_edges) ;
mlevel_edges:long_name = "model level edges" ;
mlevel_edges:units = "model level" ;
mlevel_edges:axis = "Z" ;
mlevel_edges:valid_range = 0.5, 11.5 ;
float plevel(plevel) ;
plevel:long_name = "pressure bin midpoints" ;
plevel:units = "hPa" ;
plevel:axis = "Z" ;
plevel:valid_range = 100., 1000. ;
float plevel_edges(plevel_edges) ;
plevel_edges:long_name = "pressure bin edges" ;
plevel_edges:units = "hPa" ;
plevel_edges:axis = "Z" ;
plevel_edges:valid_range = 75., 1025. ;
float hlevel(hlevel) ;
hlevel:long_name = "height bin midpoints" ;
hlevel:units = "m" ;
hlevel:axis = "Z" ;
hlevel:valid_range = 1000., 11000. ;
float hlevel_edges(hlevel_edges) ;
hlevel_edges:long_name = "height bin edges" ;
hlevel_edges:units = "m" ;
hlevel_edges:axis = "Z" ;
hlevel_edges:valid_range = 500., 11500. ;
int bounds(bounds) ;
bounds:valid_range = 1, 2 ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "temporal bin midpoints" ;
time:units = "days since 1601-1-1" ;
time:calendar = "Gregorian" ;
time:axis = "T" ;
time:bounds = "time_bounds" ;
time:valid_range = 148880.5, 148885.5 ;
double time_bounds(time, bounds) ;
time_bounds:long_name = "temporal bin edges" ;
time_bounds:units = "days since 1601-1-1" ;
time_bounds:calendar = "Gregorian" ;
time_bounds:valid_range = 148880.000011574, 148886. ;
char region_names(region, stringlength) ;
region_names:long_name = "region names" ;
char CopyMetaData(copy, stringlength) ;
CopyMetaData:long_name = "quantity names" ;
char ObservationTypes(obstypes, stringlength) ;
ObservationTypes:long_name = "DART observation types" ;
ObservationTypes:comment = "table relating integer to observation type string" ;
int rank_bins(rank_bins) ;
rank_bins:long_name = "rank histogram bins" ;
rank_bins:comment = "position of the observation among the sorted noisy ensemble members" ;
float RADIOSONDE_U_WIND_COMPONENT_guess(time, copy, plevel, region) ;
RADIOSONDE_U_WIND_COMPONENT_guess:_FillValue = -888888.f ;
RADIOSONDE_U_WIND_COMPONENT_guess:missing_value = -888888.f ;
int RADIOSONDE_U_WIND_COMPONENT_guess_RankHi(time, rank_bins, plevel, region) ;
...
float AIRCRAFT_U_WIND_COMPONENT_guess(time, copy, plevel, region) ;
AIRCRAFT_U_WIND_COMPONENT_guess:_FillValue = -888888.f ;
AIRCRAFT_U_WIND_COMPONENT_guess:missing_value = -888888.f ;
int AIRCRAFT_U_WIND_COMPONENT_guess_RankHist(time, rank_bins, plevel, region) ;
...
float RADIOSONDE_U_WIND_COMPONENT_analy(time, copy, plevel, region) ;
RADIOSONDE_U_WIND_COMPONENT_analy:_FillValue = -888888.f ;
RADIOSONDE_U_WIND_COMPONENT_analy:missing_value = -888888.f ;
...
float AIRCRAFT_U_WIND_COMPONENT_analy(time, copy, plevel, region) ;
AIRCRAFT_U_WIND_COMPONENT_analy:_FillValue = -888888.f ;
AIRCRAFT_U_WIND_COMPONENT_analy:missing_value = -888888.f ;
...
float RADIOSONDE_U_WIND_COMPONENT_VPguess(copy, plevel, region) ;
RADIOSONDE_U_WIND_COMPONENT_VPguess:_FillValue = -888888.f ;
RADIOSONDE_U_WIND_COMPONENT_VPguess:missing_value = -888888.f ;
...
float RADIOSONDE_U_WIND_COMPONENT_VPanaly(copy, plevel, region) ;
RADIOSONDE_U_WIND_COMPONENT_VPanaly:_FillValue = -888888.f ;
RADIOSONDE_U_WIND_COMPONENT_VPanaly:missing_value = -888888.f ;
...
...
// global attributes:
:creation_date = "YYYY MM DD HH MM SS = 2011 03 18 13 37 34" ;
:obs_diag_source = "URL: blah_blah_blah/diagnostics/threed_sphere/obs_diag.html $" ;
:obs_diag_revision = "$Revision$" ;
:obs_diag_revdate = "$Date$" ;
:bias_convention = "model - observation" ;
:horizontal_wind = "vector wind derived from U,V components" ;
:horizontal_wind_bias = "definition : sum[sqrt(u**2 + v**2) - obsspeed]/nobs" ;
:horizontal_wind_rmse = "definition : sqrt(sum[(u-uobs)**2 + (v-vobs)**2]/nobs)" ;
:horizontal_wind_spread = "definition : sqrt(sum[var(u) + var(v)]/nobs)" ;
:DART_QCs_in_histogram = 0, 1, 2, 3 ;
:outliers_in_histogram = "FALSE" ;
:first_bin_center = 2008, 8, 15, 12, 0, 0 ;
:last_bin_center = 2008, 8, 20, 12, 0, 0 ;
:bin_separation = 0, 0, 1, 0, 0, 0 ;
:bin_width = 0, 0, 1, 0, 0, 0 ;
:time_to_skip = 0, 0, 0, 0, 0, 0 ;
:max_num_bins = 1000 ;
:rat_cri = 5000. ;
:input_qc_threshold = 3. ;
:lonlim1 = 0. ;
:lonlim2 = 360. ;
:latlim1 = -90. ;
:latlim2 = 90. ;
:obs_seq_file_001 = "/ptmp/thoar/HuiGPS/08_01/obs_seq.final" ;
...
...
data:
CopyMetaData =
"Nposs ",
"Nused ",
"NbigQC ",
"NbadIZ ",
"NbadUV ",
"NbadLV ",
"rmse ",
"bias ",
"spread ",
"totalspread ",
"NbadDARTQC ",
"observation ",
"ens_mean ",
"N_DARTqc_0 ",
"N_DARTqc_1 ",
"N_DARTqc_2 ",
"N_DARTqc_3 ",
"N_DARTqc_4 ",
"N_DARTqc_5 ",
"N_DARTqc_6 ",
"N_DARTqc_7 " ;
}
There are many more variables in this particular netCDF file 
indicated by the '...' (the ellipsis).
Every observation type is preserved in its own set of variables with the following suffixes:
suffix    - description
_guess    - prior, forecast
_analy    - posterior, analysis
_VPguess  - vertical profile only - prior, forecast
_VPanaly  - vertical profile only - posterior, analysis
_RankHist - rank histogram - prior
What is really important to note is that each observation variable has a copy
dimension  and each copy description is contained in the CopyMetaData
variable. A dump of that variable provides information about what quantities are directly
available. In the above example, the rmse is copy 7. You should
never assume this information, you should always check the CopyMetaData
variable to find the appropriate copy index.
Let's look at the RADIOSONDE_U_WIND_COMPONENT_guess variable in the
example. It has 4 dimensions: [time, copy, plevel, region]. Since 'plevel' is one of
the dimensions, the appropriate levels are defined by the coordinate variable of the same name.
The RADIOSONDE_U_WIND_COMPONENT_VPguess variable has
3 dimensions: [copy, plevel, region]. The 'time' dimension has been averaged out
such that a single Vertical Profile (VP) is defined for
the entire timeframe specified in the namelist. If I were a better man, I'd put the averaging
interval in the variable metadata to have a hope of complying with convention; instead,
it is only in the global metadata for the entire netCDF file as
first_bin_center, last_bin_center, and time_to_skip.
Add the time_to_skip to the first_bin_center to
derive the start of the averaging period. This allows one to ignore the effects of spinup.
The RADIOSONDE_U_WIND_COMPONENT_guess_RankHi variable name has
been cut short by the netCDF restriction that variable names can only contain 40 characters.
The _RankHist variables are only present if the input
obs_seq.final files have multiple ensemble members present.
You cannot derive this information unless the assimilation was performed with
filter_nml:num_output_obs_members equal to something like the
ensemble size.
Viewing the rank histogram information with 'ncview'
After you create obs_diag_output.nc with
obs_diag
you can view the rank histograms in the following way:
[work]$ ncview obs_diag_output.nc
Note there are forty-seven 3D variables. Pick one. In this case,
I selected AIRCRAFT_U_WIND_COMPONENT_guess_RankHist
Navigate to the time of interest (these are the arrows next to the QUIT button.)
If ncview was built with udunits
support, the actual calendar dates and times should be shown next to the frame counter.
That should generate something like the graphic on the right below.
Since the default axes are (Y == histogram_bins, X == levels)
and there are many more ensemble members (96) than vertical levels (20)
in this netCDF file, the graphic appears tall and narrow.
Click anywhere on the graphic and something like the following is displayed:
Change the "X Axis:" to "rank_bins" and a new graphic will display the rank
histogram.
If you continue to click on the "tall, skinny" graphic, the histogram for
that level will be added to the rank histogram window. Remember, levels are
along the X axis on the "tall, skinny" graphic. Voila!
DART statespace diagnostics.
Matlab®
There are a set of Matlab® functions to help explore the assimilation
performance in statespace, which is very useful for OSSE's (i.e. when you
know the true model state). The general guideline here is that anything
that computes an 'error' requires the truth. There are some functions that
work without a true state.
In order to use any of these functions, the scripts need to know how to interpret
the layout of the netCDF file  which is usually modeldependent. See the section
on Adding Matlab® support for your own model if you are not using one of the supported DART models.
The statespace
functions are in the DART/matlab directory:
plot_total_err.m            - plots the evolution of the error (unnormalized) and ensemble spread of all state variables.
plot_bins.m                 - plots the rank histograms for a set of state variables.
plot_ens_time_series.m      - plots the evolution of a set of state variables - all ensemble members, the ensemble mean (and Truth, if available).
plot_ens_mean_time_series.m - plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available).
plot_ens_err_spread.m       - plots the evolution of the ensemble error and spread for a select set of state variables.
plot_correl.m               - plots the correlation through time of a state variable and the rest of the state.
plot_var_var_correl.m       - plots the correlation between two variables at a particular location.
plot_jeff_correl.m          - plots the correlation through time of a state variable at a particular time and any state variable.
plot_sawtooth.m             - plots the trajectory of any set of state variables, highlighting the assimilation 'increment'.
plot_phase_space.m          - plots the trajectory of any two or three state variables. The classic attractor diagram.
plot_smoother_err.m         - plots the error of the ensemble smoother - which uses observations in the future as well as the present.
DART diagnostic variables all have a 'copy' dimension that relates to the
fact that it might be the ensemble mean, the ensemble spread,
ensemble member 43, the inflation values ... a whole host of possibilities.
For instance:
0[1020] swordfish:~/<3>models/wrf/work % ncdump -v copy,CopyMetaData Prior_Diag.nc
netcdf Prior_Diag {
dimensions:
metadatalength = 64 ;
copy = 3 ;
time = UNLIMITED ; // (1 currently)
NMLlinelen = 129 ;
NMLnlines = 203 ;
domain = 1 ;
west_east_d01 = 267 ;
west_east_stag_d01 = 268 ;
south_north_d01 = 215 ;
south_north_stag_d01 = 216 ;
bottom_top_d01 = 35 ;
bottom_top_stag_d01 = 36 ;
variables:
int copy(copy) ;
copy:long_name = "ensemble member or copy" ;
copy:units = "nondimensional" ;
copy:valid_range = 1, 100 ;
char CopyMetaData(copy, metadatalength) ;
CopyMetaData:long_name = "Metadata for each copy/member" ;
...
float U_d01(time, copy, bottom_top_d01, south_north_d01, west_east_stag_d01) ;
...
copy = 1, 2, 3 ;
CopyMetaData =
"ensemble mean ",
"ensemble spread ",
"ensemble member 1" ;
}
NonMatlab® statespace diagnostics
The innovations to the model state are easy to derive.
Use the NCO Operator
ncdiff to difference the two DART diagnostic
netCDF files to create the innovations.
Be sure to check the CopyMetaData variable to figure out
what copy is of interest.
Then, use ncview to explore the innovations or the
inflation values or ...
If the assimilation used statespace inflation, the inflation fields will
be added as additional 'copies'. A sure sign of trouble is if the inflation fields
grow without bound. As the observation network changes, expect the inflation values
to change.
The only other thing I look for in statespace is that the increments are 'reasonable'.
As the assimilation 'burns in', the increments are generally larger than increments from
an assimilation that has been cycling for a long time. If the increments keep getting
bigger, the ensemble is continually drifting away from the observation. Not good.
In ncview, it is useful to navigate to the copy/level of interest
and rerange the data to values appropriate to the current data and then hit the '>>'
button to animate the image. It should be possible to get a sense of the magnitude of the
innovations as a function of time.
Example
I ran a perfect model experiment with the bgrid model in the DARTdefault configuration
and turned on some adaptive inflation for this example. To fully demonstrate the
adaptive inflation, it is useful to have an observation network that changes through
time. I created two observation sequence files: one that had a single
'RADIOSONDE_TEMPERATURE' observation at the surface with an observation error
variance of 1.5 degrees Kelvin - repeated every 6 hours for 6 days (24 timesteps);
and one that had 9 observation locations clustered in about the same place that
repeated every 6 hours for 1.5 days (6 timesteps).
I merged the two observation sequences into one using
obs_sequence_tool and ran them through perfect_model_obs to
derive the observation values and create an obs_seq.out file to run
through filter.
0[1074] models/bgrid_solo/work % ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
0[1075] models/bgrid_solo/work % ncview Prior_Diag.nc &
0[1076] models/bgrid_solo/work % ncview Innov.nc &
0[1077] models/bgrid_solo/work % ncdump -v CopyMetaData Prior_Diag.nc
netcdf Prior_Diag {
dimensions:
metadatalength = 64 ;
locationrank = 3 ;
copy = 24 ;
time = UNLIMITED ; // (24 currently)
NMLlinelen = 129 ;
NMLnlines = 303 ;
StateVariable = 28200 ;
TmpI = 60 ;
TmpJ = 30 ;
lev = 5 ;
VelI = 60 ;
VelJ = 29 ;
variables:
...
float ps(time, copy, TmpJ, TmpI) ;
ps:long_name = "surface pressure" ;
ps:units = "Pa" ;
ps:units_long_name = "pascals" ;
float t(time, copy, lev, TmpJ, TmpI) ;
t:long_name = "temperature" ;
t:units = "degrees Kelvin" ;
float u(time, copy, lev, VelJ, VelI) ;
u:long_name = "zonal wind component" ;
u:units = "m/s" ;
float v(time, copy, lev, VelJ, VelI) ;
v:long_name = "meridional wind component" ;
v:units = "m/s" ;
...
data:
CopyMetaData =
"ensemble mean ",
"ensemble spread ",
"ensemble member 1 ",
"ensemble member 2 ",
"ensemble member 3 ",
"ensemble member 4 ",
"ensemble member 5 ",
"ensemble member 6 ",
"ensemble member 7 ",
"ensemble member 8 ",
"ensemble member 9 ",
"ensemble member 10 ",
"ensemble member 11 ",
"ensemble member 12 ",
"ensemble member 13 ",
"ensemble member 14 ",
"ensemble member 15 ",
"ensemble member 16 ",
"ensemble member 17 ",
"ensemble member 18 ",
"ensemble member 19 ",
"ensemble member 20 ",
"inflation mean ",
"inflation sd " ;
}


This is an exploration of the Prior_Diag.nc file.
Note that I selected the 't' field, turned the coastlines 'off' under
the 'Opts' button, used the 'Repl' instead of 'Bilin' (to more faithfully
represent the model resolution), navigated to copy 23 of 24 (in this case,
the inflation mean) and advanced to the last timestep.
The image plot is pretty boring, but does indicate that the inflation values
are restricted to where I put the observations. Right-clicking on the
'Range' button automatically re-ranges the colorbar to the min/max of the
current data. Clicking on any location generates a time series figure. 


This is an exploration of the Innov.nc file
as created by ncdiff. Note that the titles are somewhat
misleading because they reflect information from the first file given to
ncdiff. This time I left the rendering as 'Bilin'
(which obfuscates the model resolution), navigated to copy 1 of 24
(in this case, the ensemble mean) and advanced to the 6th timestep.
Right-click on the 'Range' button to reset the colorbar.
The image plot confirms that the innovations are restricted to a local region.
Clicking on any location generates a time series. 


This is fundamentally the same as the previous panel except that
I have now selected the 'u' variable. Despite the fact the observations
were only of 't', the assimilation has generated (rightly so) increments to the
'u' state variable. 
Configuring Matlab® to work with DART
Matlab® R2008b is the first version to have native netCDF support,
with its own syntax that would require a total rewrite of the DART interfaces
that would then be incompatible with older versions of Matlab®.
As of July 2009 DART exclusively uses the snctools
interface functions for netCDF - which rely solely on the
mexnc mex-file interface
and are available for 'all' versions of Matlab®.
Find your version of Matlab® (type 'ver' at the Matlab prompt) and visit
http://mexcdf.sourceforge.net/downloads
to get the right combination of mexnc and snctools.
The netcdf_toolbox
is no longer needed by DART.
There was a fundamental change in snctools with revision 4028 that essentially
breaks some key components of the DART diagnostic routines. Consequently,
we recommend that you use version 4024 of the snctools and mexnc toolboxes.
Here is how we download and install that particular version:
cd [wherever_you_install_toolboxes]
svn co -r 4024 svn://svn.code.sf.net/p/mexcdf/svn/mexnc/trunk mexnc/4024
svn co -r 4024 svn://svn.code.sf.net/p/mexcdf/svn/snctools/trunk snctools/4024
You will need the 'base' DART/matlab functions
available to Matlab®, so be sure your MATLABPATH is set such that you
have access to get_copy_index as well as
nc_varget (which comes from snctools).
This generally means you will have to manipulate your MATLABPATH with
something like (only add the first path if it exists for your model):
addpath('replace_this_with_the_real_path_to/DART/models/your_model/matlab')
addpath('replace_this_with_the_real_path_to/DART/matlab')
addpath('replace_this_with_the_real_path_to/DART/diagnostics/matlab')
addpath('replace_this_with_the_real_path_to/DART/DART_LAB/matlab')
addpath('wherever_you_install_toolboxes/snctools/4024','-BEGIN')
addpath('wherever_you_install_toolboxes/mexnc/4024','-BEGIN')
On my systems, I've bundled these commands into a function called
~/matlab/startup.m which is automatically
run every time I start Matlab®.
Examples  under construction
 observation location/value plots
 a brief explanation of 'localization'
 namelist settings for damped adaptive
spatiallyvarying group filter
Customizing DART  under construction.
Modify the code to your heart's content.
'svn revert' cures all ...
Adding your efforts to DART.
Please let us know if you have suggestions or code to contribute to DART.
We're a small group, but we are willing to listen and will make every endeavor to incorporate improvements to the code. Email us at dart@ucar.edu.
Assimilation Algorithms employed in DART.
The DART system includes several different types of Kalman filter algorithms,
selectable at runtime by a namelist setting.
See the documentation for the
assimilation tools module for descriptions of the various filter
types and how to select them.
Namelists
Many DART programs have namelists to specify runtime control.
Some programs use one or more modules  each module may have
its own namelist. As a consequence, we find it convenient to
have one file (called input.nml)
specifying all the namelists.
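Schematically, input.nml is simply a concatenation of Fortran namelist groups, one per program or module; the group names below are common DART ones and the contents are elided:
&perfect_model_obs_nml
   ...
/
&filter_nml
   ...
/
&model_nml
   ...
/
&utilities_nml
   ...
/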
For a complete list of namelists, see the following document: master list of namelists
An example namelist for each program is automatically built
when the makefile is generated by mkmf_xxxxx.
The example namelist is named input.nml.xxxxx_default
where xxxxx is the name of the program. The example
namelists have default values, which may not be appropriate for your use.
The default input.nml in each work
directory generally has better values. As usual, the documentation for
each module is the best place to get information about the namelist settings.