## DART documentation:

The DART distribution includes a full set of documentation. Once you download DART, you may view the documentation offline by opening the index.html file in the top-level DART directory. If you want to view the main documentation page without downloading DART, you may view the documentation for the Lanai release at https://proxy.subversion.ucar.edu/DAReS/DART/releases/Lanai/index.html

## An incomplete list of DART-supported models:

There are two broad classes of models supported by DART. Some are 'low-order' models, generally single-threaded, subroutine-callable, and idealized: there are no real observations of these systems. The other class of models are 'high-order' models. There are real observations of these systems. Or at least, we like to think so ...

## The 'low-order' models supported in DART.

### ikeda

The Ikeda model is a 2D chaotic map useful for visualization data assimilation updating directly in state space. There are three parameters: a, b, and mu. The state is 2D, x = [X Y]. The equations are:

X(i+1) = 1 + mu * ( X(i) * cos( t ) - Y(i) * sin( t ) )
Y(i+1) =     mu * ( X(i) * sin( t ) + Y(i) * cos( t ) ),


where

t = a - b / ( X(i)**2 + Y(i)**2 + 1 )


Note the system is time-discrete already, meaning there is no delta_t. The system stems from nonlinear optics (Ikeda 1979, Optics Communications). Interface written by Greg Lawson. Thanks Greg!

### lorenz_63

This is the 3-variable model as described in: Lorenz, E. N. 1963. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130-141.
The system of equations is:

X' = -sigma*X + sigma*Y
Y' = -XZ + rX - Y
Z' =  XY -bZ


### lorenz_84

This model is based on:   Lorenz E. N., 1984: Irregularity: A fundamental property of the atmosphere. Tellus36A, 98-110.
The system of equations is:

X' = -Y^2 - Z^2  - aX  + aF
Y' =  XY  - bXZ  - Y   + G
Z' = bXY  +  XZ  - Z


Where a, b, F, and G are the model parameters.

### 9var

This model provides interesting off-attractor transients that behave something like gravity waves.

### lorenz_96

This is the model we use to become familiar with new architectures, i.e., it is the one we use 'first'. It can be called as a subroutine or as a separate executable. We can test this model both single-threaded and mpi-enabled.

Quoting from the Lorenz 1998 paper:

... the authors introduce a model consisting of 40 ordinary differential equations, with the dependent variables representing values of some atmospheric quantity at 40 sites spaced equally about a latitude circle. The equations contain quadratic, linear, and constant terms representing advection, dissipation, and external forcing. Numerical integration indicates that small errors (differences between solutions) tend to double in about 2 days. Localized errors tend to spread eastward as they grow, encircling the globe after about 14 days.
...
We have chosen a model with J variables, denoted by X1, ..., XJ; in most of our experiments we have let J = 40. The governing equations are:
dXj/dt = (Xj+1 - Xj-2)Xj-1 - Xj + F         (1)

for j = 1, ..., J. To make Eq. (1) meaningful for all values of j we define X-1 = XJ-1, X0 = XJ, and XJ+1 = X1, so that the variables form a cyclic chain, and may be looked at as values of some unspecified scalar meteorological quantity, perhaps vorticity or temperature, at J equally spaced sites extending around a latitude circle. Nothing will simulate the atmosphere's latitudinal or vertical extent.

### forced_lorenz_96

The forced_lorenz_96 model implements the standard L96 equations except that the forcing term, F, is added to the state vector and is assigned an independent value at each gridpoint. The result is a model that is twice as big as the standard L96 model. The forcing can be allowed to vary in time or can be held fixed so that the model looks like the standard L96 but with a state vector that includes the constant forcing term. An option is also included to add random noise to the forcing terms as part of the time tendency computation which can help in assimilation performance. If the random noise option is turned off (see namelist) the time tendency of the forcing terms is 0.

### lorenz_96_2scale

This is the Lorenz 96 2-scale model, documented in Lorenz (1995). It also has the option of the variant on the model from Smith (2001), which is invoked by setting local_y = .true. in the namelist. The time step, coupling, forcing, number of X variables, and the number of Ys per X are all specified in the namelist. Defaults are chosen depending on whether the Lorenz or Smith option is specified in the namelist. Lorenz is the default model. Interface written by Josh Hacker. Thanks Josh!

### lorenz_04

The reference for these models is Lorenz, E.N., 2005: Designing chaotic models. J. Atmos. Sci.62, 1574-1587.
Model II is a single-scale model, similar to Lorenz 96, but with spatial continuity in the waves. Model III is a two-scale model. It is fudamentally different from the Lorenz 96 two-scale model because of the spatial continuity and the fact that both scales are projected onto a single variable of integration. The scale separation is achived by a spatial filter and is therefore not perfect (i.e. there is leakage). The slow scale in model III is model II, and thus model II is a deficient form of model III. The basic equations are documented in Lorenz (2005) and also in the model_mod.f90 code. The user is free to choose model II or III with a Namelist variable.

This model is on a periodic one-dimensional domain. A wind field is modeled using Burger's Equation with an upstream semi-lagrangian differencing. This diffusive numerical scheme is stable and forcing is provided by adding in random gaussian noise to each wind grid variable independently at each timestep. An Eulerian option with centered-in-space differencing is also provided. The Eulerian differencing is both numerically unstable and subject to shock formation. However, it can sometimes be made stable in assimilation mode (see recent work by Majda and collaborators).

[top]

## The 'high-order' models supported in DART.

In roughly the order they were supported by DART.

### bgrid_solo

This is a dynamical core for B-grid dynamics using the Held-Suarez forcing. The resolution is configurable, and the entire model can be run as a subroutine. Status: supported.

### pe2lyr

This model is a 2-layer, isentropic, primitive equation model on a sphere. Status: orphaned.

### wrf

The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. More people are using DART with WRF than any other model. Note: The actual WRF code is not distributed with DART. Status: supported.

### cam  DART/CAM datasets, guidelines

There are many DART-supported versions of CAM. The frozen version of the Community Climate System Model (CCSM4.0) uses the Community Atmosphere Model (CAM4). The Community Earth System Model (CESM 1.0) uses the Community Atmosphere Model (CAM5); the latest in a series of global atmosphere models developed at NCAR for the weather and climate research communities. Status: both are supported (as are some earlier releases).

### PBL_1d

The PBL model is a single column version of the WRF model. In this instance, the necessary portions of the WRF code are distributed with DART. Status: supported - but looking to be adopted.

### MITgcm_annulus

The MITgcm annulus model as configured for this application within DART is a non-hydrostatic, rigid lid, C-grid, primitive equation model utilizing a cylindrical coordinate system. For detailed information about the MITgcm, see http://mitgcm.org Status: orphaned - and looking to be adopted.

### rose

The rose model is for the stratosphere-mesosphere and was used by Tomoko Matsuo (now at CU-Boulder and NOAA) for research in the assimilation of observations of the Mesosphere Lower-Thermosphere (MLT). Note: the model code is not distributed with DART. Status: orphaned

### MITgcm_ocean

The MIT ocean GCM version 'checkpoint59a' is the foundation of this implementation. It was modified by Ibrahim Hoteit (then of Scripps) to accomodate the interfaces needed by DART. Status: supported - but looking to be adopted.

### am2

The FMS AM2 model is GFDL's atmosphere-only code using observed sea surface temperatures, time-varying radiative forcings (including volcanos) and time-varying land cover type. This version of AM2 (also called AM2.1) uses the finite-volume dynamical core (Lin 2004). Robert Pincus (CIRES/NOAA ESRL PSD1) and Patrick Hoffman (NOAA) wrote the DART interface and are currently using the system for research. Note: the model code is not distributed with DART. Status: supported

### coamps

The DART interface was originally written and supported by Tim Whitcomb. The following model description is taken from the COAMPS overview web page:

The Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS) has been developed by the Marine Meteorology Division (MMD) of the Naval Research Laboratory (NRL). The atmospheric components of COAMPS, described below, are used operationally by the U.S. Navy for short-term numerical weather prediction for various regions around the world.

Note: the model code is not distributed with DART. Status: supported

### POP

The Parallel Ocean Program (POP) comes in two variants. Los Alamos National Laboratory provides POP Version 2.0 which has been modified to run in the NCAR Community Climate System Model (CCSM) framework. As of November 2009, the CCSM-POP version is being run. The LANL-POP version is nearly supported - and some extensions useful for data assimilation in general have been proposed to LANL, who have agreed in principle to implement the changes. Fundamentally, the change is an additional restart option in which the first timestep after an assimilation is a Eulerian timestep (similar to a cold start). Note: the souce code for POP is not distributed with DART. Status: actively being developed

[top]

The code distribution was getting 'cluttered' with datasets, boundary conditions, intial conditions, ... large files that were not necessarily interesting to all people who downloaded the DART code. Worse, subversion makes a local hidden copy of the original repository contents, so the penalty for being large is doubled. It just made sense to make all the large files available on as 'as-needed' basis.

To keep the size of the DART distribution down we have a separate www-site to provide some observation sequences, initial conditions, and general datasets. It is our intent to populate this site with some 'verification' results, i.e. assimilations that were known to be 'good' and that should be fairly reproducible - appropriate to test the DART installation.

Please be patient as I make time to populate this directory. (yes, 'make', all my 'found' time is taken ...)
Observation sequences can be found at http://www.image.ucar.edu/pub/DART/Obs_sets.

Verification experiments will be posted to http://www.image.ucar.edu/pub/DART/VerificationData as soon as I can get to it. These experiments will consist of initial conditions files for testing different high-order models like CAM, WRF, POP ...
The low-order models are already distributed with verification data in their work directories.

Useful bits for CAM can be found at http://www.image.ucar.edu/pub/DART/CAM.

Useful bits for WRF can be found at http://www.image.ucar.edu/pub/DART/WRF.

Useful bits for MPAS_ocn can be found at http://www.image.ucar.edu/pub/DART/MPAS_OCN.

[top]

## Creating initial conditions for DART

The idea is to generate an ensemble that has sufficient 'spread' to cover the range of possible solutions. Insufficient spread can (and usually will) lead to poor assimilations. Think 'filter divergence'.

Generating an ensemble of initial conditions can be done in lots of ways, only a couple of which will be discussed here. The first is to generate a single initial condition and let DART perturb it with noise of a nature you specify to generate as many ensemble members as you like. The second is to take some existing collection of model states and convert them to DART initial conditions files and then use the restart_file_tool to set the proper date in the files. The hard part is then coming up with the original collection of model state(s).

### Adding noise to a single model state

This method works well for some models, and fails miserably for others. As it stands, DART supplies a routine that can add gaussian noise to every element of a state vector. This can cause some models to be numerically unstable. You can supply your own model_mod:pert_model_state() if you want a more sophisticated perturbation scheme.

### Using a collection of model states.

The important thing to remember is that the high-order models all come with routines to convert a single model restart file (or the equivalent) to a DART initial conditions file. CAM has cam_to_dart, WRF has wrf_to_dart, POP has pop_to_dart, etc. DART has the ability to read a single file that contains initial conditions for all the ensemble members, or a series of restart files - one for each ensemble member. Simply collect your ensemble of restart files from your model and convert each of them to a DART initial conditions file of the form filter_ics.#### where #### represents a 4 digit ensemble member counter. That is, for a 50-member ensemble, they should be named: filter_ics.0001  ... filter_ics.0050

Frequently, the initial ensemble of restart files is some climatological collection. For CAM experiments, we usually start with N different 'January 1' states ... from N different years. The DART utility program restart_file_tool is then run on each of these initial conditions files to set a consistent date for all of the initial conditions. Experience has shown that it takes less than a week of assimilating 4x/day to achieve a steady ensemble spread. WRF has its own method of generating an initial ensemble. For that, it is best to go to contact someone familiar with WRF/DART.

### Initial conditions for the low-order models.

In general, there are 'restart files' for the low-order models that already exist as work/filter_ics. If you need more ensemble members than are supplied by these files, you can generate your own by adding noise to a single perfect_ics file. Simply specify

&filter_nml
start_from_restart   = .FALSE.,
restart_in_file_name = "perfect_ics",
ens_size             = [whatever you want]


[top]

# 'Perfect Model' observation experiments (also known as) Observation System Simulation Experiment (OSSE)

Once a model is compatible with the DART facility all of the functionality of DART is available. This includes 'perfect model' experiments (also called Observing System Simulation Experiments - OSSEs). Essentially, the model is run forward from a known state and, at predefined times, an observation forward operator is applied to the model state to harvest synthetic observations. This model trajectory is known as the 'true state'. The synthetic observations are then used in an assimilation experiment. The assimilation performance can then be evaluated precisely because the true state (of the model) is known. Since the same forward operator is used to harvest the synthetic observations as well as during the assimilation, the 'representativeness' error of the assimilation system is not an issue.

There are a set of Matlab® functions to help explore the assimilation performance in state-space as well as in observation-space. An OSSE is explored in depth in our Lorenz '96 example.

## Perfect Model Experiment Overview

There are four fundamental steps to running an OSSE from within DART:

1. Create a blueprint of what, where, and when you want observations. Essentially, define the metadata of the observations without actually specifying the observation values. The default filename for the blueprint is obs_seq.in . For simple cases, this is just running create_obs_sequence and create_fixed_network_sequence, more in-depth solutions are presented below.
2. Harvest the synthetic observations from the true model state by running perfect_model_obs to advance the model from a known initial condition and apply the forward observation operator based on the observation 'blueprint'. The observation will have noise added to it based on a draw from a random normal distribution with the variance specified in the observation blueprint. The noise-free 'truth' and the noisy 'observation' are recorded in the output observation sequence file. The entire time-history of the true state of the model is recorded in True_State.nc . The default filename for the 'observations' is obs_seq.out .
3. Assimilate the synthetic observations with filter in the usual way. The prior/forecast states are preserved in Prior_Diag.nc and the posterior/analysis states are preserved in Posterior_Diag.nc . The default filename for the file with the observations and (optionally) the ensemble estimates of the observations is obs_seq.final .
4. Check to make sure the assimilation was effective! Ensemble DA is not a black box! YOU must check to make sure you are making effective use of the information in the observations!

## 1. Defining the observation metadata - the 'blueprint'.

There are lots of ways to define an observation sequence that DART can use as input for a perfect model experiment. If you have observations in DART format already, you can simply use them. If you have observations in one of the formats already supported by the DART converters (check DART/observations/observations.html), convert it to a DART observation sequence. You may need to use the obs_sequence_tool to combine multiple observation sequence files into observation sequence files for the perfect model experiment. Any existing observation values and quality control information will be ignored by perfect_model_obs - only the time and location information are used. In fact, any and all existing observation and QC values will be removed.

GENERAL COMMENT ABOUT THE INTERPLAY BETWEEN THE MODEL STOP/START FREQUENCY AND THE IMPACT ON THE OBSERVATION FREQUENCY: There is usually a very real difference between the dynamical timestep of the model and when it is safe to stop and restart the model. The assimilation window is (usually) required to be a multiple of the safe stop/start frequency. For example, an atmospheric model may have a dynamical timestep of a few seconds, but may be constrained such that it is only possible to stop/restart every hour. In this case, the assimilation window is a multiple of 3600 seconds. Trying to get observations at a finer timescale is not possible, we only have access to the model state when the model stops.

If you do not have an input observation sequence, it is simple to create one.

1. Run create_obs_sequence to generate the blueprint for the types of observations and observation error variances for whatever locations are desired.
2. Run create_fixed_network_seq to define the temporal distribution of the desired observations.

Both create_obs_sequence and create_fixed_network_seq interactively prompt you for the information they require. This can be quite tedious if you want a spatially dense set of observations. People have been known to actually write programs to generate the input to create_obs_sequence and simply pipe or redirect the information into the program. There are several examples of these in the models/bgrid_solo directory: column_rand.f90, id_set_def_stdin.f90, ps_id_stdin.f90, and ps_rand_local.f90 . Be advised that some observation types have different input requirements, so a 'one size fits all' program is a waste of time.

NOTE: only the observation kinds in the input.nml &obs_kind_nml:assimilate_these_obs_types,evaluate_these_obs will be available to the create_obs_sequence program.

DEVELOPERS TIP: You can specify 'identity' observations as input to perfect_model_obs. Identity observations are the model values AT the exact gridcell location, there is no interpolation at all. Just a straight table-lookup. This can be useful as you develop your model interfaces; you can test many of the routines and scripts without having a working model_interpolate().

More information about creating observation sequence files for OSSE's is available in the observation sequence discussion section.

## 2. Generating the true state and harvesting the observation values - perfect_model_obs

perfect_model_obs reads the blueprint and an initial state and applies the appropriate forward observation operator for each and every observation in the current 'assimilation window'. If necessary, the model is advanced until the next set of observations is desired. When it has run out of observations or reached the stop time defined by the namelist control, the program stops and writes out a restart file, a diagnostic file, the observation sequence file, and a log file. This is fundamentally a single deterministic forecast for 'as long as it takes' to harvest all the observations.

default filename format contents
perfect_restart   ASCII or binary  The DART model state at the end of the forecast. If the forecast needs to be lengthened, use this as the input. The format of the file is controlled by input.nml &assim_model_nml:write_binary_restart_files The first record is the valid time of the model. The rest is the model state at that time.
True_State.nc netCDF The DART model state at every assimilation timestep. This file has but one 'copy' - the truth. Dump the copy metadata and the time:
obs_seq.out ASCII or binary
This file has the observations - the result of the forward observation operator. This observation sequence file has two 'copies' of the observation: the noisy 'copy' and the noise-free 'copy'. The noisy copy is designated as the 'observation', the noise-free copy is the truth. The observation-space diagnostic program obs_diag has special options for using the true copy instead of the observation copy. See the obs_diag.html for details.
dart_log.out ASCII The run-time output of perfect_model_obs .

Each model may define the assimilation window differently, but conceptually, all the observations plus or minus half the assimilation window are considered to be simultaneous and a single model state provides the basis for all those observations. For example: if the blueprint requires temperature observations every 30 seconds, the initial model time is noon (12:00) and the assimilation window is 1 hour; all the observations from 11:30 to 12:30 will use the same state as input for the forward observation operator. The fact that you have a blueprint for observations every 30 seconds means a lot of those observations may have the same value (if they are in the same location).

perfect_model_obs uses the input.nml for its control. A subset of the namelists and variables of particular interest for perfect_model_obs are summarized here. Each namelist is fully described by the corresponding module document.

&perfect_model_obs_nml  <--- link to the full namelist description!
...
start_from_restart    = .true.            usually, but not always
output_restart        = .true.            sure, why not
init_time_days        = -1                negative means use the time in ...
init_time_seconds     = -1                the 'restart_in_file_name' file
first_obs_days        = -1                negative means start at the first time in ...
first_obs_seconds     = -1                the 'obs_seq_in_file_name' file.
last_obs_days         = -1                negative means to stop with the last ...
last_obs_seconds      = -1                observation in the file.
restart_in_file_name  = "perfect_ics"
restart_out_file_name = "perfect_restart"
obs_seq_in_file_name  = "obs_seq.in"
obs_seq_out_file_name = "obs_seq.out"
output_interval       = 1
async                 = 0                 totally depends on the model
/

&obs_sequence_nml
write_binary_obs_sequence = .false.       .false. will create ASCII - easy to check.
/

&obs_kind_nml
...
...                                       list all the synthetic observation
...                                       types you want
/

&assim_model_nml
...
/

&model_nml
...
time_step_days = 0,                       some models call this 'assimilation_period_days'
time_step_seconds = 3600                  some models call this 'assimilation_period_seconds'
use whatever value you want
/

&utilities_nml
...
/


## Executing perfect_model_obs

Since perfect_model_obs generally requires advancing the model, and the model may use MPI or require special ancillary files or forcing files or ..., it is not possible to provide a single example that will cover all possibilities. The subroutine-callable models (i.e. the low-order models) can run perfect_model_obs very simply:

./perfect_model_obs

## 3. Performing the assimilation experiment - filter

This step is done with the program filter, which also uses input.nml for input and run-time control. A successful assimilation will depend on many things: an approprite initial ensemble, monitoring and perhaps correcting the ensemble spread, localization, etc. It is simply not possible to design a one-size-fits-all system that will work for all cases. It is critically important to analyze the results of the assimilation and explore ways of making the assimilation more effective. The DART tutorial and the DART_LAB exercises are an invaluable resource to learn and understand how to determine the effectiveness of, and improve upon, an assimilation experiment. The concepts learned with the low-order models are directly applicable to the most complicated models.

It is important to remember that if filter 'terminates normally', it does not necessarily mean the assimilation was effective!

filter produces two state-space output diagnostic files (Prior_Diag.nc and Posterior_Diag.nc) which contains values of the ensemble mean, ensemble spread, perhaps the inflation values, and (optionally) ensemble members for the duration of the experiment. filter also creates an observation sequence file that contains the input observation information as well as the prior and posterior ensemble mean estimates of that observation, the prior and posterior ensemble spread for that observation, and (optionally), the actual prior and posterior ensemble estimates of that observation. Rather than replicate the observation metadata for each of these, the single metadata is shared for all these 'copies' of the observation. See An overview of the observation sequence for more detail. filter also produces a run-time log file that can greatly aid in determining what went wrong if the program terminates abnormally.

A very short description of some of the most important namelist variables is presented here. Basically, I am only discussing the settings necessary to get filter to run. I can guarantee these settings WILL NOT generate the BEST assimilation. Again, see the module documentation for a full description of each namelist.

&filter_nml  <--- link to the full namelist description!
async                    = 0
ens_size                 = 40                 something ≥ 20, please
start_from_restart       = .false.            .false. requires reading available input files
output_restart           = .true.
obs_sequence_in_name     = "obs_seq.out"      whatever you called the output from perfect_model_obs
obs_sequence_out_name    = "obs_seq.final"
restart_in_file_name     = "filter_ics"       the file (or base file name) of your ensemble
restart_out_file_name    = "filter_restart"
init_time_days           = -1                 the time in the restart file is correct
init_time_seconds        = -1
first_obs_days           = -1                 same interpretation as with perfect_model_obs
first_obs_seconds        = -1
last_obs_days            = -1                 same interpretation as with perfect_model_obs
last_obs_seconds         = -1
num_output_state_members = 10                 # of FULL DART model states to put in state-space output files
num_output_obs_members   = 40                 # of ensemble member 'copies' of observation to save
output_interval          = 1
num_groups               = 1
input_qc_threshold       =  4.0
outlier_threshold        =  3.0               Observation rejection criterion!
output_forward_op_errors = .false.
output_timestamps        = .false.
output_inflation         = .true.

inf_flavor               = 0,                       0                  0 is 'do not inflate'
inf_start_from_restart   = .false.,                 .false.
inf_output_restart       = .false.,                 .false.
inf_deterministic        = .true.,                  .true.
inf_in_file_name         = 'not_initialized',       'not_initialized'
inf_out_file_name        = 'not_initialized',       'not_initialized'
inf_diag_file_name       = 'not_initialized',       'not_initialized'
inf_initial              = 1.0,                     1.0
inf_sd_initial           = 0.6,                     0.0
inf_damping              = 0.9,                     0.0
inf_lower_bound          = 1.0,                     1.0
inf_upper_bound          = 1000000.0,               1000000.0
inf_sd_lower_bound       = 0.6,                     0.0
/

&ensemble_manager_nml
single_restart_file_in  = .false.       .false. means each enemble member is in a separate file
single_restart_file_out = .false.
perturbation_amplitude  = 0.2           not used if 'single_restart_file_in' is .false.
/

&assim_tools_nml
filter_kind             = 1             1 is EAKF, 2 is EnKF ...
cutoff                  = 0.2           this is your localization - units depend on type of 'location_mod'
/

&obs_kind_nml
assimilate_these_obs_types = 'RAW_STATE_VARIABLE'    Again, use a list ...
/

&model_nml
assimilation_perior_days    = 0                      the assimilation interval is up to you
assimilation_perior_seconds = 3600
/


num_output_state_members are '.true.' so the state vector is output at every time for which there are observations (once a day here). Posterior_Diag.nc and Prior_Diag.nc then contain values for 20 ensemble members once a day. Once the namelist is set, execute filter to integrate the ensemble forward for 24,000 steps with the final ensemble state written to the filter_restart. Copy the perfect_model_obs restart file perfect_restart (the true state') to perfect_ics, and the filter restart file filter_restart to filter_ics so that future assimilation experiments can be initialized from these spun-up states.

mpirun ./filter        -OR-

mpirun.lsf ./filter    -OR-

./filter               -OR-

however YOU run filter on your system!


## 4. ASSESS THE PERFORMANCE!

All the concepts of spread, rmse, rank histograms that were taught in the DART tutorial and in DART_LAB should be applied now. Try the techniques described in the Did my experiment work? section. The 'big three' state-space diagnostics are repeated here because they are so important. The first two require the True_State.nc .

 plot_bins.m plots the rank histograms for a set of state variables. This requires you to have all or most of the ensemble members available in the Prior_Diag.nc or Posterior_Diag.nc files. plot_total_err.m plots the evolution of the error (un-normalized) and ensemble spread of all state variables. plot_ens_mean_time_series.m plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available). plot_ens_time_series.m is actually a better choice if you can afford to write all/most of the ensemble members to the Prior_Diag.nc and Posterior_Diag.nc files.

## DON'T FORGET ABOUT THE OBSERVATION-SPACE DIAGNOSTICS!

[top]

### An overview of the DART 'preprocess' program

First and foremost, check out the DART/preprocess/preprocess.html document for detailed information.

The preprocess program actually builds source code to be used by all the remaining modules. It is imperative to actually run preprocess before building any executables. This is how the same code can assimilate state vector 'observations' for the Lorenz_63 model and real radar reflectivities for WRF without needing to specify a set of radar operators for the Lorenz_63 model!

preprocess combines multiple 'obs_def' modules into one obs_def_mod.f90 that is then used by the rest of DART. Additionally, a new obs_kind_mod.f90 is built that will provide support for associating the specific observation TYPES with corresponding (generic) observation KINDS. More on that later. The list of source codes is contained in the &preprocess_nml namelist and they ultimately determine what observations and operators are supported. If you want to add another 'obs_def' module, you must rerun preprocess and recompile the rest of your project. preprocess is designed to abort if the files it is supposed to build already exist. For this reason, it is necessary to remove a couple files (if they exist) before you run the preprocessor. It is just a good habit to develop.

\rm -f ../../../obs_def/obs_def_mod.f90
\rm -f ../../../obs_kind/obs_kind_mod.f90
./preprocess
ls -l ../../../obs_def/obs_def_mod.f90
ls -l ../../../obs_kind/obs_kind_mod.f90

For example, with a namelist that looks like:

&preprocess_nml
input_obs_kind_mod_file = '../../../obs_kind/DEFAULT_obs_kind_mod.F90',
output_obs_kind_mod_file = '../../../obs_kind/obs_kind_mod.f90',
input_obs_def_mod_file = '../../../obs_def/DEFAULT_obs_def_mod.F90',
output_obs_def_mod_file = '../../../obs_def/obs_def_mod.f90',
input_files              = '../../../obs_def/obs_def_gps_mod.f90',
'../../../obs_def/obs_def_QuikSCAT_mod.f90',
'../../../obs_def/obs_def_GWD_mod.f90',
'../../../obs_def/obs_def_altimeter_mod.f90',
'../../../obs_def/obs_def_reanalysis_bufr_mod.f90'
/


preprocess will combine DEFAULT_obs_def_mod.F90, obs_def_gps_mod.f90, obs_def_QuikSCAT_mod.f90, obs_def_GWD_mod.f90, obs_def_altimeter_mod.f90, and obs_def_reanalysis_bufr_mod.f90, into obs_def_mod.f90 - which can be used by the rest of the project.

### Building and Running 'preprocess'

preprocess is an executable, so it should come as no surprise that it must be built in the normal DART fashion. The DART/mkmf/mkmf.template must be correct for your environment, and the input.nml must have your desired preprocess_nml set correctly. Given that ...

csh mkmf_preprocess
make
./preprocess

will build and run preprocess.

The first command generates an appropriate Makefile and the input.nml.preprocess_default file. The second command results in the compilation of a series of Fortran90 modules which ultimately produces an executable file: preprocess. The third command actually runs preprocess - which builds the new obs_kind_mod.f90 and obs_def_mod.f90 source code files. The rest of DART may now be built.

### A little background for 'preprocess'

IMPORTANT: Since each 'observation kind' may require different amounts of metadata to be read or written; any routine to read or write an observation sequence must be compiled with support for those particular observations. The supported observations are listed in the input.nml&obs_kind_nml block. This is the whole point of the 'preprocess' process ...

[top]

### An overview of the observation sequence

Observation sequences are complicated, there's just no better way to describe it. Trying to automatically accomodate a myriad of observation file formats, structure, and metadata is simply not an easy task. For this reason, DART has its own format for observations and a set of programs to convert observations from their original formats to DART's format. There are definitely some things to know ...

An obs_seq.in file actually contains no observation quantities. It may be best thought of as a perfectly-laid-out notebook - just waiting for an observer to fill in the actual observation quantities. All the rows and columns are ready, labelled, and repeated for every observation time and platform. This file is generally the start of a "perfect model" experiment. Essentially, one instance of the model is run through perfect_model_obs - which applies the appropriate forward operators to the model state and 'writes them down' in our notebook. The completed notebook is renamed obs_seq.out.

An obs_seq.out file contains a linked list of observations - potentially (and usually) observations from different platforms and of different quantities - each with their own error characteristics and metadata. These files arise from running perfect_model_obs OR from any number of converter programs. The creation of observation sequences from real observations is not automatic and an email to the DART team asking for advice for your specific types of observations is perfectly within reason.

There is something called an obs_seq.final file - which contains everything in the obs_seq.out file as well as a few additional 'copies' of the observation. Remember, DART is an ensemble algorithm. Each ensemble member must compute its own estimate of the observation for the algorithm. The obs_seq.final file may contain each of these estimates (namelist controlled). Minimally, the mean and spread of the ensemble estimates is recorded in the obs_seq.final file. The best method of determining the performance of your 'real world' experiment is to compare in observation-space since we can never know the model state that perfectly represents the real world.

IMPORTANT: Since each 'observation kind' may require different amounts of metadata to be read or written; any routine to read or write an observation sequence must be compiled with support for those particular observations. The supported observations are listed in the input.nml&obs_kind_nml block. This is the whole point of the 'preprocess' process ...

#### observation sequence file structure

explanationobs_seq.outobs_seq.final
There are extensible parts of the observation sequence file; for example, the number of observation kinds contained in the file, whether the locations have 1 or more components, how many quality control values are available for each observation, where those quality control values come from, how many 'copies' of each observation there are ... et cetera. The images to the right are links to full-size images. They are from entirely separate experiments. They are just meant to show the flexibility of the file format.

[top]

## Creating observations and sequences.

It is strongly encouraged that you use a single observation to test a new model implementation.

Experience has shown that starting 'simple' is the fastest way to good results. Starting with a single observation will exercise a sufficient portion of the procedure and provide insight into where to spend more effort. Starting with a single synthetic observation will allow you to focus on the more interesting parts of the DART scheme without getting bogged down in the world of observation data formats.

### Creating a synthetic observation sequence.

There are several steps to create an observation sequence file, which follows directly from the modular nature of the DART programming philosophy.

1. Decide what observations you want to investigate and edit the input.nml&obs_kind_nml block.
2. Build and run preprocess to create code that supports the observations you want.
3. Build and run create_obs_sequence to define the specifics about the observation you want.
4. Build and run create_fixed_network_sequence to replicate those specifics through time.
5. Build and run perfect_model_obs to create an observation consistent with the model state and specified error distribution at the requested times and locations.

#### Example: generating observations for the Lorenz '63 model.

1) There are no 'real' observations for the Lorenz '63 model, so the appropriate namelist settings are:

&obs_kind_nml
assimilate_these_obs_types = 'RAW_STATE_VARIABLE' /

&preprocess_nml
input_obs_def_mod_file = '../../../obs_def/DEFAULT_obs_def_mod.F90',
output_obs_def_mod_file = '../../../obs_def/obs_def_mod.f90',
input_obs_kind_mod_file = '../../../obs_kind/DEFAULT_obs_kind_mod.F90',
output_obs_kind_mod_file = '../../../obs_kind/obs_kind_mod.f90',
input_files = '../../../obs_def/obs_def_1d_state_mod.f90' /

2) Run preprocess in the normal fashion.

3) create_obs_sequence creates an observation set definition (typically named set_def.out), the time-independent part of an observation sequence. It may help to think of it as trying to define what sorts of observations will be taken at one 'reading' ... you walk out to the box and take temperature, humidity, and wind observations all at the same time and place, for example. You can think of it as one page in an observer's notebook, and only contains the location, type, and observational error characteristics (normally just the diagonal observational error variance) for a related set of observations. There are no actual observation values, nor are there any times associated with the definition. The program is interactive and queries the user for the information it needs. Begin by creating a minimal observation set definition in which each of the 3 state variables of L63 is directly observed with an observational error variance of 1.0 for each observation. To do this, use the following input sequence (the text including and after # is a comment and does not need to be entered):

The following is a screenshot (much of the verbose logging has been left off for clarity), the user input looks like this.

   [unixprompt]$./create_obs_sequence Starting program create_obs_sequence Initializing the utilities module. Trying to log to unit 10 Trying to open file dart_log.out Registering module :$url: http://squish/DART/trunk/utilities/utilities_mod.f90 ${ ... } Input upper bound on number of observations in sequence 4 Input number of copies of data (0 for just a definition) 0 Input number of quality control values per field (0 or greater) 0 input a -1 if there are no more obs 0 Registering module :$url: http://squish/DART/trunk/obs_def/DEFAULT_obs_def_mod.F90 ${ ... } initialize_module obs_kind_nml values are -------------- ASSIMILATE_THESE_OBS_TYPES -------------- RAW_STATE_VARIABLE -------------- EVALUATE_THESE_OBS_TYPES -------------- ------------------------------------------------------ Input -1 * state variable index for identity observations OR input the name of the observation kind from table below: OR input the integer index, BUT see documentation... 1 RAW_STATE_VARIABLE -1 input time in days and seconds 0 0 Input error variance for this observation definition 1.0 input a -1 if there are no more obs 0 { this gets repeated ... until you tell it to stop ... } input a -1 if there are no more obs -1 Input filename for sequence ( set_def.out usually works well) set_def.out write_obs_seq opening formatted file set_def.out write_obs_seq closed file set_def.out  Rest assured that if you requested to assimilate more realistic observation types, you will be queried for appropriate information by create_obs_sequence. Below is a table that explains all of the input you should need to supply for observations of the L63 model state.  4 # upper bound on num of observations in sequence 0 # number of copies of data (0 for just a definition) 0 # number of quality control values per field (0 or greater) 0 # -1 to exit/end observation definitions -1 # observe state variable 1 0 0 # time -- days, seconds 1.0 # observational variance 0 # -1 to exit/end observation definitions -2 # observe state variable 2 0 0 # time -- days, seconds 1.0 # observational variance 0 # -1 to exit/end observation definitions -3 # observe state variable 3 0 0 # time -- days, seconds 1.0 # observational variance -1 # -1 to exit/end observation definitions set_def.out # Output file name 4) create_fixed_network_sequence takes the observation set definition and repeats it in time, essentially making multiple pages in our notebook. Again, the program is interactive and queries the user for information. You should be able to simply follow the prompts. The table below represents the input needed for the L63 example:  set_def.out # Input observation set definition file 1 # Regular spaced observation interval in time 1000 # 1000 observation times 0, 43200 # First observation after 12 hours (0 days, 12 * 3600 seconds) 0, 43200 # Observations every 12 hours obs_seq.in # Output file for observation sequence definition 5) perfect_model_obs advances the model from the state defined by the initial conditions file specified in the input.nml and 'applies the forward operator' to harvest observations to fill in the observation sequence specified in obs_seq.in. The observation sequence finally has values for the observations and is saved in a file generally named obs_seq.out. perfect_model_obs is namelist-driven, as opposed to the previous two (whose input is a lot harder to specify in a namelist). Take a look at (and modify if you like) the input.nml&perfect_model_obs_nml section of the namelist. The End. Not only should you have an observation sequence file (usually obs_seq.out) , you also have a file containing the exact evolution of the model consistent with those observations - the True_State.nc. [top] ### Real Observations - Converting to a DART-compatible format. Real observations come in a mind-boggling diversity of formats. We have converters for some formats in the DART/observations directory. Many of the formats require their own libraries (like HDF), and require intimate knowledge of the data format to extract the portions required for the DART observation sequence file. Please feel free to browse the converters and their companion documentation. Feel free to donate converters for formats we don't already support! We like that kind of stuff. The DART framework enforces a clean separation between observations and the models used for assimilation. The same observations can be used in any model which understands how to generate a value for the requested type of observation from the models' state-space values (i.e. the forward observation operator must exist - DART provides many for the most common state variables). In many cases, the original datasets are in a standard scientific format like netCDF, HDF, or BUFR, and library routines for those formats can be used to read in the original observation data. The DART software distribution includes Fortran subroutines and functions to help create a sequence of observations in memory, and then a call to the DART observation sequence write routine will create an entire obs_seq file in the correct format. In many cases, a single, self-contained program can convert directly from the observation location, time, value, and error into the DART format. In other cases, especially those linking with a complicated external library (e.g. BUFR), there is a two-step process with two programs and an ASCII intermediate file. We are currently leaning towards single-step conversions but either approach can be used for new programs. The DART system comes with several types of location modules for computing distances appropriately. The two most commonly used are for data in a 1D system and for data in a 3D spherical coordinate system. All the programs in the DART/observations directory assume the location/threed_sphere/location_mod.f90 3D sphere location module is being used. With the myriad of observation file formats, HDF, Grib, BUFR, netCDF, ... we simply have not had the time nor need to support all of them. The converters are a work in progress. There are currently about 10 other observation sources and types which we are in the process of collecting information and conversion programs for and which will eventually be added to this directory. In the meantime, if you have converters for data or interest in something that is not in the repository, please email the DART group. Your best bet is to contact our group at dart@ucar.edu with a specific request and we can steer you to the most similar process. [top] ### Manipulating observation sequences. First and foremost, check out the DART/obs_sequence/obs_sequence_tool.html document for detailed information and examples. obs_sequence_tool is the primary tool for manipulating observation sequence files. Observations sequence files are linked lists of observations organized by time. That is to say, the observations may appear in any order in the file, but traversing the linked list will result in observations ordered by time. There are tools for querying the linked list to extract the 'keys' to the list items for a particular date range, for example. obs_sequence_tool can be used to combine observation sequences, convert from ASCII to binary or vice-versa, extract a subset of observations, etc. For testing, it is terribly useful to extract a small number of observations (like ONE) from an existing observation sequence file. [top] ### The difference between observation TYPE and observation KIND. Broadly speaking, observation TYPES are specific instances of a generic observation KIND. The distinction is useful for several reasons, not the least of which is to evaluate observation platforms. Zonal wind observations from QuikSCAT vs. radiosondes, for example. They are both observations of zonal winds (what we call KIND_U_WIND_COMPONENT), but they are different observation TYPES; QKSWND_U_WIND_COMPONENT, and RADIOSONDE_U_WIND_COMPONENT, respectively. The forward observation operators are implemented based on observation KIND. When requested, the model generates a KIND_U_WIND_COMPONENT, it doesn't need to know that it will be compared to a QuikSCAT value or a radiosonde value. However, it is usually scientifically very interesting to be able to compare the assimilations one TYPE of observation vs. another. One observation sequence file can have lots of types of observations; DART has the capability to assimilate (or evaluate) any combination of observation types without getting bogged down in dataset management. The same observation sequence can be used for experiments that include/exclude certain observation types - ensuring that you are performing the experiment you THINK you are performing ... ## Adding support for a new observation TYPE. DART/obs_def/obs_def_mod.html is the source for detailed information. [top] ## The Data Assimilation Research Testbed - DART Tutorial DART comes with an extensive set of tutorial materials, working models of several different levels of complexity, and data to be assimilated. It has been used in several multi-day workshops and can be used as the basis to teach a section on Data Assimilation. Download the DART software distribution and look in the tutorial subdirectory for the pdf and framemaker source for each of the 22 tutorial sections. The most recent versions of the tutorial are always provided below. Browsing the tutorial is worth the effort. Taking the tutorial is FAR better! 1. Section 1 [pdf] Filtering For a One Variable System. 2. Section 2 [pdf] The DART Directory Tree. 3. Section 3 [pdf] DART Runtime Control and Documentation. 4. Section 4 [pdf] How should observations of a state variable impact an unobserved state variable? Multivariate assimilation. 5. Section 5 [pdf] Comprehensive Filtering Theory: Non-Identity Observations and the Joint Phase Space. 6. Section 6 [pdf] Other Updates for An Observed Variable. 7. Section 7 [pdf] Some Additional Low-Order Models. 8. Section 8 [pdf] Dealing with Sampling Error. 9. Section 9 [pdf] More on Dealing with Error; Inflation. 10. Section 10 [pdf] Regression and Non-linear Effects. 11. Section 11 [pdf] Creating DART Executables. 12. Section 12 [pdf] Adaptive Inflation. 13. Section 13 [pdf] Hierarchical Group Filters and Localization. 14. Section 14 [pdf] DART Observation Quality Control. 15. Section 15 [pdf] DART Experiments: Control and Design. 16. Section 16 [pdf] Diagnostic Output. 17. Section 17 [pdf] Creating Observation Sequences. 18. Section 18 [pdf] Lost in Phase Space: The Challenge of Not Knowing the Truth. 19. Section 19 [pdf] DART-Compliant Models and Making Models Compliant. 20. Section 20 [pdf] Model Parameter Estimation. 21. Section 21 [pdf] Observation Types and Observing System Design. 22. Section 22 [pdf] Parallel Algorithm Implementation. 23. Carbon Tutorial [pdf] A Simple 1D Advection Model. [top] # Adding a model to DART - Overview DART is designed to work with many models without modifications to the DART routines or the model source code. DART can 'wrap around' your model in two ways. One can be used if your model can be called as a subroutine, the other is for models that are separate executables. Either way, there are some steps that are common to both paths. Please be aware that several of the high-order models (CAM and WRF, in particular) have been used for years and their scripts have incorporated failsafe procedures and restart capabilities that have proven to be useful but make the scripts complex - more complex than need be for the initial attempts. Truly, some of the complexity is no longer required for available platforms. Then again, we're not running one instance of a highly complicated computer model, we're running N of them. ## The basic steps to include your model in DART 1. Copy the template directory and files to your own DART model directory. 2. Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient. 3. [optional step] Modify the Matlab® routines to know about the specifics of the netCDF files produces by your model (sensible defaults, for the most part.) If your model is not subroutine-callable, there is extra work to be done. There are several examples to raid, but it helps to know which existing model has a strategy that is most similar to yours. More on that later. 4. Modify shell_scripts/advance_model.csh to: collect all the input files needed to advance the model into a clean, temporary directory, convert the state vector file into input to your model, run your model, and convert your model output to the expected format for another assimilation by DART. We have examples - some that use the following support routines. 1. Create a routine or set of routines to take a DART array and create input files for your model. This is frequently done by a program called dart_to_model.f90, but you can do it any way you like. It is strongly suggested that you use the DART read mechanism for the restart file - namely assim_model_mod.f90:aread_state_restart() 2. Modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. It may be convenient to do this in dart_to_model.f90. 3. Run the model (you may need to watch the MPI syntax) 4. Create a routine or set of routines to take your model output files and create a DART restart file. This is frequently done by a program called model_to_dart.f90. It is strongly suggested that you use the DART write mechanism for the restart file - namely assim_model_mod.f90:awrite_state_restart() If a single instance of your model needs to advance using all the MPI tasks, there is one more script that needs to work - run_filter.csh. 5. Modify shell_scripts/run_filter.csh to: do everything under the sun and then some ## Test ... Generally, it is a good strategy to use DART to create a synthetic observation sequence with ONE observation location - and ONE observation type - for several assimilation periods. With that, it is possible to run perfect_model_obs and then filter without having to debug too much stuff at once. A separate document will address how to test your model with DART. ## Programming style #1 Don't shoot the messenger. We have a lot of experience trying to write portable/reproducible code and offer these suggestions. All of these suggestions are for the standalone DART components. We are not asking you to rewrite your model. If your model is a separate executable, leaving it untouched is fine. Writing portable code for the DART components will allow us to include your model in the nightly builds and reduces the risk of us making changes that adversely affect the integration with your model. There are some routines that have to play with the core DART routines, these are the ones we are asking you to write using these few simple guidelines. • Use explicit typing, do not throw the 'autopromote' flag on your compiler. • Use the intent() attribute. • Use the use, xxx_mod, only : bob, sally statements for routines from other modules. This really helps us track down things and ensures you're using what you think you're using. • Use Fortran namelists for I/O if possible. • Check out the existing parameters/routines in common/types_mod.f90, utilites/utilities_mod.f90, and time_manager/time_manager_mod.f90. You are free to use these and are encouraged to do so. No point reinventing the wheel and these routines have been tested extensively. Hopefully, you have no idea how difficult it is to build each model with 'unique' compile options on N different platforms. Fortran90 provides a nice mechanism to specify the type of variable, please do not use vendor-specific extensions. (To globally autopromote 32bit reals to 64bit reals, for example. That is a horrible thing to do, since vendors are not consistent about what happens to explicitly-typed variables. Trust me. They lie. It also defeats the generic procedure interfaces that are designed to use a single interface as a front-end to multiple 'type-specific' routines.) Compilers do abide by the standard, however, so DART code looks like: character(len=8) :: crdate integer, dimension(8) :: values ... real(r4) :: a,b real(r8) :: bob integer :: istatus, itype ... real(r8), intent(in) :: x(:) type(location_type), intent(in) :: location integer, intent(in) :: itype integer, intent(out) :: istatus real(r8), intent(out) :: obs_val depending on the use. The r4 and r8 types are explicitly defined in DART/common/types_mod.f90 to accurately represent what we have come to expect from 32bit and 64bit floating point real variables, respectively. If you like, you can redefine r8 to be the same as r4 to shrink your memory requirement. The people who run with WRF frequently do this. Do not redefine the digits12 parameter, that one must provide 64bit precision, and is used in precious few places. # Adding a model to DART - Specifics If your model is a separate executable, there is some flexibility to provide the required interfaces and it would be wise to look at the heavily commented template script DART/models/templates/shell_scripts/advance_model.csh and then a few higher-order models to see how they do it. Become familiar with DART/doc/html/mpi_intro.html (DART's use of MPI), DART/doc/html/filter_async_modes.html, and the filter namelist parameter async in filter.html. ## 1. Copying the template directory A little explanation/motivation is warranted. If the model uses the standard layout, it is much easier to include the model in the nightly builds and testing. For this reason alone, please try to use the recommended directory layout. Simply looking at the DART/models directory should give you a pretty good idea of how things should be laid out. Copy the template directory and its contents. It is best to remove the (hidden) subversion files to keep the directory 'clean'. The point of copying this directory is to get a model_mod.f90 that works as-is and you can modify/debug the routines one at a time. ~/DART/models % cp -r template mymodel ~/DART/models % find mymodel -name .svn -print mymodel/.svn mymodel/shell_scripts/.svn mymodel/work/.svn ~/DART/models % rm -rf find mymodel -name .svn -print ~/DART/models % find mymodel -name .svn -print ~/DART/models % The destination directory (your model directory) should be in the DART/models directory to keep life simple. Moving them around will cause problems for the work/mkmf_xxxxx configuration files. Each model directory should have a work and shell_scripts directories, and may have a matlab directory, a src directory, or anything else you may find convenient. Now, you must change all the work/path_names_xxx file contents to reflect the location of your model_mod.f90. ## 2. model_mod.f90 We have templates, examples, and a document describing the required interfaces in the DART code tree - DART/models/model_mod.html. Every(?) user-visible DART program/module has a matching piece of documentation that is distributed along with the code. The DART code tree always has the most current documentation. Check out time_manager_mod.f90 and utilities_mod.f90 for general-purpose routines ... Use Fortran namelists for I/O if possible. Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient. This module should be written with the understanding that print statements and error terminations will be executed by multiple processors/tasks. To restrict print statements to be written once (by the master task), it is necessary to preface the print as in this example: if (do_output()) write(*,*)'model_mod:namelist cal_NML',startDate_1,startDate_2 ### Required Interfaces in model_mod.f90 No matter the complexity of the model, the DART software requires a few interface routines in a model-specific Fortran90 module model_mod.f90 file. The models/template/model_mod.f90 file has extended comment blocks at the heads of each of these routines that go into much more detail for what is to be provided. You cannot change the types or number of required arguments to any of the required interface routines. You can add optional arguments, but you cannot go back throught the DART tree to change the gazillion calls to the mandatory routines. It is absolutely appropriate to look at existing models to get ideas about how to implement the interfaces. Finding a model implementation that is functionally close to yours always helps. As of December 2008, the table of the mandatory interfaces and programming degree-of-difficulty is: subroutine callable separate executable routine description easyeasy get_model_size This function returns the size of all the model variables (prognostic or diagnosed or ...) that are packed into the 1D DART state vector. That is, it returns the length of the DART state vector as a single scalar integer. dependstrivial adv_1step For subroutine-callable models, this routine is the one to actually advance the model 1 timestep (see models/bgrid_solo/model_mod.f90 for an example). For non-subroutine-callable models, this is a NULL interface. Easy. dependsdepends get_state_meta_data This routine takes as input an integer into the DART state vector and returns the associated location and (optionally) variable type from obs_kind/obs_kind_mod.f90. (See models/*/model_mod.f90 for examples.) This generally requires knowledge of how the model state vector is packed into the DART array, so it can be as complicated as the packing. dependshard model_interpolate This is one of the more difficult routines. Given a DART state vector, a location, and a desired generic 'kind' (like KIND_SURFACE_PRESSURE, KIND_TEMPERATURE, KIND_SPECIFIC_HUMIDITY, KIND_PRESSURE, ... ); return the desired scalar quantity and set the return status accordingly. This is what enables the model to use observation-specific 'forward operators' that are part of the common DART code. easyeasy get_model_time_step This routine returns the smallest increment in time (in seconds) that the model is capable of advancing the state in a given implementation. For example, the dynamical timestep of a model is 20 minutes, but there are reasons you don't want to (or cannot) restart at this interval and would like to restart AT MOST every 6 hours. For this case, get_model_time_step should return 21600, ie 6*60*60. This is also interpreted as the nominal assimilation period. This interface is required for all applications. easyeasy end_model Performs any shutdown and cleanup needed. Good form would dictate that you should deallocate any storage allocated when you instantiated the model (from static_init_model, for example). dependsdepends static_init_model Called to do one-time initialization of the model. This generally includes setting the grid information, calendar, etc. trivialtrivial init_time Returns a time that is somehow appropriate for starting up a long integration of the model IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs. If this option is not to be used in perfect_model_obs, this can be a NULL interface. easyeasy init_conditions Companion interface to init_time. Returns a model state vector that is somehow appropriate for starting up a long integration of the model. Only needed IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs. trivial-difficult trivial-difficult nc_write_model_atts This routine is used to write the model-specific attributes to the netCDF files containing the prior and posterior states of the assimilation. The subroutine in the models/template/model_mod.f90 WILL WORK for new models but does not know anything about prognostic variables or geometry or ... Still, it is enough to get started without doing anything. More meaningful coordinate variables etc. are needed to supplant the default template. This can be as complicated as you like - see existing models for examples. trivial-difficult trivial-difficult nc_write_model_vars This routine is responsible for writing the DART state vector -or- the prognostic model variables to the output netCDF files. If the namelist parameter output_state_vector == .false. this routine is responsible for partitioning the DART state vector into the appropriate netCDF pieces (i.e. the prognostic model variables). The default routine will simply blast out the entire DART state vector into a netCDF variable called 'state'. dependstrivial-difficult pert_model_state This routine is used to generate initial ensembles. This may be a NULL interface if you can tolerate the default perturbation strategy of adding noise to every state element or if you generate your own ensembles outside the DART framework. There are other ways of generating ensembles ... climatological distributions, bred singular vectors, voodoo ... trivialtrivial get_close_maxdist_init This routine performs the initialization for the table-lookup routines that accelerate the distance calculations. This routine is closely tied to the type of location module used by the model and is frequently (universally?) simply a 'pass-through' routine to a routine of the same name in the location module. There is generally no coding that needs to be done, but the interface must exist in model_mod.f90 trivialtrivial get_close_obs_init This routine performs the initialization for the get_close accelerator that depends on the particular observation. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module. trivialtrivial get_close_obs This is the routine that takes a single location and a list of other locations, returns the indices of all the locations close to the single one along with the number of these and the distances for the close ones. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module. easyeasy ens_mean_for_model This routine simply stores a copy of the ensemble mean of the state vector within the model_mod. The ensemble mean may be needed for some calculations (like converting model sigma levels to the units of the observation - pressure levels, for example). ## 3. providing Matlab® support Since this is an optional step, it will be covered in a separate document. ## If your model is subroutine-callable - you're done! ## 4. advance_model.csh ## The Big Picture for models advanced as separate executables. The normal sequence of events is that DART reads in its own restart file (do not worry about where this comes from right now) and eventually determines it needs to advance the model. DART needs to be able to take its internal representation of each model state vector, the valid time of that state, and the amount of time to advance the state - and communicate that to the model. When the model has advanced the state to the requested time, the output must be ingested by DART and the cycle begins again. DART is entirely responsible for reading the observations and there are several programs for creating and manipulating the observation sequence files. There are a couple of ways to exploit parallel architectures with DART, and these have an immediate bearing on the design of the script(s) that control how the model instances (each model copy) are advanced. Perhaps the conceptually simplest method is when each model instance is advanced by a single processor element. DART calls this async = 2. It is generally efficient to relate the ensemble size to the number of processors being used. The alternative is to advance every model instance one after another using all available processors for each instance of the model. DART calls this async = 4, and requires an additional script. For portability reasons, DART uses the same processor set for both the assimilation and the model advances. For example, if you advance the model with 96 processors, all 96 processors will be employed to assimilate. advance_model.csh is invoked in one of two ways: 1) if async = 2 then filter uses a system() call, or 2) if async = 4 then run_filter.csh makes the call. Either way there are three arguments. 1. the process number of the caller - could be the master task ID (zero) or (especially if async = 2) a process id that gets related to the copy. When multiple copies are being advanced simultaneously, each of the advances happens in its own run-time directory. 2. the number of state copies belonging to that process 3. the name of the (ASCII) filter_control_file for that process. The filter_control file contains the following information (one per line): the ensemble member, the name of the input file (containing the DART state vector), and the name of the output file from the model containing the new DART state vector. For example, 1 assim_model_state_ic.0001 assim_model_state_ud.0001 2 assim_model_state_ic.0002 assim_model_state_ud.0002 ... ### async = 2 ... advancing many copies at the same time Modify shell_scripts/advance_model.csh to: 1. Collect all the input files needed to advance the model into a clean, temporary directory. 2. Determine how many tasks you have, and how many ensemble members you have. Determine how many 'batches' of ensemble members must be done to advance all of them. With 20 tasks and 80 ensemble members, you will need to loop 4 times, for example. clean, temporary directory 3. and loop over the following three steps - each loop advances one ensemble member 4. convert the DART state vector file into input for your model, 5. run your model, and 6. convert your model output to the file with the expected format for another assimilation by DART. During this initial phase, it may be useful to _leave_ the temporary directory intact until you verify everything is as it should be. ### async = 4 ... advancing each copy one at a time In addition to modifying shell_scripts/advance_model.csh as described above, you must also modify shell_scripts/run_filter.csh in the following way: THIS PART NEEDS TO BE FILLED IN ## 5. Converting DART output to input for your model. After DART has assimilated the observations and created new (posterior) states, it is necessary to reformat those posteriors into input for the model. Fundamentally, you are unpacking the DART state vector and putting the pieces back into whatever portion of your model initial conditions file is appropriate. Frequently this is done by a program called dart_to_model.f90. Put another way; Create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. Frequently this is done by dart_to_model.f90. The DART array has a header that contains the 'advance-to' time as well as the 'valid' time of the DART array. The times are encoded in DART's time representation. Interpretation of these times by your model will be necessary and you will need to feed these times to your model in some automated fashion. You will also need to create/modify a mkmf_dart_to_model and path_names_dart_to_model specific to your model. ## 6. Modify the start/stop control of your model run. Create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. These routines are called in the advance_model.csh script. Every model is controlled differently, so writing detailed descriptions here is pointless. ## 7. Convert your model output to DART input. After your model has advanced its states to the target time, it is necessary to convey the new state information to DART. The preferred name for this is model_to_dart.f90. This is fundamentally the inverse of dart_to_model.f90. Rip out the bits of the state vector you want to paste into a vector for DART, prepend the valid_time in the approved DART format and you're good to go. If you pack the bits into a DART state vector, there are native DART routines to write out the state vector. This ensures that DART will be able to read what you've written, and insulates you from having to worry about any internal file format changes we might make. [top] ## Adding Matlab® support for your own model - under construction. Only needed for state-space diagnostics. Define a structure with required elements. [top] ## Testing Strategies - under construction Generally testing when you add a new model to DART includes: Checking the converter programs. Checking the model advance control. Starting with one observation in a known location, with a known value and error specification. Performing a 'perfect model' experiment for a few timesteps. Looking at what the assimilation did with: ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc ncview Innov.nc [top] ## Did my experiment work? If filter ran and produced output files, now what? #### What output and diagnostic files are produced from executable "perfect_model_obs" obs_seq.out the observations at some predefined times and locations True_State.nc a netCDF file containing the model trajectory perfect_restart the final state of the model - in ASCII from executable "filter" Prior_Diag.nc the model states right before assimilation Posterior_Diag.nc the model states immediately after assimilation obs_seq.final the model estimates of the observations (an integral part of the data assimilation process) filter_restart the ensemble of final model states from both dart_log.out the 'important' run-time output (each run of filter appends to this file; remove it or start at the bottom to see the latest values) dart_log.nml the input parameters used for an experiment ### First questions to ask After filter executes without error and produces an obs_seq.final file, a Prior_Diag.nc file, and a Posterior_Diag.nc file, the first questions to ask are if any observations were successfully assimilated, and is the model state output from filter is different from the input. One way to check if the output model state data was changed by the assimilation is to use the 'ncdiff' tool to difference the Prior and Posterior diagnostic files: ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc ncview Innov.nc Look at the first copy, which is the ensemble mean. If all values in that copy are 0, then the assimilation changed nothing in the state. #### Debugging hints: You may need to rerun filter multiple times to diagnose this. The fastest way to get to the answer is to make filter very fast to run. You can do this by: 1. make an obs_seq file with only 1 or a just a few observations in it, and 2. configure a run so filter does a single assimilation and exits, without having to advance the ensemble of models or do other work. ##### To make an obs file with a single observation, use one of these methods: 1. run ./create_obs_sequence to make a new, short, obs_seq file 2. Use the obs_sequence_tool to cut your existing obs_seq.out file down to just a few obs by selecting only a subset of the types and setting a very short time window (just a second or two where you know there are obs). ##### To make the filter program only do an assimilation: 1. Edit the input.nml and in the &filter_nml namelist set the init_time_days and init_time_seconds to match the observation time in the truncated obs_seq file. This overrides any times in the restart/initial condition files and ensures that filter will only assimilate and not try to advance the model. 2. Make sure the truncated obs_seq file contains only 1 obs, a few obs at the same time, or obs close enough together in time to fit into a single assimilation window. If there are no changes in the model state after assimilation, then examine the obs_seq.final file. If this file is in binary format, change this namelist item so the output observation sequence file will be written in ascii: &obs_sequence_nml write_binary_obs_sequence = .false. /  and rerun filter to regenerate an obs_seq.final file in ascii. See the diagrams on this page for help in understanding how to read the different fields in an obs_seq.final file. If all the prior and posterior mean values are -888888.0 (the DART "missing data" value), those observations were not assimilated. If it is not already set, change &filter_nml :: num_output_obs_members to be the same as the number of ensembles. This will give you all the forward operator values for all the ensemble members. You can determine if all ensemble members are failing in the same way, or if only a few are problematic. For the failing observations, check the 'DART QC' for the reason why. (See here for how to locate the different values in an obs_seq.final file. The 'DART QC' field is usually the second of the 2 "quality control" copies.) • If the DART QC values are 4, the forward operators have failed. Look at the model_interpolate() routine in your model_mod.f90 file, or the forward operator code in obs_def/obs_def_xxx_mod.f90 for your observation type. A successful forward operator must return a valid obs_val and an istatus=0. If the forward operator code returns different istatus values for different error types, you can set &filter_nml :: output_forward_op_errors = .true. and rerun filter to see exactly what error istatus codes are being set. See this web page for more information on how to use the 'output_forward_op_errors' option. Negative istatus values are reserved for the system, istatus = 0 is success, and any positive value indicates a failed forward operator. The code is free to use different positive values to signal different types of errors. • If the DART QC values are 5, those observation types were intentionally ignored because they were not listed in the &obs_kind_nml namelist, in the 'assimilate_these_obs_types' stringlist. • If the DART QC values are 6, the data quality control that came with the original observation data indicates this is a bad quality observation and it was skipped for this reason. • If the DART QC values are 7, the observation value is too far away from the ensemble mean. Set the &filter_nml :: outlier_threshold value to -1 to ignore this for now and rerun. If in the obs_seq.final file the prior and posterior values are not -888888.0 but are identical, your obs are being assimilated but are having no impact. The most common reasons assimilated obs have no impact on the model state include: • Zero spread in ensemble members Your initial ensemble members must have different values for each state item. If all members have identical values, the observations cannot make a change. To diagnose this condition, look at the Prior_Diag.nc file, at copy=2 (the second copy), which is the ensemble spread. If all the values are 0 this is your problem. The fix is to set &filter_nml :: start_from_restart = .false. (which will still require a single filter initial condition file) but then the filter code will add random gaussian perturbations to each state vector item to generate an initial ensemble with spread. The magnitude of the gaussian noise added is controlled by the &ensemble_manager :: perturbation_amplitude. It is also possible to write your own perturbation routine in your model_mod.f90 code. • Cutoff value too small Check the localization (cutoff) radius in the &assim_tools_nml namelist, the 'cutoff' item. Set it to a very large number (e.g. 100000) and rerun. If there is now an impact, this cutoff was restricting the items in the state vector so your obs had no impact before. Cutoff values are dependent on the location type being used. It is specified in radians for the threed_sphere locations module (what most large models use), or in simple distance if using a low order model (lorenz, ikeda, etc). • Obs error values too large (less likely) If the observation error is very large, it will have no impact on the model state. This is less likely a cause than other possibilities. • No correlation (unlikely) If there is no correlation between the distribution of the forward observation values and the state vector values, the increments will be very tiny. However there are generally still tiny increments applied, so this is also a low likelyhood case. • Errors in forward operator location computations, or model get_close_obs() If there is an error in the model_mod.f90 code in either get_state_meta_data(), model_interpolate(), or the vertical conversion code in get_close_obs(), it is possible for the forward operators to appear to be working correctly, but the distances computed for the separation between the obs and the state vector values can be incorrect if the wrong locations are being passed back from get_state_meta_data(). This can result in the increments being applied in the wrong locations or not at all. This is usually one of the things to test carefully when developing a new model interface. • Incorrect vertical conversion If the model is using 3d coordinates and needs the capability to convert between pressure, height, and/or model level, if the conversion is done incorrectly the state vector locations can appear to be too high or too low to be impacted by an observation. Also some models have a height limit built into their model_mod code to avoid trying to assimilate observations at the model top, if the model top is prescribed. The observations cannot make meaningful changes to the model state there and trying to assimilate them can lead to problems with the inflation. If the code in the model_mod is excluding observations incorrectly, or you are testing with observations at the model top, this can result in no impact on the model state. ### How is the output different from the input? If you compute the difference between the prior and posterior diagnostic files by this process: ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc ncview Innov.nc and you see a difference, is it correct? If you run with a single observation, you should be able to easily see the impact - generally it's a mostly spherical or circular ring around the observation location depending on your vertical localization, you may or may not see an impact in the vertical. Using &location_nml::horiz_dist_only=.true. is usually a good idea for a full 3d model to start out, and then add vertical localization once you believe the horizontal impact makes sense. Without any vertical localization, the observation should have an impact along the entire vertical column. (For lower order models this doesn't apply.) If you change the cutoff distances you should be able to watch the change in impact on the state and make sure that it's something reasonable. Now you can use the observation space diagnostics, and the state space diagnostics to get more information about what the impact was, and whether it's doing the right thing or not. [top] ## Was the Assimilation Effective? If your filter run finished, and the Posterior is now different from the Prior, now what? This section presumes that you have debugged your model/DART interfaces or are using a model that came with the DART distribution. A working implementation. There is no single metric that can declare success. The DART Tutorial has the best explanation of what to look for, what to change to improve the next experiment, etc. DART has an extensive set of diagnostics implemented in Matlab® to use them, make sure you have read the Configuring Matlab® to work with DART section. ### The Observations are the Key. My own (Tim's) personal view is that the first thing to check is to see how many observations are getting rejected by the assimilation in addition to the RMSE and spread of the ensemble. A natural part of the DART framework is that these metrics are calculated automatically and are readily available in the obs_seq.final files. Checking the temporal evolution of the RMSE and observation rejection characteristics is a first-order metric for determining the health of the assimilation system. 1. Use obs_diag.f90 to process the collection of obs_seq.final files for regions and times of interest. Edit the input.nml:obs_diag_nml namelist to reflect the details of your experiment, and then run obs_diag to create a netCDF file obs_diag_output.nc that contains the summaries. 2. Make sure the spread of the ensemble does not collapse. Use plot_evolution.m with copystring = 'spread'; to explore obs_diag_output.nc. It is normal (and desirable!) for the spread to decrease somewhat from the initial value, but it should not decrease to a small value. Insufficient spread leads to filter divergence and a large observation rejection rate. plot_evolution.m automatically plots the number of observations available and the number of observations successfully assimilated. 3. Make sure the RMSE of the ensemble does not collapse. Use plot_evolution.m with copystring = 'rmse'; to explore obs_diag_output.nc. It is important to interpret the RMSE in light of the number of observations successfully assimilated. It is possible to have a very low RMSE if the assimilation system rejects all of the observations that are in disagreement! A low RMSE is desirable and is a much stronger result if most/all of the observations are being assimilated successfully. Also - the RMSE of the prior is a much stronger result. Any method can overfit the observations (match them perfectly) - what is important is that the forecast is a good forecast! 4. Make sure the RMSE of the ensemble does not continually increase. plot_evolution.m with copystring = 'rmse'; to explore obs_diag_output.nc. It is natural for the RMSE to vary in time in response to the changing number and location of the observations, the phenomenon being modeled, etc. ... but it should not generally increase as the system evolves. Once the system has 'burned in', the RMSE should be relatively stable. 5. Check to make sure the observations that are getting rejected are getting rejected for the right reasons. Run obs_seq_to_netcdf to convert the obs_seq.final files into netCDF files that can be explored with link_obs.m or plot_obs_netcdf.m. Both of these tools will allow you to isolate the rejected observations and determine if the observations are being rejected because they are in disagreement with the ensemble (DART QC == 7) or they were rejected because of a namelist setting (DART QC == 5), or incoming QC value (DART QC == 6), or were they simply outside the domain (DART generally only interpolates, not extrapolate) or ... 6. Check that the ensemble spread captures the 'truth' as determined by the observations. Use obs_diag to create the data for a rank histogram. Plot the histograms with ncview or plot_rank_histogram.m Generally speaking, the observation-space diagnostics provide the first and best metrics for the assimilation system. We always have observations, we rarely have the 'truth' in a full state-space representation. Personally, I rarely see value when comparing to some other gridded product - as it surely has its own set of deficiencies or assumptions. Observations aren't perfect - but they are still best. NOTE: Combining observations from multiple sources can lead to the same observation type begin defined with different vertical coordinate systems. While this does not pose a problem for assimilation, it does pose a problem for the diagnostics. The current obs_diag.f90 cannot (vertically) bin the same observation type that exploits two different vertical coordinate systems. If you get a WARNING from obs_diag:CheckVertical() you should know that the observation triggering the warning uses a different vertical coordinate system that the first observation encountered of that same type. If the termlevel is set to be 1, this is a fatal error, not a warning. The following example is one where observation 7113 defined the vertical coordinate system to be VERTISSURFACE (i.e. -1) and observation 7150 was created using VERTISPRESSURE. The messages take the form: WARNING FROM: routine: CheckVertical message: obs 7150 type 42 changing from -1 to pressure - def by obs 7113  ### State-Space Diagnostics If you're reading this, it means you have an assimilation that is not rejecting too many observations, the ensemble spread is relatively constant and is not too small, and presumably the RMSE is stable if not decreasing. Now it is time to assess the affect the assimilation has on your model - the whole point of data assimilation. Every DART assimilation produces two netCDF files: Prior_Diag.nc and Posterior_Diag.nc. These provide quick access to the individual state-space variables that constitute the DART state vector. If filter was run with input.nml:model_nml:output_state_vector = .false. (normally the default for larger models), the DART state vector will be reassembled into prognostic variables. These variables can be easily explored in standard netCDF fashion. It is very common and trivially easy to explore the increments:  ncdump -v CopyMetaData Prior_Diag.nc to check which copy you want to explore ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc ncdiff comes from NCO, not DART ncview Innov.nc ncview is another 'third-party' tool. See DART state-space diagnostics for more. [top] ## DART observation-space diagnostics. The DART QC table is an important piece of information. QC valuemeaning 0,1 == both Prior and Posterior are good. 0 Assimilated O.K. 1 Evaluated O.K., not assimilated because namelist specified evaluate only. 2,3 == Prior OK, but Posterior failed. 2 Assimilated O.K. BUT posterior forward operator failed. 3 Evaluated O.K. BUT posterior forward operator failed. 4 or higher == both Prior and Posterior have failed 4 Prior forward operator failed. 5 Not used because of namelist control. 6 Rejected because of incoming data QC higher than namelist control. 7 Rejected because of outlier threshold test. 8 and above reserved for future use. It is required to post-process the obs_seq.final file(s) with obs_diag to generate a netCDF file containing accumulated diagnostics for specified regions, etc. Since the experiment information (assimilation interval, assimilating model, etc.) are not recorded in the obs_seq.final file, the obs_diag_nml namelist has a section that allows specification of the necessary quantities. The following quantities are normally diagnosed:  Nposs the number of observations for each assimilation period; Nused the number of observations successfully assimilated each assimilation period; NbigQC the number of observations that had an undesirable (original) QC value; NbadIZ the number of observations that had an undesirable Innovation Z; NbadUV the number of velocity observations that had a matching component that was not assimilated; NbadLV the number of observations that were above or below the highest or lowest model level, respectively; rmse the rmse of the ensemble; bias the bias of the ensemble (forecast-observation); spread the spread of the ensemble; and the totalspread pooled spread of the observation (knowing its observational error) and the ensemble. NbadDARTQC the number of observations that had a DART QC value (> 1 for a prior, > 3 for a posterior) observation the mean of the observation values ens_mean the ensemble mean of the model estimates of the observation values N_DARTqc_0 the number of observations that had a DART QC value of 0 N_DARTqc_1 the number of observations that had a DART QC value of 1 N_DARTqc_2 the number of observations that had a DART QC value of 2 N_DARTqc_3 the number of observations that had a DART QC value of 3 N_DARTqc_4 the number of observations that had a DART QC value of 4 N_DARTqc_5 the number of observations that had a DART QC value of 5 N_DARTqc_6 the number of observations that had a DART QC value of 6 N_DARTqc_7 the number of observations that had a DART QC value of 7 The observation-space functions are in the DART/diagnostics/matlab directory. Once you have processed the obs_seq.final files into a single obs_diag_output.nc, you can use that as input to your own plotting routines or use the following DART Matlab® routines:  plot_evolution.m plots the temporal evolution of any of the quantities above for each variable for specified levels. The number of observations possible and used are plotted on the same axis. fname = 'POP11/obs_diag_output.nc'; % netcdf file produced by 'obs_diag' copystring = 'rmse'; % 'copy' string == quantity of interest plotdat = plot_evolution(fname,copystring); % -- OR -- plotdat = plot_evolution(fname,copystring,'RADIOSONDE_TEMPERATURE');  plot_profile.m plots the spatial and temporal average of any specified quantity as a function of height. The number of observations possible and used are plotted on the same axis. fname = 'POP11/obs_diag_output.nc'; % netcdf file produced by 'obs_diag' copystring = 'rmse'; % 'copy' string == quantity of interest plotdat = plot_profile(fname,copystring);  plot_rmse_xxx_evolution.m same as plot_evolution.m but will overlay rmse on the same axis. plot_rmse_xxx_profile.m same as plot_profile.m with an overlay of rmse. plot_bias_xxx_profile.m same as plot_profile.m with an overlay of bias. two_experiments_evolution.m same as plot_evolution.m but will overlay multiple (more than two, actually) experiments (i.e. multiple obs_diag_output.nc files) on the same axis. A separate figure is created for each region in the obs_diag_output.nc file. files = {'POP12/obs_diag_output.nc','POP11/obs_diag_output.nc'}; titles = {'CAM4','CAM3.6.71'}; varnames = {'ACARS_TEMPERATURE'}; qtty = 'rmse'; prpo = 'prior'; levelind = 5; two_experiments_evolution(files, titles,{'ACARS_TEMPERATURE'}, qtty, prpo, levelind)  two_experiments_profile.m same as plot_profile.m but will overlay multiple (more than two, actually) experiments (i.e. multiple obs_diag_output.nc files) on the same axis. If the obs_diag_output.nc file was created with multiple regions, there are multiple axes on a single figure. files = {'POP12/obs_diag_output.nc','POP11/obs_diag_output.nc'}; titles = {'CAM4','CAM3.6.71'}; varnames = {'ACARS_TEMPERATURE'}; qtty = 'rmse'; prpo = 'prior'; two_experiments_profile(files, titles, varnames, qtty, prpo)  plot_rank_histogram.m will create rank histograms for any variable that has that information present in obs_diag_output.nc. fname = 'obs_diag_output.nc'; % netcdf file produced by 'obs_diag' timeindex = 3; % plot the histogram for the third timestep plotdat = plot_rank_histogram(fname, timeindex, 'RADIOSONDE_TEMPERATURE');  plot_observation_locations.m This routine requires processing the observations through obs_diag with an input option to create a separate XYZ file of the observation locations. This has been deprecated in favor of plot_obs_netcdf, but still works just fine. You may also convert observation sequence files to netCDF by using obs_seq_to_netcdf. All of the following routines will work on observation sequences files AFTER an assimilation (i.e. obs_seq.final files that have been converted to netCDF), and some of them will work on obs_seq.out-type files that have been converted.  read_obs_netcdf.m reads a particular variable and copy from a netCDF-format observation sequence file and returns a single structure with useful bits for plotting/exploring. fname = 'obs_sequence_001.nc'; ObsTypeString = 'RADIOSONDE_U_WIND_COMPONENT'; % or 'ALL' ... region = [0 360 -90 90 -Inf Inf]; CopyString = 'NCEP BUFR observation'; QCString = 'DART quality control'; verbose = 1; % anything > 0 == 'true' obs = read_obs_netcdf(fname, ObsTypeString, region, CopyString, QCString, verbose);  plot_obs_netcdf.m creates a 3D scatterplot of the observation locations, color-coded to the observation values. A second axis will also plot the QC values if desired. fname = 'POP11/obs_epoch_011.nc'; region = [0 360 -90 90 -Inf Inf]; ObsTypeString = 'AIRCRAFT_U_WIND_COMPONENT'; CopyString = 'NCEP BUFR observation'; QCString = 'DART quality control'; maxgoodQC = 2; verbose = 1; % > 0 means 'print summary to command window' twoup = 1; % > 0 means 'use same Figure for QC plot' bob = plot_obs_netcdf(fname, ObsTypeString, region, CopyString, ... QCString, maxgoodQC, verbose, twoup);  plot_obs_netcdf_diffs.m creates a 3D scatterplot of the difference between two 'copies' of an observation. fname = 'POP11/obs_epoch_011.nc'; region = [0 360 -90 90 -Inf Inf]; ObsTypeString = 'AIRCRAFT_U_WIND_COMPONENT'; CopyString1 = 'NCEP BUFR observation'; CopyString2 = 'prior ensemble mean'; QCString = 'DART quality control'; maxQC = 2; verbose = 1; % > 0 means 'print summary to command window' twoup = 0; % > 0 means 'use same Figure for QC plot' bob = plot_obs_netcdf_diffs(fname, ObsTypeString, region, CopyString1, CopyString2, ... QCString, maxQC, verbose, twoup);  plot_wind_vectors.m creates a 2D 'quiver' plot of a wind field. fname = 'obs_epoch_001.nc'; platform = 'SAT'; % usually 'RADIOSONDE', 'SAT', 'METAR', ... CopyString = 'NCEP BUFR observation'; QCString = 'DART quality control'; region = [210 310 12 65 -Inf Inf]; scalefactor = 5; % reference arrow magnitude bob = plot_wind_vectors(fname, platform, CopyString, QCString, ... 'region', region, 'scalefactor', scalefactor);  link_obs.m creates multiple figures that have linked attributes. For example: select all the observations with DART QC == 4 in one window, and those same observations are highlighted in all the other windows. Make sure you explore the Matlab® rotate and brush icons in the Figure windows. All the images are links to larger versions - one image has the Matlab® call. New Features: The 3D scatterplot has symbols color-coded base on the observation value and there is a plot of the QC vs. observation value. ObsTimeCheck.m is an example of a trivial little script to wrap around plot_obs_netcdf.m that allows you to explore the spatial distribution of your observation sequences. Since obs_seq_to_netcdf doesn't know anything about assimilation windows; the idea is to create separate netCDF files for each assimilation window and then explore a sequence of windows. Since ObsTimeCheck.m is in the subversion repository, you should feel free to edit it/modify it to your heart's desire. If there are no observations of that type in a particular assimilation window, the Matlab® Command window will have a comment to that effect. [top] ## Understanding what's in obs_diag_output.nc After you create obs_diag_output.nc with obs_diag it is important to understand what is contained in obs_diag_output.nc. Remember, this is just a dump of the header of the file!  [work]$ ncdump -v CopyMetaData obs_diag_output.nc
netcdf obs_diag_output {
dimensions:
copy = 21 ;
obstypes = 78 ;
region = 1 ;
surface = 1 ;
undef = 1 ;
mlevel = 11 ;
mlevel_edges = 12 ;
plevel = 11 ;
plevel_edges = 12 ;
hlevel = 11 ;
hlevel_edges = 12 ;
time = UNLIMITED ; // (6 currently)
bounds = 2 ;
stringlength = 32 ;
rank_bins = 97 ;
variables:
int copy(copy) ;
int obstypes(obstypes) ;
obstypes:explanation = "see ObservationTypes" ;
int region(region) ;
region:long_name = "model region" ;
region:units = "nondimensional" ;
region:valid_range = 1, 1 ;
int mlevel(mlevel) ;
mlevel:long_name = "model level" ;
mlevel:units = "model level" ;
mlevel:axis = "Z" ;
mlevel:valid_range = 1, 11 ;
float mlevel_edges(mlevel_edges) ;
mlevel_edges:long_name = "model level edges" ;
mlevel_edges:units = "model level" ;
mlevel_edges:axis = "Z" ;
mlevel_edges:valid_range = 0.5, 11.5 ;
float plevel(plevel) ;
plevel:long_name = "pressure bin midpoints" ;
plevel:units = "hPa" ;
plevel:axis = "Z" ;
plevel:valid_range = 100., 1000. ;
float plevel_edges(plevel_edges) ;
plevel_edges:long_name = "pressure bin edges" ;
plevel_edges:units = "hPa" ;
plevel_edges:axis = "Z" ;
plevel_edges:valid_range = 75., 1025. ;
float hlevel(hlevel) ;
hlevel:long_name = "height bin midpoints" ;
hlevel:units = "m" ;
hlevel:axis = "Z" ;
hlevel:valid_range = 1000., 11000. ;
float hlevel_edges(hlevel_edges) ;
hlevel_edges:long_name = "height bin edges" ;
hlevel_edges:units = "m" ;
hlevel_edges:axis = "Z" ;
hlevel_edges:valid_range = 500., 11500. ;
int bounds(bounds) ;
bounds:valid_range = 1, 2 ;
double time(time) ;
time:standard_name = "time" ;
time:long_name = "temporal bin midpoints" ;
time:units = "days since 1601-1-1" ;
time:calendar = "Gregorian" ;
time:axis = "T" ;
time:bounds = "time_bounds" ;
time:valid_range = 148880.5, 148885.5 ;
double time_bounds(time, bounds) ;
time_bounds:long_name = "temporal bin edges" ;
time_bounds:units = "days since 1601-1-1" ;
time_bounds:calendar = "Gregorian" ;
time_bounds:valid_range = 148880.000011574, 148886. ;
char region_names(region, stringlength) ;
region_names:long_name = "region names" ;
char ObservationTypes(obstypes, stringlength) ;
ObservationTypes:long_name = "DART observation types" ;
ObservationTypes:comment = "table relating integer to observation type string" ;
int rank_bins(rank_bins) ;
rank_bins:long_name = "rank histogram bins" ;
rank_bins:comment = "position of the observation among the sorted noisy ensemble members" ;
float RADIOSONDE_U_WIND_COMPONENT_guess(time, copy, plevel, region) ;
int   RADIOSONDE_U_WIND_COMPONENT_guess_RankHi(time, rank_bins, plevel, region) ;
...
float AIRCRAFT_U_WIND_COMPONENT_guess(time, copy, plevel, region) ;
AIRCRAFT_U_WIND_COMPONENT_guess:_FillValue = -888888.f ;
AIRCRAFT_U_WIND_COMPONENT_guess:missing_value = -888888.f ;
int   AIRCRAFT_U_WIND_COMPONENT_guess_RankHist(time, rank_bins, plevel, region) ;
...
float RADIOSONDE_U_WIND_COMPONENT_analy(time, copy, plevel, region) ;
...
float AIRCRAFT_U_WIND_COMPONENT_analy(time, copy, plevel, region) ;
AIRCRAFT_U_WIND_COMPONENT_analy:_FillValue = -888888.f ;
AIRCRAFT_U_WIND_COMPONENT_analy:missing_value = -888888.f ;
...
...
...
...

// global attributes:
:creation_date = "YYYY MM DD HH MM SS = 2011 03 18 13 37 34" ;
:obs_diag_source = "URL: diagnostics/threed_sphere/obs_diag.html $" ; :obs_diag_revision = "$Revision: 8791 $" ; :obs_diag_revdate = "$Date: 2015-10-09 11:12:09 -0600 (Fri, 09 Oct 2015) $" ; :bias_convention = "model - observation" ; :horizontal_wind = "vector wind derived from U,V components" ; :horizontal_wind_bias = "definition : sum[sqrt(u**2 + v**2) - obsspeed]/nobs" ; :horizontal_wind_rmse = "definition : sqrt(sum[(u-uobs)**2 + (v-vobs)**2]/nobs)" ; :horizontal_wind_spread = "definition : sqrt(sum[var(u) + var(v)]/nobs)" ; :DART_QCs_in_histogram = 0, 1, 2, 3 ; :outliers_in_histogram = "FALSE" ; :first_bin_center = 2008, 8, 15, 12, 0, 0 ; :last_bin_center = 2008, 8, 20, 12, 0, 0 ; :bin_separation = 0, 0, 1, 0, 0, 0 ; :bin_width = 0, 0, 1, 0, 0, 0 ; :time_to_skip = 0, 0, 0, 0, 0, 0 ; :max_num_bins = 1000 ; :rat_cri = 5000. ; :input_qc_threshold = 3. ; :lonlim1 = 0. ; :lonlim2 = 360. ; :latlim1 = -90. ; :latlim2 = 90. ; :obs_seq_file_001 = "/ptmp/thoar/HuiGPS/08_01/obs_seq.final" ; ... ... data: CopyMetaData = "Nposs ", "Nused ", "NbigQC ", "NbadIZ ", "NbadUV ", "NbadLV ", "rmse ", "bias ", "spread ", "totalspread ", "NbadDARTQC ", "observation ", "ens_mean ", "N_DARTqc_0 ", "N_DARTqc_1 ", "N_DARTqc_2 ", "N_DARTqc_3 ", "N_DARTqc_4 ", "N_DARTqc_5 ", "N_DARTqc_6 ", "N_DARTqc_7 " ; }  There are many more variables in this particular netCDF file - indicated by the '...' (the ellipsis). Every observation type is preserved in its own set of variables with the following suffixes: suffixdescription _guessprior, forecast _analyposterior, analysis _VPguessvertical profile only - prior, forecast _VPanalyvertical profile only - posterior, analysis _RankHistrank histogram - prior What is really important to note is that each observation variable has a copy dimension - and each copy description is contained in the CopyMetaData variable. A dump of that variable provides information about what quantities are directly available. In the above example, the rmse is copy 7. You should never assume this information, you should always check the CopyMetaData variable to find the appropriate copy index. Let's look at the RADIOSONDE_U_WIND_COMPONENT_guess variable in the example. It has 4 dimensions: [time, copy, plevel, region]. Since 'plevel' is one of the dimensions, the appropriate levels are defined by the coordinate variable of the same name. The RADIOSONDE_U_WIND_COMPONENT_VPguess variable has 3 dimensions: [copy, plevel, region]. The 'time' dimension has been averaged out such that a single Vertical Profile (VP) is defined for the entire timeframe specified in the namelist. If I were a better man, I'd put the averaging interval in the variable metadata to have a hope of complying with convention; instead, it is only in the global metadata for the entire netCDF file as first_bin_center, last_bin_center, and time_to_skip. Add the time_to_skip to the first_bin_center to derive the start of the averaging period. This allows one to ignore the effects of spinup. The RADIOSONDE_U_WIND_COMPONENT_guess_RankHi variable name has been cut short by the netCDF restriction that variable names can only contain 40 characters. The _RankHist variables are only present if the input obs_seq.final files have multiple ensemble members present. You cannot derive this information unless the assimilation was performed with filter_nml:num_output_obs_members equal to something like the ensemble size. [top] ## Viewing the rank histogram information with 'ncview' After you create obs_diag_output.nc with obs_diag you can view the rank histograms in the following way: [work]$ ncview obs_diag_output.nc


Note there are forty-seven 3D variables. Pick one. In this case, I selected AIRCRAFT_U_WIND_COMPONENT_guess_RankHist

Navigate to the time of interest (these are the arrows next to the QUIT button.) If ncview was built with udunits support, the actual calendar dates and times should be shown next to the frame counter. That should generate something like the graphic on the right below. Since the default axes are (Y == histogram_bins, X == levels) and there are many more ensemble members (96) than vertical levels (20) in this netCDF file, the graphic appears tall and narrow.

Click anywhere on the graphic and something like the following is displayed:

Change the "X Axis:" to "rank_bins" and a new graphic will display the rank histogram.

If you continue to click on the "tall,skinny" graphic, the histogram for that level will be added to the rank histogram window. Remember, levels are along the X axis on the "tall,skinny" graphic. Viola'!

[top]

## DART state-space diagnostics.

### Matlab®

There are a set of Matlab® functions to help explore the assimilation performance in state-space, which is very useful for OSSE's (i.e. when you know the true model state). The general guideline here is that anything that computes an 'error' requires the truth. There are some functions that work without a true state.

In order to use any of these functions, the scripts need to know how to interpret the layout of the netCDF file - which is usually model-dependent. See the section on Adding Matlab® support for your own model if you are not using one of the supported DART models.

The state-space functions are in the DART/matlab directory:

 plot_total_err.m plots the evolution of the error (un-normalized) and ensemble spread of all state variables. plot_bins.m plots the rank histograms for a set of state variables. plot_ens_time_series.m plots the evolution of a set of state variables - all ensemble members, the ensemble mean (and Truth, if available). plot_ens_mean_time_series.m plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available). plot_ens_err_spread.m plots the evolution of the ensemble error and spread for a select set of state variables. plot_correl.m plots the correlation through time of a state variable and the rest of the state. plot_var_var_correl.m plots the correlation between two variables at a particular location. plot_jeff_correl.m plots the correlation through time of a state variable at a particular time and any state variable. plot_sawtooth.m plots the trajectory of any set of state variables highlighting the assimilation 'increment'. plot_phase_space.m plots the trajectory of any two or three state variables. The classic attractor diagram. plot_smoother_err.m plots the error of the ensemble smoother - which uses observations in the future as well as the present.

DART diagnostic variables all have a 'copy' dimension that relates to the fact that it might be the ensemble mean, the ensemble spread, ensemble member 43, the inflation values ... a whole host of possibilities. For instance:

0[1020] swordfish:~/<3>models/wrf/work % ncdump -v copy,CopyMetaData Prior_Diag.nc
netcdf Prior_Diag {
dimensions:
copy = 3 ;
time = UNLIMITED ; // (1 currently)
NMLlinelen = 129 ;
NMLnlines = 203 ;
domain = 1 ;
west_east_d01 = 267 ;
west_east_stag_d01 = 268 ;
south_north_d01 = 215 ;
south_north_stag_d01 = 216 ;
bottom_top_d01 = 35 ;
bottom_top_stag_d01 = 36 ;
variables:
int copy(copy) ;
copy:long_name = "ensemble member or copy" ;
copy:units = "nondimensional" ;
copy:valid_range = 1, 100 ;
...
float U_d01(time, copy, bottom_top_d01, south_north_d01, west_east_stag_d01) ;
...
copy = 1, 2, 3 ;

"ensemble mean         ",
"ensemble member      1" ;
}


### Non-Matlab® state-space diagnostics

The innovations to the model state are easy to derive. Use the NCO Operator ncdiff to difference the two DART diagnostic netCDF files to create the innovations. Be sure to check the CopyMetaData variable to figure out what copy is of interest. Then, use ncview to explore the innovations or the inflation values or ...

If the assimilation used state-space inflation, the inflation fields will be added as additional 'copies'. A sure sign of trouble is if the inflation fields grow without bound. As the observation network changes, expect the inflation values to change.

The only other thing I look for in state-space is that the increments are 'reasonable'. As the assimilation 'burns in', the increments are generally larger than increments from an assimilation that has been cycling for a long time. If the increments keep getting bigger, the ensemble is continually drifting away from the observation. Not good. In ncview, it is useful to navigate to the copy/level of interest and re-range the data to values appropriate to the current data and then hit the '>>' button to animate the image. It should be possible to get a sense of the magnitude of the innovations as a function of time.

Example
I ran a perfect model experiment with the bgrid model in the DART-default configuration and turned on some adaptive inflation for this example. To fully demonstrate the adaptive inflation, it is useful to have an observation network that changes through time. I created two observation sequence files: one that had a single 'RADIOSONDE_TEMPERATURE' observation at the surface with an observation error variance of 1.5 degrees Kelvin - repeated every 6 hours for 6 days (24 timesteps); and one that had 9 observations locations clustered in about the same location that repeated every 6 hours for 1.5 days (6 timesteps). I merged the two observation sequences into one using obs_sequence_tool and ran them through perfect_model_obs to derive the observation values and create an obs_seq.out file to run through filter.

0[1074] models/bgrid_solo/work %  ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
0[1075] models/bgrid_solo/work %  ncview Prior_Diag.nc &
0[1076] models/bgrid_solo/work %  ncview Innov.nc &
0[1077] models/bgrid_solo/work %  ncdump -v CopyMetaData Prior_Diag.nc
netcdf Prior_Diag {
dimensions:
locationrank = 3 ;
copy = 24 ;
time = UNLIMITED ; // (24 currently)
NMLlinelen = 129 ;
NMLnlines = 303 ;
StateVariable = 28200 ;
TmpI = 60 ;
TmpJ = 30 ;
lev = 5 ;
VelI = 60 ;
VelJ = 29 ;
variables:
...
float ps(time, copy, TmpJ, TmpI) ;
ps:long_name = "surface pressure" ;
ps:units = "Pa" ;
ps:units_long_name = "pascals" ;
float t(time, copy, lev, TmpJ, TmpI) ;
t:long_name = "temperature" ;
t:units = "degrees Kelvin" ;
float u(time, copy, lev, VelJ, VelI) ;
u:long_name = "zonal wind component" ;
u:units = "m/s" ;
float v(time, copy, lev, VelJ, VelI) ;
v:long_name = "meridional wind component" ;
v:units = "m/s" ;
...
data:

"ensemble mean          ",
"ensemble member      1 ",
"ensemble member      2 ",
"ensemble member      3 ",
"ensemble member      4 ",
"ensemble member      5 ",
"ensemble member      6 ",
"ensemble member      7 ",
"ensemble member      8 ",
"ensemble member      9 ",
"ensemble member     10 ",
"ensemble member     11 ",
"ensemble member     12 ",
"ensemble member     13 ",
"ensemble member     14 ",
"ensemble member     15 ",
"ensemble member     16 ",
"ensemble member     17 ",
"ensemble member     18 ",
"ensemble member     19 ",
"ensemble member     20 ",
"inflation mean         ",
"inflation sd           " ;
}


 This is an exploration of the Prior_diag.nc file. Note that I selected the 't' field, turned the coastlines 'off' under the 'Opts' button, used the 'Repl' instead of 'Bi-lin' (to more faithfully represent the model resolution), navigated to copy 23 of 24 (in this case, the inflation mean) and advanced to the last timestep. The image plot is pretty boring, but does indicate that the inflation values are restricted to where I put the observations. Right-clicking on the 'Range' button automatically re-ranges the colorbar to the min/max of the current data. Clicking on any location generates a time series figure.

 This is an exploration of the Innov.nc file as created by ncdiff. Note that the titles are somewhat misleading because they reflect information from the first file given to ncdiff. This time I left the rendering as 'Bi-lin' (which obfuscates the model resolution), navigated to copy 1 of 24 (in this case, the ensemble mean) and advanced to the 6th timestep. Right-click on the 'Range' button to reset the colorbar. The image plot confirms that the innovations are restricted to a local region. Clicking on any location generates a time series.

 This is fundamentally the same as the previous panel except that I have now selected the 'u' variable. Despite the fact the observations were only of 't', the assimilation has generated (rightly so) increments to the 'u' state variable.

[top]

# Configuring Matlab® to work with DART

Matlab® R2008b is the first version to have native netCDF support, with its own syntax that would require a total rewrite of the DART interfaces that would then be incompatible with older versions of Matlab®.

As of July 2009 DART exclusively uses the snctools interface functions for netCDF - which rely solely on the mexnc mex-file interface and is available for 'all' versions of Matlab®.

Find your version of Matlab® (type 'ver' at the Matlab prompt) and visit http://mexcdf.sourceforge.net/downloads to get the right combination of mexnc and snctools. The netcdf_toolbox is no longer needed by DART.

There was a fundamental change in snctools with revision 4028 that essentially breaks some key components of the DART diagnostic routines. Consequently, we recommend that you use version 4024 of the snctools and mexnc toolboxes. Here is how we download and install that particular version:

cd [wherever_you_install_toolboxes]
svn co -r 4024 svn://svn.code.sf.net/p/mexcdf/svn/mexnc/trunk  mexnc/4024
svn co -r 4024 svn://svn.code.sf.net/p/mexcdf/svn/snctools/trunk  snctools/4024


You will need the 'base' DART/matlab functions available to Matlab®, so be sure your MATLABPATH is set such that you have access to get_copy_index as well as nc_varget (which comes from snctools). This generally means you will have to manipulate your MATLABPATH with something like (only add the first path if it exists for your model):

addpath('replace_this_with_the_real_path_to/DART/models/your_model/matlab')
`

On my systems, I've bundled these commands into a function called ~/matlab/startup.m which is automatically run every time I start Matlab®.

[top]

## Examples - under construction

1. observation location/value plots
2. a brief explanation of 'localization'
3. namelist settings for damped adaptive spatially-varying group filter

[top]

## Customizing DART - under construction.

Modify the code to your heart's content.
'svn revert' cures all ...

[top]

Please let us know if you have suggestions or code to contribute to DART. We're a small group, but we are willing to listen and will make every endeavor to incorporate improvements to the code. Email us at dart@ucar.edu.

[top]

## Assimilation Algorithms employed in DART.

The DART system includes several different types of Kalman filter algorithms, selectable at runtime by a namelist setting. See the documentation for the assimilation tools module for descriptions of the various filter types and how to select them.

[top]

## Namelists

Many DART programs have namelists to specify run-time control. Some programs use one or more modules - each module may have its own namelist. As a consequence, we find it convenient to have one file (called input.nml) specifying all the namelists.

For a complete list of namelists, see the following document: master list of namelists

An example namelist for each program is automatically built when the makefile is generated by mkmf_xxxxx. The example namelist is named input.nml.xxxxx_default where xxxxx is the name of the program. The example namelists have default values, which may not be appropriate for your use. The default input.nml in each work directory generally has better values. As usual, the documentation for each module is the best place to get information about the namelist settings.

[top]