    The 'low-order' models supported in DART.


    ikeda

    The Ikeda model is a 2D chaotic map useful for visualizing data assimilation updates directly in state space. There are three parameters: a, b, and mu. The state is 2D, x = [X Y]. The equations are:

    X(i+1) = 1 + mu * ( X(i) * cos( t ) - Y(i) * sin( t ) )
    Y(i+1) =     mu * ( X(i) * sin( t ) + Y(i) * cos( t ) ),
    

    where

    t = a - b / ( X(i)**2 + Y(i)**2 + 1 )
    

    Note the system is time-discrete already, meaning there is no delta_t. The system stems from nonlinear optics (Ikeda 1979, Optics Communications). Interface written by Greg Lawson. Thanks Greg!
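
    As a minimal sketch of the map above (not the DART model_mod code), the iteration can be written in a few lines of Python. The function names and the default parameter values are illustrative assumptions; use whatever a, b, and mu your experiment specifies.

```python
import math

def ikeda_step(x, y, a=0.4, b=6.0, mu=0.83):
    """Advance the Ikeda map by one iteration and return the new (X, Y).

    The default a, b, mu are illustrative assumptions, not values taken
    from the text above.
    """
    t = a - b / (x**2 + y**2 + 1.0)
    x_new = 1.0 + mu * (x * math.cos(t) - y * math.sin(t))
    y_new = mu * (x * math.sin(t) + y * math.cos(t))
    return x_new, y_new

def ikeda_trajectory(x0, y0, nsteps, **params):
    """Return the states [(X0, Y0), (X1, Y1), ...]; length is nsteps + 1."""
    states = [(x0, y0)]
    for _ in range(nsteps):
        states.append(ikeda_step(*states[-1], **params))
    return states
```

    Because the map is already discrete in time, one call to ikeda_step is one model 'advance'; there is no time step to choose.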


    lorenz_63

    This is the 3-variable model as described in: Lorenz, E. N. 1963. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130-141.
    The system of equations is:

    X' = -sigma*X + sigma*Y
    Y' = -XZ + rX - Y
    Z' =  XY -bZ
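
    The right-hand side above translates directly into code. The following is a sketch only: the parameter defaults (sigma = 10, r = 28, b = 8/3) are the commonly quoted Lorenz (1963) values, and the fourth-order Runge-Kutta step is an assumption made for illustration, not necessarily the scheme used by the DART model.

```python
def lorenz_63_tendency(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Return (X', Y', Z') for the 3-variable Lorenz 63 system."""
    x, y, z = state
    dx = -sigma * x + sigma * y
    dy = -x * z + r * x - y
    dz = x * y - b * z
    return (dx, dy, dz)

def rk4_step(tendency, state, dt):
    """Advance state by dt with a classical 4th-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(s + h * k for s, k in zip(base, slope))

    k1 = tendency(state)
    k2 = tendency(shifted(state, k1, dt / 2.0))
    k3 = tendency(shifted(state, k2, dt / 2.0))
    k4 = tendency(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )
```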
    

    lorenz_84

    This model is based on: Lorenz, E. N., 1984: Irregularity: A fundamental property of the atmosphere. Tellus, 36A, 98-110.
    The system of equations is:

    X' = -Y^2 - Z^2  - aX  + aF
    Y' =  XY  - bXZ  - Y   + G
    Z' = bXY  +  XZ  - Z
    

    Where a, b, F, and G are the model parameters.
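
    A sketch of the tendency computation for these equations follows. The default values of a, b, F, and G are illustrative assumptions; the values actually used come from the model namelist.

```python
def lorenz_84_tendency(state, a=0.25, b=4.0, F=8.0, G=1.25):
    """Return (X', Y', Z') for the Lorenz 84 system.

    The default parameter values are illustrative assumptions.
    """
    x, y, z = state
    dx = -y**2 - z**2 - a * x + a * F
    dy = x * y - b * x * z - y + G
    dz = b * x * y + x * z - z
    return (dx, dy, dz)
```

    A quick check of the algebra: at the state (F, 0, 0) the X and Z tendencies vanish and the Y tendency equals G.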


    9var

    This model provides interesting off-attractor transients that behave something like gravity waves.


    lorenz_96

    This is the model we use to become familiar with new architectures, i.e., it is the one we try 'first'. It can be called as a subroutine or as a separate executable. We can test this model both single-threaded and MPI-enabled.

    Quoting from the Lorenz 1998 paper:

    ... the authors introduce a model consisting of 40 ordinary differential equations, with the dependent variables representing values of some atmospheric quantity at 40 sites spaced equally about a latitude circle. The equations contain quadratic, linear, and constant terms representing advection, dissipation, and external forcing. Numerical integration indicates that small errors (differences between solutions) tend to double in about 2 days. Localized errors tend to spread eastward as they grow, encircling the globe after about 14 days.
    ...
    We have chosen a model with J variables, denoted by X(1), ..., X(J); in most of our experiments we have let J = 40. The governing equations are:
    dX(j)/dt = ( X(j+1) - X(j-2) ) * X(j-1) - X(j) + F         (1)
    
    for j = 1, ..., J. To make Eq. (1) meaningful for all values of j we define X(-1) = X(J-1), X(0) = X(J), and X(J+1) = X(1), so that the variables form a cyclic chain, and may be looked at as values of some unspecified scalar meteorological quantity, perhaps vorticity or temperature, at J equally spaced sites extending around a latitude circle. Nothing will simulate the atmosphere's latitudinal or vertical extent.
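
    The cyclic indexing in Eq. (1) is the only subtle part of the model. A minimal Python sketch (0-based indices, so X(1) is x[0]) is given below; the default forcing F = 8 is an assumption made for illustration.

```python
def lorenz_96_tendency(x, forcing=8.0):
    """Return dX/dt for the Lorenz 96 model on a cyclic chain of len(x) sites.

    Negative Python indices wrap around, which supplies the cyclic
    boundary on the low side; the modulo supplies it on the high side.
    """
    n = len(x)
    return [
        (x[(j + 1) % n] - x[j - 2]) * x[j - 1] - x[j] + forcing
        for j in range(n)
    ]
```

    The uniform state X(j) = F is a fixed point: the advection term vanishes and the dissipation balances the forcing. Perturbing a single site produces a nonzero tendency at that site and its neighbors.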


    forced_lorenz_96

    The forced_lorenz_96 model implements the standard L96 equations except that the forcing term, F, is added to the state vector and is assigned an independent value at each gridpoint. The result is a model that is twice as big as the standard L96 model. The forcing can be allowed to vary in time or can be held fixed so that the model looks like the standard L96 but with a state vector that includes the constant forcing term. An option is also included to add random noise to the forcing terms as part of the time tendency computation which can help in assimilation performance. If the random noise option is turned off (see namelist) the time tendency of the forcing terms is 0.
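
    The idea of appending the forcing to the state vector can be sketched as follows. The layout (all X values first, then all F values), the function name, and the noise_sd argument are assumptions for illustration and are not taken from the DART source.

```python
import random

def forced_lorenz_96_tendency(state, noise_sd=0.0, rng=None):
    """Tendency for an augmented state [X(1..J), F(1..J)].

    Assumed layout: the first half of the vector holds X, the second half
    holds the per-gridpoint forcing F. With noise_sd == 0 the forcing
    tendency is exactly zero, so F stays constant in time.
    """
    n = len(state) // 2
    x, f = state[:n], state[n:]
    dx = [
        (x[(j + 1) % n] - x[j - 2]) * x[j - 1] - x[j] + f[j]
        for j in range(n)
    ]
    if noise_sd > 0.0:
        rng = rng or random.Random()
        df = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    else:
        df = [0.0] * n
    return dx + df
```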


    lorenz_96_2scale

    This is the Lorenz 96 2-scale model, documented in Lorenz (1995). It also has the option of the variant on the model from Smith (2001), which is invoked by setting local_y = .true. in the namelist. The time step, coupling, forcing, number of X variables, and the number of Ys per X are all specified in the namelist. Defaults are chosen depending on whether the Lorenz or Smith option is specified in the namelist. Lorenz is the default model. Interface written by Josh Hacker. Thanks Josh!


    lorenz_04

    The reference for these models is Lorenz, E. N., 2005: Designing chaotic models. J. Atmos. Sci., 62, 1574-1587.
    Model II is a single-scale model, similar to Lorenz 96, but with spatial continuity in the waves. Model III is a two-scale model. It is fundamentally different from the Lorenz 96 two-scale model because of the spatial continuity and the fact that both scales are projected onto a single variable of integration. The scale separation is achieved by a spatial filter and is therefore not perfect (i.e. there is leakage). The slow scale in model III is model II, and thus model II is a deficient form of model III. The basic equations are documented in Lorenz (2005) and also in the model_mod.f90 code. The user is free to choose model II or III with a namelist variable.


    simple_advection

    This model is on a periodic one-dimensional domain. A wind field is modeled using Burgers' equation with an upstream semi-Lagrangian differencing. This diffusive numerical scheme is stable, and forcing is provided by adding random Gaussian noise to each wind grid variable independently at each timestep. An Eulerian option with centered-in-space differencing is also provided. The Eulerian differencing is both numerically unstable and subject to shock formation. However, it can sometimes be made stable in assimilation mode (see recent work by Majda and collaborators).


    The 'high-order' models supported in DART.

    In roughly the order they were supported by DART.


    bgrid_solo

    This is a dynamical core for B-grid dynamics using the Held-Suarez forcing. The resolution is configurable, and the entire model can be run as a subroutine. Status: supported.


    pe2lyr

    This model is a 2-layer, isentropic, primitive equation model on a sphere. Status: orphaned.


    wrf

    The Weather Research and Forecasting (WRF) Model is a next-generation mesoscale numerical weather prediction system designed to serve both operational forecasting and atmospheric research needs. More people are using DART with WRF than any other model. Note: The actual WRF code is not distributed with DART. Status: supported.


    cam

    The Community Atmosphere Model (CAM) is the latest in a series of global atmosphere models developed at NCAR for the weather and climate research communities. CAM also serves as the atmospheric component of the Community Climate System Model (CCSM). Status: supported.


    PBL_1d

    The PBL model is a single column version of the WRF model. In this instance, the necessary portions of the WRF code are distributed with DART. Status: supported - but looking to be adopted.


    MITgcm_annulus

    The MITgcm annulus model as configured for this application within DART is a non-hydrostatic, rigid lid, C-grid, primitive equation model utilizing a cylindrical coordinate system. For detailed information about the MITgcm, see http://mitgcm.org Status: orphaned - and looking to be adopted.


    rose

    The rose model is for the stratosphere-mesosphere and was used by Tomoko Matsuo (now at CU-Boulder and NOAA) for research in the assimilation of observations of the Mesosphere Lower-Thermosphere (MLT). Note: the model code is not distributed with DART. Status: orphaned


    MITgcm_ocean

    The MIT ocean GCM version 'checkpoint59a' is the foundation of this implementation. It was modified by Ibrahim Hoteit (then of Scripps) to accommodate the interfaces needed by DART. Status: supported - but looking to be adopted.


    am2

    The FMS AM2 model is GFDL's atmosphere-only code using observed sea surface temperatures, time-varying radiative forcings (including volcanos) and time-varying land cover type. This version of AM2 (also called AM2.1) uses the finite-volume dynamical core (Lin 2004). Robert Pincus (CIRES/NOAA ESRL PSD1) and Patrick Hoffman (NOAA) wrote the DART interface and are currently using the system for research. Note: the model code is not distributed with DART. Status: supported


    coamps

    The DART interface was originally written and supported by Tim Whitcomb. The following model description is taken from the COAMPS overview web page:

    The Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS) has been developed by the Marine Meteorology Division (MMD) of the Naval Research Laboratory (NRL). The atmospheric components of COAMPS, described below, are used operationally by the U.S. Navy for short-term numerical weather prediction for various regions around the world.

    Note: the model code is not distributed with DART. Status: supported


    POP

    The Parallel Ocean Program (POP) comes in two variants. Los Alamos National Laboratory provides POP Version 2.0, which has been modified to run in the NCAR Community Climate System Model (CCSM) framework. As of November 2009, the CCSM-POP version is being run. The LANL-POP version is nearly supported - and some extensions useful for data assimilation in general have been proposed to LANL, who have agreed in principle to implement the changes. Fundamentally, the change is an additional restart option in which the first timestep after an assimilation is an Eulerian timestep (similar to a cold start). Note: the source code for POP is not distributed with DART. Status: actively being developed


    Downloadable datasets for DART.


    The code distribution was getting 'cluttered' with datasets, boundary conditions, initial conditions, ... large files that were not necessarily interesting to all people who downloaded the DART code. Worse, subversion makes a local hidden copy of the original repository contents, so the penalty for being large is doubled. It just made sense to make all the large files available on an 'as-needed' basis.

    To keep the size of the DART distribution down we have a separate www-site to provide some observation sequences, initial conditions, and general datasets. It is our intent to populate this site with some 'verification' results, i.e. assimilations that were known to be 'good' and that should be fairly reproducible - appropriate to test the DART installation.

    Please be patient as I make time to populate this directory. (yes, 'make', all my 'found' time is taken ...)
    Observation sequences can be found at http://www.image.ucar.edu/pub/DART/Obs_sets.

    Verification experiments will be posted to http://www.image.ucar.edu/pub/DART/VerificationData as soon as I can get to it. These experiments will consist of initial conditions files for testing different high-order models like CAM, WRF, POP ...
    The low-order models are already distributed with verification data in their work directories.

    Useful bits for CAM can be found at http://www.image.ucar.edu/pub/DART/CAM.
    Useful bits for WRF can be found at http://www.image.ucar.edu/pub/DART/WRF.


    Creating initial conditions for DART


    The idea is to generate an ensemble that has sufficient 'spread' to cover the range of possible solutions. Insufficient spread can (and usually will) lead to poor assimilations. Think 'filter divergence'.

    Generating an ensemble of initial conditions can be done in lots of ways, only a couple of which will be discussed here. The first is to generate a single initial condition and let DART perturb it with noise of a nature you specify to generate as many ensemble members as you like. The second is to take some existing collection of model states and convert them to DART initial conditions files and then use the restart_file_tool to set the proper date in the files. The hard part is then coming up with the original collection of model state(s).


    Adding noise to a single model state

    This method works well for some models, and fails miserably for others. As it stands, DART supplies a routine that can add Gaussian noise to every element of a state vector. This can cause some models to become numerically unstable. You can supply your own model_mod:pert_model_state() if you want a more sophisticated perturbation scheme.
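
    Conceptually, the default perturbation amounts to the following sketch. It is not the DART routine; the function name, the noise_sd argument, and the fixed seed are assumptions made for illustration.

```python
import random

def perturb_state(state, ens_size, noise_sd, seed=0):
    """Build an ensemble by adding independent Gaussian noise to every
    element of a single state vector (one noisy copy per member)."""
    rng = random.Random(seed)
    return [
        [value + rng.gauss(0.0, noise_sd) for value in state]
        for _ in range(ens_size)
    ]
```

    Note that the noise is applied to every element with the same standard deviation, regardless of the variable's units or magnitude. That is exactly why this simple approach can destabilize some models and why a model-specific perturbation routine is sometimes needed.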


    Using a collection of model states.

    The important thing to remember is that the high-order models all come with routines to convert a single model restart file (or the equivalent) to a DART initial conditions file. CAM has trans_pv_sv, WRF has wrf_to_dart, POP has pop_to_dart, etc. DART has the ability to read a single file that contains initial conditions for all the ensemble members, or a series of restart files - one for each ensemble member. Simply collect your ensemble of restart files from your model and convert each of them to a DART initial conditions file of the form filter_ics.#### where #### represents a 4 digit ensemble member counter. That is, for a 50-member ensemble, they should be named: filter_ics.0001  ... filter_ics.0050
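
    The file naming convention described above is easy to get wrong by hand for large ensembles. A small sketch that generates the expected names (the helper name is hypothetical):

```python
def ensemble_file_names(ens_size, base="filter_ics"):
    """Return the per-member file names base.0001 ... base.NNNN,
    using a 4-digit, 1-based ensemble member counter."""
    return [f"{base}.{member:04d}" for member in range(1, ens_size + 1)]
```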

    Frequently, the initial ensemble of restart files is some climatological collection. For CAM experiments, we usually start with N different 'January 1' states ... from N different years. The DART utility program restart_file_tool is then run on each of these initial conditions files to set a consistent date for all of the initial conditions. Experience has shown that it takes less than a week of assimilating 4x/day to achieve a steady ensemble spread. WRF has its own method of generating an initial ensemble. For that, it is best to contact someone familiar with WRF/DART.


    Initial conditions for the low-order models.

    In general, there are 'restart files' for the low-order models that already exist as work/filter_ics. If you need more ensemble members than are supplied by these files, you can generate your own by adding noise to a single perfect_ics file. Simply specify

    &filter_nml
    start_from_restart   = .FALSE.,
    restart_in_file_name = "perfect_ics",
    ens_size             = [whatever you want]
    


    'perfect model' experiments or 'OSSE's.


    All of the workshop and tutorial examples are 'perfect model' experiments. The ability to compare against 'the truth' is great for exploring what does and doesn't work during experimentation.

    Every low-order model has a workshop_setup.csh that compiles all the executables needed to run an OSSE, and then actually runs them. The (empty) observation sequence files have been specified for what, where, and when 'observations' will be needed. This was done with create_obs_sequence and create_fixed_network_seq. Run them yourself if you want to understand exactly what it takes to create an observation sequence file devoid of the observation values. The examples are very run-time-output verbose - great for understanding what is going on, but just awful for performance. The run-time verbosity can be cut down when running larger models.

    Some of the models have input values that are designed to produce poor (horrible, actually) assimilations, and some perform quite nicely. The DART Tutorial provides instructions on how to modify the filter input and diagnose the results.


    Use DART to run a 'perfect model' experiment.


    Once a model is compatible with the DART facility, all of the functionality of DART is available. This includes 'perfect model' experiments (also called Observing System Simulation Experiments - OSSEs). Essentially, the model is run forward from some state and, at predefined times, the observation forward operator is applied to the model state to harvest synthetic observations. This model trajectory is known as the 'true state'. The synthetic observations are then used in an assimilation experiment. The assimilation performance can then be evaluated precisely because the true state (of the model) is known.

    The basic steps to running an OSSE from within DART are:

    1. Run create_obs_sequence to generate the type of observation (and observation error) desired.
    2. Run create_fixed_network_seq to define the temporal distribution of the desired observations.
    3. Run perfect_model_obs to advance the model from a known initial condition - and harvest the 'observations' (with error) from the (known) true state of the model.
    4. Run filter to assimilate the 'observations'. Since the true model state is known, it is possible to evaluate the performance of the assimilation.
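
    The essence of step 3 (advance the model, apply the forward operator, add observation error) can be sketched with a toy model. This is an illustration of the concept only, not of perfect_model_obs itself; the function names, the identity forward operator, and the toy 'decay' model are all assumptions.

```python
import random

def harvest_synthetic_obs(advance, state, obs_indices, obs_error_var,
                          n_times, seed=0):
    """Advance a model n_times and record, at each time, the true state
    and noisy identity observations of the state elements in obs_indices."""
    rng = random.Random(seed)
    obs_sd = obs_error_var ** 0.5
    truth, observations = [], []
    for _ in range(n_times):
        state = advance(state)
        truth.append(list(state))
        observations.append(
            [state[i] + rng.gauss(0.0, obs_sd) for i in obs_indices]
        )
    return truth, observations

def decay(state):
    """Toy model used only for this example: every element decays by 10%."""
    return [0.9 * value for value in state]

truth, obs = harvest_synthetic_obs(
    decay, [1.0, 2.0, 3.0], obs_indices=[0, 2], obs_error_var=1.0, n_times=5
)
```

    Here truth plays the role of the 'true state' and obs the role of the filled-in observation sequence; an assimilation that sees only obs can afterwards be scored against truth.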

    An OSSE is explored in our Lorenz '96 example.

    More information about creating observation sequence files for OSSEs is available in the observation sequence discussion section.

    There is a set of Matlab® functions to help explore the assimilation performance in state-space. The state-space functions are in the DART/matlab directory. Once you fire up Matlab® and have the netCDF support sorted out, you will essentially follow the same procedure as that outlined in the "Are the results correct?" section. The most common functions are listed below. Each has help text available at the Matlab® prompt, for example by issuing the help plot_bins command.


    plot_bins.m plots the rank histograms for a set of state variables.
    plot_total_err.m plots the evolution of the error (un-normalized) and ensemble spread of all state variables.
    plot_ens_mean_time_series.m plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available).
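
    For readers unfamiliar with rank histograms, the quantity being plotted can be sketched as below. This is a conceptual illustration for a single state variable, not the plot_bins.m implementation; the function name is hypothetical and ties between the truth and an ensemble member are not treated specially.

```python
def rank_histogram(truth_series, ensemble_series):
    """Count, for each time, how many ensemble members fall below the truth.

    truth_series:    one true value per time
    ensemble_series: one list of ensemble values per time
    Returns ens_size + 1 bin counts.
    """
    ens_size = len(ensemble_series[0])
    counts = [0] * (ens_size + 1)
    for truth, members in zip(truth_series, ensemble_series):
        rank = sum(1 for member in members if member < truth)
        counts[rank] += 1
    return counts
```

    A roughly flat histogram suggests the truth is statistically indistinguishable from the ensemble members; a U shape suggests too little spread.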

    Configuring Matlab® to work for DART


    Matlab® R2008b is the first version to have native netCDF support, with its own syntax that would require a total rewrite of the DART interfaces that would then be incompatible with older versions of Matlab®.

    As of July 2009 DART uses the snctools interface functions for netCDF, which rely solely on the mexnc mex-file interface and are available for 'all' versions of Matlab®. The migration away from the inconsistent DART use of the netcdf_toolbox and the CSIRO toolbox matlab_netCDF_OPeNDAP (i.e. the 'getnc' function) is virtually complete and greatly eases the installation of the Matlab® netCDF support needed by DART.

    Find your version of Matlab® (type 'ver' at the Matlab prompt) and visit http://mexcdf.sourceforge.net/downloads to get the right combination of mexnc and snctools. The netcdf_toolbox subset of functions has been deprecated by their developers, who are now supporting the snctools set of functions. The netcdf_toolbox is still distributed with snctools; you can install it if you like, but it is not needed by DART.

    You will need the 'normal' DART/matlab functions available to Matlab, so be sure your MATLABPATH is set such that you have access to get_copy_index as well as nc_varget (which comes from snctools). This generally means you will have to manipulate your MATLABPATH with something like:


    addpath('replace_this_with_the_real_path_to/DART/matlab')
    addpath('replace_this_with_the_real_path_to/DART/diagnostics/matlab')
    addpath('some_netcdf_install_dir/snctools')
    addpath('some_netcdf_install_dir/mexnc','-BEGIN')
    addpath('some_netcdf_install_dir/netcdf_toolbox/netcdf')
    addpath('some_netcdf_install_dir/netcdf_toolbox/netcdf/nctype')
    addpath('some_netcdf_install_dir/netcdf_toolbox/netcdf/ncutility')
    addpath('some_CSIRO_install_dir/matlab_netCDF_OPeNDAP')
    

    which is precisely why I'm trying to shorten it. On my systems, I've bundled the first 4 commands into a script called ~/matlab/startup.m which is automatically run every time I start Matlab.



    An overview of the DART 'preprocess' program


    First and foremost, check out the DART/preprocess/preprocess.html document for detailed information.

    The preprocess program actually builds source code to be used by all the remaining modules. It is imperative to actually run preprocess before building any executables. This is how the same code can assimilate state vector 'observations' for the Lorenz_63 model and real radar reflectivities for WRF without needing to specify a set of radar operators for the Lorenz_63 model!

    preprocess combines multiple 'obs_def' modules into one obs_def_mod.f90 that is then used by the rest of DART. Additionally, a new obs_kind_mod.f90 is built that will provide support for associating the specific observation TYPES with corresponding (generic) observation KINDS. More on that later. The list of source codes is contained in the &preprocess_nml namelist and they ultimately determine what observations and operators are supported. If you want to add another 'obs_def' module, you must rerun preprocess and recompile the rest of your project. preprocess is designed to abort if the files it is supposed to build already exist. For this reason, it is necessary to remove a couple of files (if they exist) before you run the preprocessor. It is just a good habit to develop.


    \rm -f ../../../obs_def/obs_def_mod.f90
    \rm -f ../../../obs_kind/obs_kind_mod.f90
    ./preprocess
    ls -l ../../../obs_def/obs_def_mod.f90
    ls -l ../../../obs_kind/obs_kind_mod.f90

    For example, with a namelist that looks like:

    &preprocess_nml
       input_obs_kind_mod_file = '../../../obs_kind/DEFAULT_obs_kind_mod.F90',
      output_obs_kind_mod_file = '../../../obs_kind/obs_kind_mod.f90',
        input_obs_def_mod_file = '../../../obs_def/DEFAULT_obs_def_mod.F90',
       output_obs_def_mod_file = '../../../obs_def/obs_def_mod.f90',
      input_files              = '../../../obs_def/obs_def_gps_mod.f90',
                                 '../../../obs_def/obs_def_QuikSCAT_mod.f90',
                                 '../../../obs_def/obs_def_GWD_mod.f90',
                                 '../../../obs_def/obs_def_altimeter_mod.f90',
                                 '../../../obs_def/obs_def_reanalysis_bufr_mod.f90'
       /
    

    preprocess will combine DEFAULT_obs_def_mod.F90, obs_def_gps_mod.f90, obs_def_QuikSCAT_mod.f90, obs_def_GWD_mod.f90, obs_def_altimeter_mod.f90, and obs_def_reanalysis_bufr_mod.f90, into obs_def_mod.f90 - which can be used by the rest of the project.
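
    If you switch between sets of observation types often, it can be convenient to generate the namelist block rather than edit it by hand. The sketch below only renders text in the layout of the example above; the helper name and its arguments are hypothetical, and the directory defaults simply repeat the relative paths used in the example.

```python
def preprocess_nml(obs_def_files,
                   obs_def_dir="../../../obs_def",
                   obs_kind_dir="../../../obs_kind"):
    """Render a &preprocess_nml block listing the given obs_def sources."""
    quoted = [f"'{obs_def_dir}/{name}'" for name in obs_def_files]
    # Continuation lines are indented to line up under the first file name.
    joiner = ",\n" + " " * 29
    lines = [
        "&preprocess_nml",
        f"   input_obs_kind_mod_file = '{obs_kind_dir}/DEFAULT_obs_kind_mod.F90',",
        f"  output_obs_kind_mod_file = '{obs_kind_dir}/obs_kind_mod.f90',",
        f"    input_obs_def_mod_file = '{obs_def_dir}/DEFAULT_obs_def_mod.F90',",
        f"   output_obs_def_mod_file = '{obs_def_dir}/obs_def_mod.f90',",
        "  input_files              = " + joiner.join(quoted),
        "   /",
    ]
    return "\n".join(lines) + "\n"
```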


    Building and Running 'preprocess'


    preprocess is an executable, so it should come as no surprise that it must be built in the normal DART fashion. The DART/mkmf/mkmf.template must be correct for your environment, and the input.nml must have your desired preprocess_nml set correctly. Given that ...

    csh mkmf_preprocess
    make
    ./preprocess

    will build and run preprocess.

    The first command generates an appropriate Makefile and the input.nml.preprocess_default file. The second command results in the compilation of a series of Fortran90 modules which ultimately produces an executable file: preprocess. The third command actually runs preprocess - which builds the new obs_kind_mod.f90 and obs_def_mod.f90 source code files. The rest of DART may now be built.


    A little background for 'preprocess'


    IMPORTANT: Since each 'observation kind' may require different amounts of metadata to be read or written, any routine to read or write an observation sequence must be compiled with support for those particular observations. The supported observations are listed in the input.nml&obs_kind_nml block. This is the whole point of the 'preprocess' process ...


    An overview of the observation sequence


    Observation sequences are complicated; there's just no better way to describe it. Trying to automatically accommodate a myriad of observation file formats, structure, and metadata is simply not an easy task. For this reason, DART has its own format for observations and a set of programs to convert observations from their original formats to DART's format. There are definitely some things to know ...

    An obs_seq.in file actually contains no observation quantities. It may be best thought of as a perfectly-laid-out notebook - just waiting for an observer to fill in the actual observation quantities. All the rows and columns are ready, labelled, and repeated for every observation time and platform. This file is generally the start of a "perfect model" experiment. Essentially, one instance of the model is run through perfect_model_obs - which applies the appropriate forward operators to the model state and 'writes them down' in our notebook. The completed notebook is renamed obs_seq.out.

    An obs_seq.out file contains a linked list of observations - potentially (and usually) observations from different platforms and of different quantities - each with their own error characteristics and metadata. These files arise from running perfect_model_obs OR from any number of converter programs. The creation of observation sequences from real observations is not automatic and an email to the DART team asking for advice for your specific types of observations is perfectly within reason.

    There is something called an obs_seq.final file - which contains everything in the obs_seq.out file as well as a few additional 'copies' of the observation. Remember, DART is an ensemble algorithm. Each ensemble member must compute its own estimate of the observation for the algorithm. The obs_seq.final file may contain each of these estimates (namelist controlled). Minimally, the mean and spread of the ensemble estimates is recorded in the obs_seq.final file. The best method of determining the performance of your 'real world' experiment is to compare in observation-space since we can never know the model state that perfectly represents the real world.
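
    The two summary 'copies' mentioned above are simple statistics of the ensemble's estimates of one observation. A sketch, assuming that 'spread' means the sample standard deviation of the ensemble estimates (an assumption; check the DART source for the exact definition):

```python
import statistics

def ensemble_mean_and_spread(estimates):
    """Return (mean, spread) of the ensemble estimates of one observation.

    Spread is computed here as the sample standard deviation, which is an
    assumption made for this illustration.
    """
    return statistics.fmean(estimates), statistics.stdev(estimates)
```

    Comparing the observed value against this mean, in light of the spread and the observation error, is the basic ingredient of observation-space diagnostics.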

    IMPORTANT: Since each 'observation kind' may require different amounts of metadata to be read or written, any routine to read or write an observation sequence must be compiled with support for those particular observations. The supported observations are listed in the input.nml&obs_kind_nml block. This is the whole point of the 'preprocess' process ...


    observation sequence file structure

    There are extensible parts of the observation sequence file; for example, the number of observation kinds contained in the file, whether the locations have 1 or more components, how many quality control values are available for each observation, where those quality control values come from, how many 'copies' of each observation there are ... et cetera.

    [Figure: The structure of an obs_seq.out file]
    [Figure: The structure of an obs_seq.final file]

    Creating observations and sequences.


    It is strongly encouraged that you use a single observation to test a new model implementation.

    Experience has shown that starting 'simple' is the fastest way to good results. Starting with a single observation will exercise a sufficient portion of the procedure and provide insight into where to spend more effort. Starting with a single synthetic observation will allow you to focus on the more interesting parts of the DART scheme without getting bogged down in the world of observation data formats.


    Creating a synthetic observation sequence.


    There are several steps to create an observation sequence file, which follows directly from the modular nature of the DART programming philosophy.

    1. Decide what observations you want to investigate and edit the input.nml&obs_kind_nml block.
    2. Build and run preprocess to create code that supports the observations you want.
    3. Build and run create_obs_sequence to define the specifics about the observation you want.
    4. Build and run create_fixed_network_seq to replicate those specifics through time.
    5. Build and run perfect_model_obs to create an observation consistent with the model state and specified error distribution at the requested times and locations.

    Example: generating observations for the Lorenz '63 model.


    1) There are no 'real' observations for the Lorenz '63 model, so the appropriate namelist settings are:


    &obs_kind_nml
    assimilate_these_obs_types = 'RAW_STATE_VARIABLE' /

    &preprocess_nml
    input_obs_def_mod_file = '../../../obs_def/DEFAULT_obs_def_mod.F90',
    output_obs_def_mod_file = '../../../obs_def/obs_def_mod.f90',
    input_obs_kind_mod_file = '../../../obs_kind/DEFAULT_obs_kind_mod.F90',
    output_obs_kind_mod_file = '../../../obs_kind/obs_kind_mod.f90',
    input_files = '../../../obs_def/obs_def_1d_state_mod.f90' /

    2) Run preprocess in the normal fashion.


    3) create_obs_sequence creates an observation set definition (typically named set_def.out), the time-independent part of an observation sequence. It may help to think of it as defining what sorts of observations will be taken at one 'reading' ... you walk out to the box and take temperature, humidity, and wind observations all at the same time and place, for example. You can think of it as one page in an observer's notebook: it contains only the location, type, and observational error characteristics (normally just the diagonal observational error variance) for a related set of observations. There are no actual observation values, nor are there any times associated with the definition. The program is interactive and queries the user for the information it needs. Begin by creating a minimal observation set definition in which each of the 3 state variables of L63 is directly observed with an observational error variance of 1.0 for each observation. The input sequence needed to do this is tabulated after the session capture (the text including and after # is a comment and does not need to be entered).

    The following is a capture of such a session (much of the verbose logging has been left off for clarity); the user input appears on the lines following each prompt.


       [unixprompt]$ ./create_obs_sequence
        Starting program create_obs_sequence
        Initializing the utilities module.
        Trying to log to unit   10
        Trying to open file dart_log.out
        Registering module :
        $url: http://squish/DART/trunk/utilities/utilities_mod.f90 $
    
        { ... }
    
        Input upper bound on number of observations in sequence
       4
        Input number of copies of data (0 for just a definition)
       0
        Input number of quality control values per field (0 or greater)
       0
        input a -1 if there are no more obs
       0
        Registering module :
        $url: http://squish/DART/trunk/obs_def/DEFAULT_obs_def_mod.F90 $
        { ... }
        initialize_module obs_kind_nml values are
        -------------- ASSIMILATE_THESE_OBS_TYPES --------------
        RAW_STATE_VARIABLE
        -------------- EVALUATE_THESE_OBS_TYPES --------------
        ------------------------------------------------------
             Input -1 * state variable index for identity observations
             OR input the name of the observation kind from table below:
             OR input the integer index, BUT see documentation...
               1 RAW_STATE_VARIABLE
       -1
        input time in days and seconds
       0 0
        Input error variance for this observation definition
       1.0
        input a -1 if there are no more obs
       0
    
        { this gets repeated ... until you tell it to stop ... }
    
        input a -1 if there are no more obs
       -1
        Input filename for sequence (  set_def.out   usually works well)
        set_def.out 
        write_obs_seq  opening formatted file set_def.out
        write_obs_seq  closed file set_def.out
       

    Rest assured that if you requested to assimilate more realistic observation types, you will be queried for appropriate information by create_obs_sequence. Below is a table that explains all of the input you should need to supply for observations of the L63 model state.


    4 # upper bound on num of observations in sequence
    0 # number of copies of data (0 for just a definition)
    0 # number of quality control values per field (0 or greater)
    0 # -1 to exit/end observation definitions
     
    -1 # observe state variable 1
    0   0 # time -- days, seconds
    1.0 # observational variance
    0 # -1 to exit/end observation definitions
     
    -2 # observe state variable 2
    0   0 # time -- days, seconds
    1.0 # observational variance
    0 # -1 to exit/end observation definitions
     
    -3 # observe state variable 3
    0   0 # time -- days, seconds
    1.0 # observational variance
    -1 # -1 to exit/end observation definitions
     
    set_def.out     # Output file name
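
    The answers in the table above can also be generated by a small script rather than typed by hand. The following is a minimal sketch in POSIX sh; the function name make_obs_def_input and its arguments are made up for this illustration and are not part of DART. It only handles identity observations at time 0 0, as in the L63 example.

```sh
#!/bin/sh
# Print the answers create_obs_sequence asks for when defining identity
# observations of state variables 1..nvars, all at time "0 0".
# make_obs_def_input is a hypothetical helper, not a DART program.
make_obs_def_input() {
    nvars=$1        # number of state variables to observe (3 for L63)
    variance=$2     # observational error variance, e.g. 1.0
    outfile=$3      # name of the definition file, e.g. set_def.out

    echo "$((nvars + 1))"   # upper bound on observations (the table uses 4 for 3)
    echo 0                  # copies of data (0 = definition only)
    echo 0                  # quality control values per field
    echo 0                  # 0 = go on to define an observation

    i=1
    while [ "$i" -le "$nvars" ]; do
        printf '%s\n' "-$i"         # identity observation of state variable i
        echo "0 0"                  # time: days, seconds
        echo "$variance"            # error variance
        if [ "$i" -lt "$nvars" ]; then
            echo 0                  # another observation follows
        else
            printf '%s\n' "-1"      # no more observations
        fi
        i=$((i + 1))
    done
    echo "$outfile"                 # output file name
}

make_obs_def_input 3 1.0 set_def.out
```

    If the output is redirected to a file, it could presumably be fed to the program on standard input (for example ./create_obs_sequence < answers.txt). That usage is an assumption about how the interactive prompts read their input, so check it against your build.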

    4) create_fixed_network_sequence takes the observation set definition and repeats it in time, essentially making multiple pages in our notebook. Again, the program is interactive and queries the user for information. You should be able to simply follow the prompts. The table below represents the input needed for the L63 example:


    set_def.out # Input observation set definition file
    1 # Regular spaced observation interval in time
    1000 # 1000 observation times
    0, 43200 # First observation after 12 hours (0 days, 12 * 3600 seconds)
    0, 43200 # Observations every 12 hours
    obs_seq.in # Output file for observation sequence definition
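
    The 'days, seconds' pairs are easy to get wrong when the interval is not a round number of days. As a rough sketch (the helper names below are illustrative only, not DART utilities), the conversion and the six answers from the table can be produced like this:

```sh
#!/bin/sh
# Convert a whole number of hours into the "days, seconds" pair used above.
hours_to_days_seconds() {
    total=$(( $1 * 3600 ))
    echo "$(( total / 86400 )), $(( total % 86400 ))"
}

# Print the answers for create_fixed_network_sequence.
# $1 = number of observation times
# $2 = hours until the first observation
# $3 = hours between observations
make_network_input() {
    echo set_def.out              # input observation set definition file
    echo 1                        # regularly spaced observations in time
    echo "$1"                     # number of observation times
    hours_to_days_seconds "$2"    # time of first observation
    hours_to_days_seconds "$3"    # interval between observations
    echo obs_seq.in               # output file name
}

make_network_input 1000 12 12
```

    With these arguments the two time lines come out as 0, 43200, matching the table. An interval of 36 hours would be written as 1, 43200.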

    5) perfect_model_obs advances the model from the state defined by the initial conditions file specified in the input.nml and 'applies the forward operator' to harvest observations to fill in the observation sequence specified in obs_seq.in. The observation sequence finally has values for the observations and is saved in a file generally named obs_seq.out. perfect_model_obs is namelist-driven, as opposed to the previous two (whose input is a lot harder to specify in a namelist). Take a look at (and modify if you like) the input.nml&perfect_model_obs_nml section of the namelist.

    The End. Not only should you have an observation sequence file (usually obs_seq.out), you also have a file containing the exact evolution of the model consistent with those observations - True_State.nc.




    Real Observations - Converting to a DART-compatible format.


    Real observations come in a mind-boggling diversity of formats. We have converters for some formats in the DART/observations directory. Many of the formats require their own libraries (like HDF), and require intimate knowledge of the data format to extract the portions required for the DART observation sequence file. Please feel free to browse the converters and their companion documentation. Feel free to donate converters for formats we don't already support! We like that kind of stuff.

    The DART framework enforces a clean separation between observations and the models used for assimilation. The same observations can be used in any model which understands how to generate a value for the requested type of observation from the models' state-space values (i.e. the forward observation operator must exist - DART provides many for the most common state variables).

    In many cases, the original datasets are in a standard scientific format like netCDF, HDF, or BUFR, and library routines for those formats can be used to read in the original observation data. The DART software distribution includes Fortran subroutines and functions to help create a sequence of observations in memory, and then a call to the DART observation sequence write routine will create an entire obs_seq file in the correct format.

    In many cases, a single, self-contained program can convert directly from the observation location, time, value, and error into the DART format. In other cases, especially those linking with a complicated external library (e.g. BUFR), there is a two-step process with two programs and an ASCII intermediate file. We are currently leaning towards single-step conversions but either approach can be used for new programs.

    The DART system comes with several types of location modules for computing distances appropriately. The two most commonly used are for data in a 1D system and for data in a 3D spherical coordinate system. All the programs in the DART/observations directory assume the location/threed_sphere/location_mod.f90 3D sphere location module is being used.

    With the myriad of observation file formats (HDF, GRIB, BUFR, netCDF, ...), we simply have not had the time or the need to support all of them. The converters are a work in progress. There are currently about 10 other observation sources and types for which we are collecting information and conversion programs; these will eventually be added to this directory. In the meantime, if you have converters for data, or interest in something that is not in the repository, please email the DART group at dart@ucar.edu with a specific request and we can steer you to the most similar process.




    Manipulating observation sequences.


    First and foremost, check out the DART/obs_sequence/obs_sequence_tool.html document for detailed information and examples.

    obs_sequence_tool is the primary tool for manipulating observation sequence files. Observation sequence files are linked lists of observations organized by time. That is to say, the observations may appear in any order in the file, but traversing the linked list will result in observations ordered by time. There are tools for querying the linked list to extract the 'keys' to the list items for a particular date range, for example. obs_sequence_tool can be used to combine observation sequences, convert from ASCII to binary or vice-versa, extract a subset of observations, etc.
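
    To make the linked-list idea concrete, here is a toy sketch in POSIX sh and awk. The three-column record layout (key, key of the next observation, time in seconds) is invented for this example and is NOT the obs_seq file format. It only shows how records stored in arbitrary order come out in time order when the links are followed.

```sh
#!/bin/sh
# Follow the 'next' links of a toy linked list stored one record per line:
#   key  next_key  time_in_seconds      (next_key = -1 ends the list)
# This layout is hypothetical; real obs_seq files are more involved.
traverse_by_time() {
    # $1 = file of records, $2 = key of the earliest record
    awk -v key="$2" '
        { nxt[$1] = $2; t[$1] = $3 }       # remember link and time per key
        END {
            while (key != -1 && (key in nxt)) {
                print key, t[key]          # visit in linked (time) order
                key = nxt[key]
            }
        }' "$1"
}

demo=$(mktemp)
cat > "$demo" <<'EOF'
3 -1 86400
1 2 0
2 3 43200
EOF
traverse_by_time "$demo" 1      # file order is 3,1,2; traversal is 1,2,3
rm -f "$demo"
```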

    For testing, it is terribly useful to extract a small number of observations (like ONE) from an existing observation sequence file.




    The difference between observation TYPE and observation KIND.


    Broadly speaking, observation TYPES are specific instances of a generic observation KIND. The distinction is useful for several reasons, not the least of which is evaluating observation platforms. Consider zonal wind observations from QuikSCAT vs. radiosondes: both are observations of zonal wind (what we call KIND_U_WIND_COMPONENT), but they are different observation TYPES: QKSWND_U_WIND_COMPONENT and RADIOSONDE_U_WIND_COMPONENT, respectively. The forward observation operators are implemented based on observation KIND. When requested, the model generates a KIND_U_WIND_COMPONENT; it doesn't need to know whether it will be compared to a QuikSCAT value or a radiosonde value.

    However, it is usually scientifically very interesting to compare the assimilation of one TYPE of observation vs. another. One observation sequence file can have lots of types of observations; DART has the capability to assimilate (or evaluate) any combination of observation types without getting bogged down in dataset management. The same observation sequence can be used for experiments that include/exclude certain observation types - ensuring that you are performing the experiment you THINK you are performing ...
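
    The TYPE/KIND relationship can be pictured as a many-to-one lookup, with the assimilate/evaluate decision made per TYPE. The sketch below is a toy in POSIX sh: the two function names are invented, the table holds only the two TYPES named above, and treating a TYPE that appears in neither list as ignored is an assumption of this example.

```sh
#!/bin/sh
# Many TYPES map to one generic KIND (only the two TYPES from the text).
kind_of_type() {
    case "$1" in
        QKSWND_U_WIND_COMPONENT|RADIOSONDE_U_WIND_COMPONENT)
            echo KIND_U_WIND_COMPONENT ;;
        *)
            echo UNKNOWN_KIND ;;
    esac
}

# Decide what happens to a TYPE, given space-separated lists of TYPES to
# assimilate ($2) and to evaluate ($3). Hypothetical helper, not DART code.
action_for_type() {
    case " $2 " in *" $1 "*) echo assimilate; return ;; esac
    case " $3 " in *" $1 "*) echo evaluate;   return ;; esac
    echo ignore
}

# Same KIND, different treatment: assimilate radiosondes, evaluate QuikSCAT.
for t in RADIOSONDE_U_WIND_COMPONENT QKSWND_U_WIND_COMPONENT; do
    echo "$t: $(kind_of_type "$t"), $(action_for_type "$t" \
        "RADIOSONDE_U_WIND_COMPONENT" "QKSWND_U_WIND_COMPONENT")"
done
```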


    Adding support for a new observation TYPE.


    DART/obs_def/obs_def_mod.html is the source for detailed information.




    The Data Assimilation Research Testbed - DART Tutorial


    DART comes with an extensive set of tutorial materials, working models of several different levels of complexity, and data to be assimilated. It has been used in several multi-day workshops and can be used as the basis to teach a section on Data Assimilation. Download the DART software distribution and look in the tutorial subdirectory for the pdf and framemaker source for each of the 22 tutorial sections. The most recent versions of the tutorial are always provided below.

    Browsing the tutorial is worth the effort.
    Taking the tutorial is FAR better!


    1. Section 1 [pdf] Filtering For a One Variable System.
    2. Section 2 [pdf] The DART Directory Tree.
    3. Section 3 [pdf] DART Runtime Control and Documentation.
    4. Section 4 [pdf] How should observations of a state variable impact an unobserved state variable? Multivariate assimilation.
    5. Section 5 [pdf] Comprehensive Filtering Theory: Non-Identity Observations and the Joint Phase Space.
    6. Section 6 [pdf] Other Updates for An Observed Variable.
    7. Section 7 [pdf] Some Additional Low-Order Models.
    8. Section 8 [pdf] Dealing with Sampling Error.
    9. Section 9 [pdf] More on Dealing with Error; Inflation.
    10. Section 10 [pdf] Regression and Non-linear Effects.
    11. Section 11 [pdf] Creating DART Executables.
    12. Section 12 [pdf] Adaptive Inflation.
    13. Section 13 [pdf] Hierarchical Group Filters and Localization.
    14. Section 14 [pdf] DART Observation Quality Control.
    15. Section 15 [pdf] DART Experiments: Control and Design.
    16. Section 16 [pdf] Diagnostic Output.
    17. Section 17 [pdf] Creating Observation Sequences.
    18. Section 18 [pdf] Lost in Phase Space: The Challenge of Not Knowing the Truth.
    19. Section 19 [pdf] DART-Compliant Models and Making Models Compliant.
    20. Section 20 [pdf] Model Parameter Estimation.
    21. Section 21 [pdf] Observation Types and Observing System Design.
    22. Section 22 [pdf] Parallel Algorithm Implementation.
    23. Carbon Tutorial [pdf] A Simple 1D Advection Model.



    Adding a model to DART - Overview


    DART is designed to work with many models without modifications to the DART routines or the model source code. DART can 'wrap around' your model in two ways. One can be used if your model can be called as a subroutine, the other is for models that are separate executables. Either way, there are some steps that are common to both paths.

    Please be aware that several of the high-order models (CAM and WRF, in particular) have been used for years and their scripts have incorporated failsafe procedures and restart capabilities that have proven to be useful but make the scripts complex - more complex than need be for the initial attempts. Truly, some of the complexity is no longer required for available platforms. Then again, we're not running one instance of a highly complicated computer model, we're running N of them.


    The basic steps to include your model in DART

    1. Copy the template directory and files to your own DART model directory.
    2. Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient.
    3. [optional step] Modify the matlab routines to know about the specifics of the netCDF files produced by your model (sensible defaults, for the most part.)


    4. If your model is not subroutine-callable, there is extra work to be done. There are several examples to raid, but it helps to know which existing model has a strategy that is most similar to yours. More on that later.

    5. Modify shell_scripts/advance_model.csh to: collect all the input files needed to advance the model into a clean, temporary directory, convert the state vector file into input to your model, run your model, and convert your model output to the expected format for another assimilation by DART. We have examples - some that use the following support routines.
      1. Create a routine or set of routines to take a DART array and create input files for your model. This is frequently done by a program called dart_to_model.f90, but you can do it any way you like. It is strongly suggested that you use the DART read mechanism for the restart file - namely assim_model_mod.f90:aread_state_restart()
      2. Modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. It may be convenient to do this in dart_to_model.f90.
      3. Run the model (you may need to watch the MPI syntax)
      4. Create a routine or set of routines to take your model output files and create a DART restart file. This is frequently done by a program called model_to_dart.f90. It is strongly suggested that you use the DART write mechanism for the restart file - namely assim_model_mod.f90:awrite_state_restart()


    6. If a single instance of your model needs to advance using all the MPI tasks, there is one more script that needs to work - run_filter.csh.

    7. Modify shell_scripts/run_filter.csh to: do everything under the sun and then some


    Test ...


    Generally, it is a good strategy to use DART to create a synthetic observation sequence with ONE observation location - and ONE observation type - for several assimilation periods. With that, it is possible to run perfect_model_obs and then filter without having to debug too much stuff at once. A separate document will address how to test your model with DART.


    Programming style


    #1 Don't shoot the messenger. We have a lot of experience trying to write portable/reproducible code and offer these suggestions. All of these suggestions are for the standalone DART components. We are not asking you to rewrite your model. If your model is a separate executable, leaving it untouched is fine. Writing portable code for the DART components will allow us to include your model in the nightly builds and reduces the risk of us making changes that adversely affect the integration with your model. There are some routines that have to play with the core DART routines, these are the ones we are asking you to write using these few simple guidelines.

    • Use explicit typing, do not throw the 'autopromote' flag on your compiler.
    • Use the intent() attribute.
    • Use the use xxx_mod, only : bob, sally form of the use statement for routines from other modules. This really helps us track down things and ensures you're using what you think you're using.
    • Use Fortran namelists for I/O if possible.
    • Check out the existing parameters/routines in common/types_mod.f90, utilities/utilities_mod.f90, and time_manager/time_manager_mod.f90. You are free to use these and are encouraged to do so. No point reinventing the wheel and these routines have been tested extensively.

    Hopefully, you have no idea how difficult it is to build each model with 'unique' compile options on N different platforms. Fortran90 provides a nice mechanism to specify the type of a variable, so please do not use vendor-specific extensions (to globally autopromote 32bit reals to 64bit reals, for example). That is a horrible thing to do, since vendors are not consistent about what happens to explicitly-typed variables. Trust me. They lie. It also defeats the generic procedure interfaces that are designed to use a single interface as a front-end to multiple 'type-specific' routines. Compilers do abide by the standard, however, so DART code looks like:

    character(len=8) :: crdate
    integer, dimension(8) :: values
    ...
    real(r4) :: a,b
    real(r8) :: bob
    integer :: istatus, itype
    ...
    real(r8), intent(in) :: x(:)
    type(location_type), intent(in) :: location
    integer, intent(in) :: itype
    integer, intent(out) :: istatus
    real(r8), intent(out) :: obs_val

    depending on the use. The r4 and r8 types are explicitly defined in DART/common/types_mod.f90 to accurately represent what we have come to expect from 32bit and 64bit floating point real variables, respectively. If you like, you can redefine r8 to be the same as r4 to shrink your memory requirement. The people who run with WRF frequently do this. Do not redefine the digits12 parameter; that one must provide 64bit precision and is used in precious few places.


    Adding a model to DART - Specifics


    If your model is a separate executable, there is some flexibility to provide the required interfaces and it would be wise to look at the heavily commented template script DART/models/templates/shell_scripts/advance_model.csh and then a few higher-order models to see how they do it. Become familiar with DART/doc/html/mpi_intro.html (DART's use of MPI), DART/doc/html/filter_async_modes.html, and the filter namelist parameter async in filter.html.


    1. Copying the template directory


    A little explanation/motivation is warranted. If the model uses the standard layout, it is much easier to include the model in the nightly builds and testing. For this reason alone, please try to use the recommended directory layout. Simply looking at the DART/models directory should give you a pretty good idea of how things should be laid out. Copy the template directory and its contents. It is best to remove the (hidden) subversion files to keep the directory 'clean'. The point of copying this directory is to get a model_mod.f90 that works as-is and you can modify/debug the routines one at a time.

    ~/DART/models % cp -r template mymodel
    ~/DART/models % find mymodel -name .svn -print
    mymodel/.svn
    mymodel/shell_scripts/.svn
    mymodel/work/.svn
    ~/DART/models % rm -rf `find mymodel -name .svn -print`
    ~/DART/models % find mymodel -name .svn -print
    ~/DART/models %

    The destination directory (your model directory) should be in the DART/models directory to keep life simple. Moving it elsewhere will cause problems for the work/mkmf_xxxxx configuration files. Each model directory should have work and shell_scripts directories, and may have a matlab directory, a src directory, or anything else you may find convenient.

    Now, you must change all the work/path_names_xxx file contents to reflect the location of your model_mod.f90.
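
    Editing every path_names file by hand is tedious, so a short loop can help. The sketch below is POSIX sh and rests on two assumptions that you should verify in your own tree: that each path_names_xxx file lists one source path per line relative to the DART root, and that the copied files still contain the entry models/template/model_mod.f90. The function name retarget_path_names is made up for this example.

```sh
#!/bin/sh
# Point the model_mod.f90 entry of every path_names_* file at a new model.
# Assumes (not confirmed here) that the copied files contain the line
#   models/template/model_mod.f90
retarget_path_names() {
    model=$1      # new model directory name, e.g. mymodel
    workdir=$2    # directory holding the path_names_* files

    for f in "$workdir"/path_names_*; do
        [ -f "$f" ] || continue          # nothing matched the pattern
        sed "s|models/template/model_mod\.f90|models/$model/model_mod.f90|" \
            "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example call, run from the DART root after copying the template:
#   retarget_path_names mymodel models/mymodel/work
```

    Writing to a temporary file and moving it into place avoids relying on sed -i, whose syntax differs between platforms.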


    2. model_mod.f90


    We have templates, examples, and a document describing the required interfaces in the DART code tree - DART/models/model_mod.html. Every(?) user-visible DART program/module has a matching piece of documentation that is distributed along with the code. The DART code tree always has the most current documentation.

    Check out time_manager_mod.f90 and utilities_mod.f90 for general-purpose routines ...

    Use Fortran namelists for I/O if possible.

    Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient. This module should be written with the understanding that print statements and error terminations will be executed by multiple processors/tasks. To restrict print statements to be written once (by the master task), it is necessary to preface the print as in this example:
    if (do_output()) write(*,*)'model_mod:namelist cal_NML',startDate_1,startDate_2


    Required Interfaces in model_mod.f90


    No matter the complexity of the model, the DART software requires a few interface routines in a model-specific Fortran90 module model_mod.f90 file. The models/template/model_mod.f90 file has extended comment blocks at the heads of each of these routines that go into much more detail for what is to be provided. You cannot change the types or number of required arguments to any of the required interface routines. You can add optional arguments, but you cannot go back through the DART tree to change the gazillion calls to the mandatory routines. It is absolutely appropriate to look at existing models to get ideas about how to implement the interfaces. Finding a model implementation that is functionally close to yours always helps.

    As of December 2008, the table of the mandatory interfaces and programming degree-of-difficulty is:

    subroutine callable | separate executable | routine | description
    easy | easy | get_model_size | This function returns the size of all the model variables (prognostic or diagnosed or ...) that are packed into the 1D DART state vector. That is, it returns the length of the DART state vector as a single scalar integer.
    depends | trivial | adv_1step | For subroutine-callable models, this routine is the one to actually advance the model 1 timestep (see models/bgrid_solo/model_mod.f90 for an example). For non-subroutine-callable models, this is a NULL interface. Easy.
    depends | depends | get_state_meta_data | This routine takes as input an integer into the DART state vector and returns the associated location and (optionally) variable type from obs_kind/obs_kind_mod.f90. (See models/*/model_mod.f90 for examples.) This generally requires knowledge of how the model state vector is packed into the DART array, so it can be as complicated as the packing.
    depends | hard | model_interpolate | This is one of the more difficult routines. Given a DART state vector, a location, and a desired generic 'kind' (like KIND_SURFACE_PRESSURE, KIND_TEMPERATURE, KIND_SPECIFIC_HUMIDITY, KIND_PRESSURE, ... ); return the desired scalar quantity and set the return status accordingly. This is what enables the model to use observation-specific 'forward operators' that are part of the common DART code.
    easy | easy | get_model_time_step | This routine returns the smallest increment in time (in seconds) that the model is capable of advancing the state in a given implementation. For example, the dynamical timestep of a model is 20 minutes, but there are reasons you don't want to (or cannot) restart at this interval and would like to restart AT MOST every 6 hours. For this case, get_model_time_step should return 21600, ie 6*60*60. This is also interpreted as the nominal assimilation period. This interface is required for all applications.
    easy | easy | end_model | Performs any shutdown and cleanup needed. Good form would dictate that you should deallocate any storage allocated when you instantiated the model (from static_init_model, for example).
    depends | depends | static_init_model | Called to do one-time initialization of the model. This generally includes setting the grid information, calendar, etc.
    trivial | trivial | init_time | Returns a time that is somehow appropriate for starting up a long integration of the model IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs. If this option is not to be used in perfect_model_obs, this can be a NULL interface.
    easy | easy | init_conditions | Companion interface to init_time. Returns a model state vector that is somehow appropriate for starting up a long integration of the model. Only needed IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs.
    trivial-difficult | trivial-difficult | nc_write_model_atts | This routine is used to write the model-specific attributes to the netCDF files containing the prior and posterior states of the assimilation. The subroutine in the models/template/model_mod.f90 WILL WORK for new models but does not know anything about prognostic variables or geometry or ... Still, it is enough to get started without doing anything. More meaningful coordinate variables etc. are needed to supplant the default template. This can be as complicated as you like - see existing models for examples.
    trivial-difficult | trivial-difficult | nc_write_model_vars | This routine is responsible for writing the DART state vector -or- the prognostic model variables to the output netCDF files. If the namelist parameter output_state_vector == .false. this routine is responsible for partitioning the DART state vector into the appropriate netCDF pieces (i.e. the prognostic model variables). The default routine will simply blast out the entire DART state vector into a netCDF variable called 'state'.
    depends | trivial-difficult | pert_model_state | This routine is used to generate initial ensembles. This may be a NULL interface if you can tolerate the default perturbation strategy of adding noise to every state element or if you generate your own ensembles outside the DART framework. There are other ways of generating ensembles ... climatological distributions, bred singular vectors, voodoo ...
    trivial | trivial | get_close_maxdist_init | This routine performs the initialization for the table-lookup routines that accelerate the distance calculations. This routine is closely tied to the type of location module used by the model and is frequently (universally?) simply a 'pass-through' routine to a routine of the same name in the location module. There is generally no coding that needs to be done, but the interface must exist in model_mod.f90
    trivial | trivial | get_close_obs_init | This routine performs the initialization for the get_close accelerator that depends on the particular observation. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module.
    trivial | trivial | get_close_obs | This is the routine that takes a single location and a list of other locations, returns the indices of all the locations close to the single one along with the number of these and the distances for the close ones. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module.
    easy | easy | ens_mean_for_model | This routine simply stores a copy of the ensemble mean of the state vector within the model_mod. The ensemble mean may be needed for some calculations (like converting model sigma levels to the units of the observation - pressure levels, for example).
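
    The arithmetic in the get_model_time_step entry is worth spelling out, since the value returned is the restart interval in seconds and not the dynamical timestep. A back-of-the-envelope sketch in POSIX sh (both helper names are invented for this illustration):

```sh
#!/bin/sh
# Value get_model_time_step would return for a restart interval in hours.
restart_interval_seconds() {
    echo "$(( $1 * 60 * 60 ))"              # $1 = hours between restarts
}

# Number of dynamical timesteps the model takes within one such interval.
steps_per_interval() {
    echo "$(( $1 / ($2 * 60) ))"            # $1 = interval [s], $2 = timestep [min]
}

interval=$(restart_interval_seconds 6)
echo "return value: $interval seconds"
echo "20-minute steps per interval: $(steps_per_interval "$interval" 20)"
```

    For the example in the table, a 6 hour restart interval gives 21600 seconds, which the model covers in 18 steps of 20 minutes.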

    3. providing matlab support


    Since this is an optional step, it will be covered in a separate document.


    If your model is subroutine-callable - you're done!





    The Big Picture for models advanced as separate executables.


    The normal sequence of events is that DART reads in its own restart file (do not worry about where this comes from right now) and eventually determines it needs to advance the model. DART needs to be able to take its internal representation of each model state vector, the valid time of that state, and the amount of time to advance the state - and communicate that to the model. When the model has advanced the state to the requested time, the output must be ingested by DART and the cycle begins again. DART is entirely responsible for reading the observations and there are several programs for creating and manipulating the observation sequence files.

    There are a couple of ways to exploit parallel architectures with DART, and these have an immediate bearing on the design of the script(s) that control how the model instances (each model copy) are advanced. Perhaps the conceptually simplest method is when each model instance is advanced by a single processor element. DART calls this async = 2. It is generally efficient to relate the ensemble size to the number of processors being used.

    The alternative is to advance every model instance one after another using all available processors for each instance of the model. DART calls this async = 4, and requires an additional script. For portability reasons, DART uses the same processor set for both the assimilation and the model advances. For example, if you advance the model with 96 processors, all 96 processors will be employed to assimilate.




    4. advance_model.csh


    This script is invoked in one of two ways: 1) if filter uses a system() call, or 2) if run_filter.csh makes the call. Either way there are three arguments.

    1. the process number of the caller - could be the master task ID (zero) or (especially if async = 2) a process id that gets related to the copy. When multiple copies are being advanced simultaneously, each of the advances happens in its own run-time directory.
    2. the number of state copies belonging to that process
    3. the name of the (ASCII) filter_control_file for that process. The filter_control file contains the following information (one per line): the ensemble member, the name of the input file (containing the DART state vector), and the name of the output file from the model containing the new DART state vector. For example,
      1
      assim_model_state_ic.0001
      assim_model_state_ud.0001
      2
      assim_model_state_ic.0002
      assim_model_state_ud.0002
      ...
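
    As a rough illustration of how a script might consume these three arguments, the sketch below reads the control file three lines at a time. It is written in POSIX sh rather than csh and is not taken from the template advance_model.csh; the function name advance_members is hypothetical, and the echo stands in for the real work of advancing a member.

```sh
#!/bin/sh
# Walk through a filter_control_file: three lines per ensemble member.
advance_members() {
    process=$1        # process number of the caller
    num_states=$2     # number of state copies belonging to that process
    control_file=$3   # name of the ASCII filter_control_file

    n=0
    while [ "$n" -lt "$num_states" ] &&
          read -r member && read -r input_file && read -r output_file
    do
        # A real script would convert, run the model, and convert back here.
        echo "process $process: member $member reads $input_file, writes $output_file"
        n=$((n + 1))
    done < "$control_file"
}

# Demonstration with the two members shown in the example above.
control=$(mktemp)
cat > "$control" <<'EOF'
1
assim_model_state_ic.0001
assim_model_state_ud.0001
2
assim_model_state_ic.0002
assim_model_state_ud.0002
EOF
advance_members 0 2 "$control"
rm -f "$control"
```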


    async = 2 ... advancing many copies at the same time


    Modify shell_scripts/advance_model.csh to:

    1. Collect all the input files needed to advance the model into a clean, temporary directory.
    2. Determine how many tasks you have, and how many ensemble members you have. Determine how many 'batches' of ensemble members must be done to advance all of them. With 20 tasks and 80 ensemble members, you will need to loop 4 times, for example.
    3. Loop over the following three steps - each loop advances one ensemble member:
      1. convert the DART state vector file into input for your model,
      2. run your model, and
      3. convert your model output to the file with the expected format for another assimilation by DART.

    During this initial phase, it may be useful to _leave_ the temporary directory in place rather than removing it.
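
    The batch count is a ceiling division of the ensemble size by the number of tasks. A small sketch in POSIX sh follows; the helper names are illustrative only, and the assignment of consecutive members to each batch is an assumption made to keep the example simple.

```sh
#!/bin/sh
# Number of passes needed to advance every ensemble member.
num_batches() {
    members=$1; tasks=$2
    echo "$(( (members + tasks - 1) / tasks ))"     # integer ceiling
}

# Show which members would fall in each batch (consecutive assignment assumed).
list_batches() {
    members=$1; tasks=$2
    batches=$(num_batches "$members" "$tasks")
    b=1
    while [ "$b" -le "$batches" ]; do
        first=$(( (b - 1) * tasks + 1 ))
        last=$(( b * tasks ))
        if [ "$last" -gt "$members" ]; then last=$members; fi
        echo "batch $b: members $first-$last"
        b=$((b + 1))
    done
}

list_batches 80 20
```

    For 80 members and 20 tasks this lists 4 batches, in line with the example in step 2. With 81 members a fifth, nearly empty batch is needed.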


    async = 4 ... advancing each copy one at a time


    In addition to modifying shell_scripts/advance_model.csh as described above, you must also modify shell_scripts/run_filter.csh in the following way: THIS PART NEEDS TO BE FILLED IN


    5. Converting DART output to input for your model.


    After DART has assimilated the observations and created new (posterior) states, it is necessary to reformat those posteriors into input for the model. Fundamentally, you are unpacking the DART state vector and putting the pieces back into whatever portion of your model initial conditions file is appropriate. Frequently this is done by a program called dart_to_model.f90.

    Put another way: create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. Frequently this is done by dart_to_model.f90. The DART array has a header that contains the 'advance-to' time as well as the 'valid' time of the DART array. The times are encoded in DART's time representation. Interpretation of these times by your model will be necessary and you will need to feed these times to your model in some automated fashion.

    You will also need to create/modify a mkmf_dart_to_model and path_names_dart_to_model specific to your model.


    6. Modify the start/stop control of your model run.


    Create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. These routines are called in the advance_model.csh script. Every model is controlled differently, so writing detailed descriptions here is pointless.


    7. Convert your model output to DART input.


    After your model has advanced its states to the target time, it is necessary to convey the new state information to DART. The preferred name for the program that does this is model_to_dart.f90. This is fundamentally the inverse of dart_to_model.f90. Rip out the bits of the state vector you want to paste into a vector for DART, prepend the valid_time in the approved DART format and you're good to go. If you pack the bits into a DART state vector, there are native DART routines to write out the state vector. This ensures that DART will be able to read what you've written, and insulates you from having to worry about any internal file format changes we might make.






    Adding Matlab® support for your own model - under construction.

    Only needed for state-space diagnostics.
    Define a matlab structure with required elements.






    Testing Strategies - under construction

    Check the converter programs.
    Check the model advance control.
    Start with one observation in a known location, with a known value and error specification.
    Perform a 'perfect model' experiment for a few timesteps.
    ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
    ncview Innov.nc






    Was the Assimilation Effective? - under construction

    OK - so it didn't blow up ... what now?
    Apply the same diagnostics as the tutorial.
    Check the evolution of the ensemble spread.
    Check the evolution of the ensemble rmse.
    Check the total spread.






    non-Matlab® diagnostics - under construction.

    get the ncl stuff from David Dowell.
    'roll your own' netcdf ...
    ncdiff Posterior_Diag.nc Prior_Diag.nc Innov.nc
    ncview Innov.nc
    explore the obs_diag_output.nc on your own ... explanation of copies, etc.



    [top]


    DART observation-space diagnostics.


    You must post-process the obs_seq.final file(s) with obs_diag to generate a netCDF file containing accumulated diagnostics for specified regions, etc. Since the experiment information (assimilation interval, assimilating model, etc.) is not recorded in the obs_seq.final file, the obs_diag_nml namelist has a section that allows specification of the necessary quantities.

    The following quantities are normally diagnosed:


    Nposs the number of observations for each assimilation period;
    Nused the number of observations successfully assimilated each assimilation period;
    NbadQC the number of observations that had an undesirable (original) QC value;
    NbadIZ the number of observations that had an undesirable Innovation Z;
    NbadUV the number of velocity observations that had a matching component that was not assimilated;
    NbadLV the number of observations that were above or below the highest or lowest model level, respectively;
    rmse the rmse of the ensemble;
    bias the bias of the ensemble (forecast-observation);
    spread the spread of the ensemble; and
    totalspread the pooled spread of the observation (knowing its observational error) and the ensemble.
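
    To make the last four quantities concrete, here is a minimal Python sketch of one plausible way to compute them for a set of observations. It is an assumption about the conventions, not a transcription of obs_diag: the function and argument names are hypothetical, the spread is pooled as the square root of the mean ensemble variance, and the total spread adds the observation error variance before pooling. Check the obs_diag source for the exact definitions.

```python
import math


def obs_space_stats(ens_means, ens_vars, obs_values, obs_err_vars):
    """Return rmse, bias, spread and totalspread for one set of observations.

    ens_means / ens_vars: ensemble mean and variance of the expected
    observation, one entry per observation.
    """
    n = len(obs_values)
    diffs = [m - y for m, y in zip(ens_means, obs_values)]  # forecast - observation
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n
    spread = math.sqrt(sum(ens_vars) / n)
    totalspread = math.sqrt(
        sum(v + e for v, e in zip(ens_vars, obs_err_vars)) / n
    )
    return {"rmse": rmse, "bias": bias, "spread": spread,
            "totalspread": totalspread}


if __name__ == "__main__":
    print(obs_space_stats([1.0, 3.0], [1.0, 1.0], [0.0, 0.0], [3.0, 3.0]))
```

    Under these conventions, an rmse of similar size to the totalspread suggests the ensemble spread and the specified observation error are together consistent with the actual misfit.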

    The observation-space functions are in the DART/diagnostics/matlab directory. Once you have processed the obs_seq.final files into a single obs_diag_output.nc, you can use that as input to the following plotting routines:


    plot_evolution.m plots the temporal evolution of any of the quantities above for each variable for specified levels. The number of observations possible and used are plotted on the same axis.
    plot_rmse_xxx_evolution.m same as plot_evolution.m but overlays rmse on the same axis.
    plot_profile.m plots the spatial and temporal average of any specified quantity as a function of height.
    plot_rmse_xxx_profile.m same as plot_profile.m with an overlay of rmse.
    plot_bias_xxx_profile.m same as plot_profile.m with an overlay of bias.

    [coming soon] Demo plotting locations, etc, verifying obs sequences ...


    [top]


    DART state-space diagnostics.


    There is a set of Matlab® functions to help explore the assimilation performance in state-space, which is very useful for OSSEs (i.e. when you know the true model state). The general guideline is that anything that computes an 'error' requires the truth. Some functions work without a true state.

    In order to use any of these functions, the scripts need to know how to interpret the layout of the netCDF file - which is usually model-dependent. See the section on Adding Matlab® support for your own model if you are not using one of the supported DART models.

    The state-space functions are in the DART/matlab directory:

    plot_bins.m plots the rank histograms for a set of state variables.
    plot_total_err.m plots the evolution of the error (un-normalized) and ensemble spread of all state variables.
    plot_ens_mean_time_series.m plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available).
    plot_ens_time_series.m plots the evolution of a set of state variables - all ensemble members, the ensemble mean (and Truth, if available).
    plot_sawtooth.m plots the trajectory of any set of state variables highlighting the assimilation 'increment'.
    plot_phase_space.m plots the trajectory of any two or three state variables. The classic attractor diagram.
    plot_ens_err_spread.m plots the evolution of the ensemble error and spread for a select set of state variables.
    plot_correl.m plots the correlation through time of a state variable and the rest of the state.
    plot_jeff_correl.m plots the correlation through time of a state variable at a particular time and any state variable.
    plot_smoother_err.m plots the error of the ensemble smoother - which uses observations in the future as well as the present.
    plot_var_var_correl.m plots the correlation through time between a state variable at a particular time and another state variable.
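
    As an illustration of why the truth is needed, the rank histogram that plot_bins.m displays can be sketched as follows. This is a generic Python version with a hypothetical function name, not the Matlab code: for each time, the true value is ranked among the ensemble members, and the ranks are counted. Ties between the truth and a member are not treated specially here, which is a simplification.

```python
def rank_histogram(truth_series, ensemble_series):
    """Count how often the truth falls in each of the N+1 ensemble bins.

    truth_series: one true value per time.
    ensemble_series: one list of N member values per time.
    """
    n_members = len(ensemble_series[0])
    counts = [0] * (n_members + 1)
    for truth, members in zip(truth_series, ensemble_series):
        # Rank = number of members strictly below the truth (0..N).
        rank = sum(1 for m in members if m < truth)
        counts[rank] += 1
    return counts


if __name__ == "__main__":
    truth = [0.5, 2.5, -1.0]
    ensembles = [[0.0, 1.0, 2.0]] * 3
    print(rank_histogram(truth, ensembles))
```

    A roughly flat histogram indicates the truth is statistically indistinguishable from an ensemble member, a U shape points to too little spread, and a dome shape points to too much.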

    [top]






    Examples - under construction

    1. observation location/value plots
    2. localization value domain widget?
    3. namelist settings for damped adaptive spatially-varying group filter


    [top]



    Customizing DART - under construction.

    Modify the code to your heart's content.
    'svn revert' cures all ...



    [top]



    Adding your efforts to DART.

    Please let us know if you have suggestions or code to contribute to DART. We're a small group, but we are willing to listen and will make every endeavor to incorporate improvements to the code. Email us at dart@ucar.edu.



    [top]



    Assimilation Algorithms employed in DART - under construction.

    explain namelist settings for EAKF, EnKF, particle filter, ...



    [top]



    namelists - under construction

    explain that one namelist can 'do it all', what parts of the namelist are 'important' ... maybe a namelist with links to the relevant documentation ...



    [top]