'Perfect Model' observation experiments (also known as Observing System Simulation Experiments - OSSEs)


Once a model is compatible with the DART facility, all of the functionality of DART is available. This includes 'perfect model' experiments (also called Observing System Simulation Experiments - OSSEs). Essentially, the model is run forward from a known state and, at predefined times, an observation forward operator is applied to the model state to harvest synthetic observations. This model trajectory is known as the 'true state'. The synthetic observations are then used in an assimilation experiment. The assimilation performance can then be evaluated precisely because the true state (of the model) is known. Since the same forward operator is used both to harvest the synthetic observations and during the assimilation, the 'representativeness' error of the assimilation system is not an issue.

There is a set of MATLAB® functions to help explore the assimilation performance in state-space as well as in observation-space. An OSSE is explored in depth in our Lorenz '96 example.


Perfect Model Experiment Overview

There are four fundamental steps to running an OSSE from within DART:

  1. Create a blueprint of what, where, and when you want observations. Essentially, define the metadata of the observations without actually specifying the observation values. The default filename for the blueprint is obs_seq.in . For simple cases, this is just a matter of running create_obs_sequence and create_fixed_network_seq; more in-depth solutions are presented below.
  2. Harvest the synthetic observations from the true model state by running perfect_model_obs to advance the model from a known initial condition and apply the forward observation operator based on the observation 'blueprint'. Noise is added to each observation: a random draw from a normal distribution with zero mean and the observation error variance specified in the blueprint. The noise-free 'truth' and the noisy 'observation' are recorded in the output observation sequence file. The entire time-history of the true state of the model is recorded in True_State.nc . The default filename for the 'observations' is obs_seq.out .
  3. Assimilate the synthetic observations with filter in the usual way. The prior/forecast states are preserved in Prior_Diag.nc and the posterior/analysis states are preserved in Posterior_Diag.nc . The default filename for the file with the observations and (optionally) the ensemble estimates of the observations is obs_seq.final .
  4. Check to make sure the assimilation was effective! Ensemble DA is not a black box! YOU must check to make sure you are making effective use of the information in the observations!


1. Defining the observation metadata - the 'blueprint'.


There are lots of ways to define an observation sequence that DART can use as input for a perfect model experiment. If you have observations in DART format already, you can simply use them. If you have observations in one of the formats already supported by the DART converters (check DART/observations/observations.html), convert them to a DART observation sequence. You may need to use the obs_sequence_tool to combine multiple observation sequence files into the observation sequence file(s) for the perfect model experiment. Any existing observation values and quality control information will be ignored by perfect_model_obs; only the time and location information are used. In fact, any and all existing observation and QC values will be removed.

GENERAL COMMENT ABOUT THE INTERPLAY BETWEEN THE MODEL STOP/START FREQUENCY AND THE OBSERVATION FREQUENCY: There is usually a very real difference between the dynamical timestep of the model and how often it is safe to stop and restart the model. The assimilation window is (usually) required to be a multiple of the safe stop/start interval. For example, an atmospheric model may have a dynamical timestep of a few seconds but may only be able to stop/restart every hour. In this case, the assimilation window is a multiple of 3600 seconds. Observations at a finer timescale are not possible, because we only have access to the model state when the model stops.
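The timing constraint above can be checked before building a blueprint. The following is a minimal Python sketch, not part of DART. It assumes a hypothetical 6-second dynamical timestep and an hourly stop/restart interval, then lists the only times at which a model state is available to the forward operators.

```python
# Sketch (not DART code): consistency checks between the dynamical timestep,
# the safe stop/restart interval, and the assimilation window. All numbers
# used below are illustrative assumptions.


def check_window(dt_s: int, restart_every_s: int, window_s: int) -> None:
    """Raise ValueError if the timing choices are inconsistent."""
    if restart_every_s % dt_s != 0:
        raise ValueError("restart interval must be a multiple of the dynamical timestep")
    if window_s % restart_every_s != 0:
        raise ValueError("assimilation window must be a multiple of the restart interval")


def accessible_times(start_s: int, end_s: int, window_s: int) -> list[int]:
    """Times (seconds) at which the model stops, i.e. the only times whose
    model state can be used by the forward observation operators."""
    return list(range(start_s, end_s + 1, window_s))


# Atmospheric example from the text: a few-second timestep, hourly restarts.
check_window(dt_s=6, restart_every_s=3600, window_s=3600)
print(accessible_times(0, 6 * 3600, 3600))
```

Asking for a 30-minute window with hourly restarts makes check_window raise, which mirrors the statement that finer observation timing is not possible.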

If you do not have an input observation sequence, it is simple to create one.

  1. Run create_obs_sequence to generate the blueprint for the types of observations and observation error variances for whatever locations are desired.
  2. Run create_fixed_network_seq to define the temporal distribution of the desired observations.

Both create_obs_sequence and create_fixed_network_seq interactively prompt you for the information they require. This can be quite tedious if you want a spatially dense set of observations. People have been known to actually write programs to generate the input to create_obs_sequence and simply pipe or redirect the information into the program. There are several examples of these in the models/bgrid_solo directory: column_rand.f90, id_set_def_stdin.f90, ps_id_stdin.f90, and ps_rand_local.f90 . Be advised that some observation types have different input requirements, so a 'one size fits all' program is a waste of time.
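The bgrid_solo helpers are Fortran. Here is the same idea as a Python sketch that builds the text to redirect into create_obs_sequence for a dense grid of observations. The actual prompt order of create_obs_sequence is NOT reproduced here. The header/footer lines and placeholder_answers() are hypothetical placeholders that you must adapt after running the program once interactively for your observation type and location module.

```python
# Sketch: generate stdin for create_obs_sequence for many locations.
# HYPOTHETICAL: the header/footer lines and the per-observation answers in
# placeholder_answers(); copy the real prompt order from an interactive run.
import subprocess
from typing import Callable, Iterable

Location = tuple[float, float, float]  # (lon, lat, vertical): an assumption


def grid_locations(lons: Iterable[float], lats: Iterable[float],
                   level: float) -> list[Location]:
    """All (lon, lat, level) combinations on a regular grid."""
    lons = list(lons)
    return [(lon, lat, level) for lat in lats for lon in lons]


def placeholder_answers(obs_type: str, loc: Location, err_var: float) -> list[str]:
    """HYPOTHETICAL answers for one observation; adapt to the real prompts."""
    lon, lat, lev = loc
    return [obs_type, f"{lev}", f"{lon} {lat}", f"{err_var}"]


def build_stdin(header: Iterable[str], locations: list[Location], obs_type: str,
                err_var: float,
                answers: Callable[[str, Location, float], list[str]] = placeholder_answers,
                footer: Iterable[str] = ()) -> str:
    """Concatenate header, one block of answers per location, and footer."""
    lines = list(header)
    for loc in locations:
        lines.extend(answers(obs_type, loc, err_var))
    lines.extend(footer)
    return "\n".join(lines) + "\n"


def run_create_obs_sequence(text: str) -> None:
    """Equivalent to: ./create_obs_sequence < answers.txt"""
    subprocess.run(["./create_obs_sequence"], input=text, text=True, check=True)
```

Writing the output of build_stdin() to a file and running ./create_obs_sequence < answers.txt keeps a record of exactly what was requested. Because different observation types prompt for different things, the answers callback is a parameter rather than being hard-wired.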


NOTE: only the observation kinds listed in input.nml &obs_kind_nml:assimilate_these_obs_types and &obs_kind_nml:evaluate_these_obs_types will be available to the create_obs_sequence program.


DEVELOPERS TIP: You can specify 'identity' observations as input to perfect_model_obs. Identity observations are the model values AT the exact grid cell location; there is no interpolation at all, just a straight table lookup. This can be useful as you develop your model interfaces: you can test many of the routines and scripts without having a working model_interpolate().
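To make the difference concrete, here is a small Python sketch (not DART code) that contrasts an identity observation with an interpolated one. It assumes a 1-D periodic grid of equally spaced points on [0, 1), similar to a Lorenz '96 domain. interpolated_obs() only stands in for the role model_interpolate() plays and is not its implementation.

```python
# Sketch (not DART code): identity observation vs. interpolated observation
# on an assumed 1-D periodic grid with len(state) equally spaced points.
import math


def identity_obs(state: list[float], index: int) -> float:
    """Model value AT a grid point: a straight table lookup."""
    return state[index]


def interpolated_obs(state: list[float], location: float) -> float:
    """Linear interpolation at a fractional location in [0, 1)."""
    n = len(state)
    x = (location % 1.0) * n          # position in grid-index units
    lower = math.floor(x)
    frac = x - lower
    lower %= n
    upper = (lower + 1) % n           # wrap around the periodic domain
    return (1.0 - frac) * state[lower] + frac * state[upper]
```

identity_obs() never touches interpolation code. It can therefore exercise perfect_model_obs, filter, and the diagnostics before a working interpolation routine exists.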


More information about creating observation sequence files for OSSEs is available in the observation sequence discussion section.


2. Generating the true state and harvesting the observation values - perfect_model_obs


perfect_model_obs reads the blueprint and an initial state and applies the appropriate forward observation operator for each and every observation in the current 'assimilation window'. If necessary, the model is advanced until the next set of observations is desired. When it has run out of observations or reached the stop time defined by the namelist control, the program stops and writes out a restart file, a diagnostic file, the observation sequence file, and a log file. This is fundamentally a single deterministic forecast for 'as long as it takes' to harvest all the observations.


default filename (format)
   contents

perfect_restart (ASCII or binary)
   The DART model state at the end of the forecast. If the forecast needs to be lengthened, use this as the input. The format of the file is controlled by input.nml &assim_model_nml:write_binary_restart_files . The first record is the valid time of the model; the rest is the model state at that time.

True_State.nc (netCDF)
   The DART model state at every assimilation timestep. This file has but one 'copy' - the truth. To dump the copy metadata and the time:
      ncdump -v time,CopyMetaData True_State.nc

obs_seq.out (ASCII or binary; DART-specific linked list)
   This file has the observations - the result of the forward observation operator. This observation sequence file has two 'copies' of the observation: the noisy 'copy' and the noise-free 'copy'. The noisy copy is designated as the 'observation'; the noise-free copy is the truth. The observation-space diagnostic program obs_diag has special options for using the true copy instead of the observation copy. See obs_diag.html for details.

dart_log.out (ASCII)
   The run-time output of perfect_model_obs .
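A toy analogue of perfect_model_obs can clarify where the two 'copies' in obs_seq.out come from. This Python sketch is NOT the DART program. It assumes a Lorenz '96 model (forcing F = 8, fourth-order Runge-Kutta), an identity forward operator at chosen grid indices, and one set of observations per assimilation window. The returned list of true states plays the role of True_State.nc.

```python
# Toy stand-in for perfect_model_obs (NOT the DART program): advance a
# Lorenz '96 'truth' from a known state, apply an identity forward operator,
# and keep both the noise-free and the noisy copy of each observation.
import math
import random

F = 8.0  # Lorenz '96 forcing (assumed)


def l96_tendency(x: list[float]) -> list[float]:
    n = len(x)
    # Negative indices wrap around, giving the periodic boundary.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + F for i in range(n)]


def rk4_step(x: list[float], dt: float) -> list[float]:
    k1 = l96_tendency(x)
    k2 = l96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = l96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = l96_tendency([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def harvest(x0, obs_indices, err_var, n_windows, steps_per_window, dt, seed=0):
    """Return (true_states, obs). Each obs holds the window index, the grid
    index, the noise-free 'truth' copy, and the noisy 'observation' copy."""
    rng = random.Random(seed)
    sd = math.sqrt(err_var)
    x = list(x0)
    true_states, obs = [], []
    for w in range(n_windows):
        for _ in range(steps_per_window):
            x = rk4_step(x, dt)
        true_states.append(list(x))              # like True_State.nc
        for i in obs_indices:
            truth = x[i]                         # identity forward operator
            obs.append({"time": w, "index": i, "truth": truth,
                        "observation": truth + rng.gauss(0.0, sd)})
    return true_states, obs
```

With an error variance of zero, the two copies are identical. That is a quick way to see that only the added noise separates the 'observation' from the truth.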

Each model may define the assimilation window differently, but conceptually, all the observations within plus or minus half the assimilation window are considered to be simultaneous, and a single model state provides the basis for all those observations. For example, if the blueprint requires temperature observations every 30 seconds, the initial model time is noon (12:00), and the assimilation window is 1 hour, then all the observations from 11:30 to 12:30 will use the same state as input for the forward observation operator. So even though the blueprint asks for observations every 30 seconds, many of those observations will have the same noise-free value (if they are in the same location); only the added noise differs.
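The noon example can be reproduced with a short Python sketch (not DART code). It assigns every observation time to the window whose center is nearest. The half-open boundary [center - half, center + half) is an assumption, so under this convention the 12:30 observation falls into the next window. Check your model's documentation for its exact rule.

```python
# Sketch (not DART code): treat all observations within +/- half a window
# as simultaneous. Half-open window boundaries are an assumption.
import math
from collections import defaultdict
from datetime import datetime, timedelta


def window_center(obs_time, first_center, window):
    """Center of the window containing obs_time (windows tile time evenly)."""
    k = math.floor((obs_time - first_center + window / 2) / window)
    return first_center + k * window


def bin_observations(obs_times, first_center, window):
    """Group observation times by the window (model state) they will use."""
    bins = defaultdict(list)
    for t in obs_times:
        bins[window_center(t, first_center, window)].append(t)
    return dict(bins)


# The example from the text: obs every 30 s, model at noon, 1-hour window.
noon = datetime(2000, 1, 1, 12)
times = [noon - timedelta(minutes=30) + k * timedelta(seconds=30) for k in range(121)]
bins = bin_observations(times, noon, timedelta(hours=1))
print(len(bins[noon]))   # 120 (11:30:00 through 12:29:30)
```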


perfect_model_obs uses the input.nml for its control. A subset of the namelists and variables of particular interest for perfect_model_obs are summarized here. Each namelist is fully described by the corresponding module document.


&perfect_model_obs_nml
   ...
   start_from_restart    = .true.            usually, but not always
   output_restart        = .true.            sure, why not
   init_time_days        = -1                negative means use the time in ...
   init_time_seconds     = -1                the 'restart_in_file_name' file
   first_obs_days        = -1                negative means start at the first time in ...
   first_obs_seconds     = -1                the 'obs_seq_in_file_name' file.
   last_obs_days         = -1                negative means to stop with the last ...
   last_obs_seconds      = -1                observation in the file.
   restart_in_file_name  = "perfect_ics"
   restart_out_file_name = "perfect_restart"
   obs_seq_in_file_name  = "obs_seq.in"
   obs_seq_out_file_name = "obs_seq.out"
   output_interval       = 1
   async                 = 0                 totally depends on the model
   adv_ens_command       = "./advance_ens.csh"       depends on the model
  /

&obs_sequence_nml
   write_binary_obs_sequence = .false.       .false. will create ASCII - easy to check.
  /

&obs_kind_nml
   ...
   assimilate_these_obs_types = 'RADIOSONDE_TEMPERATURE',
   ...                                       list all the synthetic observation
   ...                                       types you want
  /

&assim_model_nml
   ...
   write_binary_restart_files = .true.       your choice
  /

&model_nml
   ...
   time_step_days = 0,                       some models call this 'assimilation_period_days'
   time_step_seconds = 3600                  some models call this 'assimilation_period_seconds'
                                             use whatever value you want
  /

&utilities_nml
   ...
   termlevel   = 1                           your choice
   logfilename = 'dart_log.out'              your choice
  /
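The (days, seconds) pairs above share a convention: a time is days * 86400 + seconds, and, per the namelist comments, a negative value means 'take the time from the file instead'. The following Python sketch is illustrative only. Treating the pair as unset when either component is negative is an assumption, and n_assimilation_times() is a hypothetical helper that assumes both end times lie on multiples of time_step_days/time_step_seconds.

```python
# Sketch (not DART code) of the (days, seconds) time convention.
SECONDS_PER_DAY = 86400


def to_seconds(days: int, seconds: int) -> int:
    """Collapse a (days, seconds) pair into total seconds."""
    return days * SECONDS_PER_DAY + seconds


def resolve_time(days: int, seconds: int, time_from_file: int) -> int:
    """Namelist time in seconds, or the time found in the file if unset
    (ASSUMPTION: unset means either component is negative)."""
    if days < 0 or seconds < 0:
        return time_from_file
    return to_seconds(days, seconds)


def n_assimilation_times(first_s: int, last_s: int,
                         step_days: int, step_seconds: int) -> int:
    """Assimilation times from first_s to last_s inclusive."""
    step = to_seconds(step_days, step_seconds)
    return (last_s - first_s) // step + 1


first = resolve_time(-1, -1, time_from_file=0)      # first obs time in the file
last = resolve_time(1, 0, time_from_file=999999)    # explicit: day 1, 0 s
print(n_assimilation_times(first, last, 0, 3600))   # 25
```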

Executing perfect_model_obs


Since perfect_model_obs generally requires advancing the model, and the model may use MPI or require special ancillary files, forcing files, and so on, it is not possible to provide a single example that covers all possibilities. The subroutine-callable models (i.e., the low-order models) can run perfect_model_obs very simply:


./perfect_model_obs


3. Performing the assimilation experiment - filter


This step is done with the program filter, which also uses input.nml for input and run-time control. A successful assimilation will depend on many things: an appropriate initial ensemble, monitoring and perhaps correcting the ensemble spread, localization, etc. It is simply not possible to design a one-size-fits-all system that will work for all cases. It is critically important to analyze the results of the assimilation and explore ways of making the assimilation more effective. The DART tutorial and the DART_LAB exercises are an invaluable resource to learn and understand how to determine the effectiveness of, and improve upon, an assimilation experiment. The concepts learned with the low-order models are directly applicable to the most complicated models.

It is important to remember that if filter 'terminates normally', it does not necessarily mean the assimilation was effective!

filter produces two state-space output diagnostic files (Prior_Diag.nc and Posterior_Diag.nc) which contain values of the ensemble mean, the ensemble spread, perhaps the inflation values, and (optionally) ensemble members for the duration of the experiment. filter also creates an observation sequence file that contains the input observation information as well as the prior and posterior ensemble mean estimates of each observation, the prior and posterior ensemble spread for each observation, and (optionally) the actual prior and posterior ensemble estimates of each observation. Rather than replicate the observation metadata for each of these, a single copy of the metadata is shared by all these 'copies' of the observation. See 'An overview of the observation sequence' for more detail. filter also produces a run-time log file that can greatly aid in determining what went wrong if the program terminates abnormally.
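The mean and spread 'copies' are ordinary ensemble statistics. Here is a Python sketch of how they relate to the members (not DART code). Taking 'spread' as the sample standard deviation with an n - 1 denominator is an assumption about the convention, and the dictionary keys in obs_copies() are illustrative labels.

```python
# Sketch (not DART code): ensemble mean and spread 'copies' for state
# variables and for a single observation's forward-operator estimates.
import statistics


def mean_and_spread(values):
    """Ensemble mean and spread (sample standard deviation, an assumption)."""
    return statistics.fmean(values), statistics.stdev(values)


def state_copies(ensemble):
    """ensemble[i] is member i's state vector; returns (mean, spread) vectors
    analogous to the mean and spread copies in the diagnostic files."""
    pairs = [mean_and_spread(col) for col in zip(*ensemble)]
    return [m for m, _ in pairs], [s for _, s in pairs]


def obs_copies(prior_estimates, posterior_estimates):
    """Per-observation copies like those described for obs_seq.final."""
    prior_mean, prior_spread = mean_and_spread(prior_estimates)
    post_mean, post_spread = mean_and_spread(posterior_estimates)
    return {"prior ensemble mean": prior_mean,
            "prior ensemble spread": prior_spread,
            "posterior ensemble mean": post_mean,
            "posterior ensemble spread": post_spread}
```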

A very short description of some of the most important namelist variables is presented here. Basically, I am only discussing the settings necessary to get filter to run. I can guarantee these settings WILL NOT generate the BEST assimilation. Again, see the module documentation for a full description of each namelist.

&filter_nml
   async                    = 0
   adv_ens_command          = "./advance_model.csh"
   ens_size                 = 40                 something ≥ 20, please
   start_from_restart       = .false.            .false. requires reading available input files
   output_restart           = .true.
   obs_sequence_in_name     = "obs_seq.out"      whatever you called the output from perfect_model_obs
   obs_sequence_out_name    = "obs_seq.final"
   restart_in_file_name     = "filter_ics"       the file (or base file name) of your ensemble
   restart_out_file_name    = "filter_restart"
   init_time_days           = -1                 the time in the restart file is correct
   init_time_seconds        = -1
   first_obs_days           = -1                 same interpretation as with perfect_model_obs
   first_obs_seconds        = -1
   last_obs_days            = -1                 same interpretation as with perfect_model_obs
   last_obs_seconds         = -1
   num_output_state_members = 10                 # of FULL DART model states to put in state-space output files
   num_output_obs_members   = 40                 # of ensemble member 'copies' of observation to save
   output_interval          = 1
   num_groups               = 1
   input_qc_threshold       =  4.0
   outlier_threshold        =  3.0               Observation rejection criterion!
   output_forward_op_errors = .false.
   output_timestamps        = .false.
   output_inflation         = .true.

   inf_flavor               = 0,                       0                  0 is 'do not inflate'
   inf_start_from_restart   = .false.,                 .false.
   inf_output_restart       = .false.,                 .false.
   inf_deterministic        = .true.,                  .true.
   inf_in_file_name         = 'not_initialized',       'not_initialized'
   inf_out_file_name        = 'not_initialized',       'not_initialized'
   inf_diag_file_name       = 'not_initialized',       'not_initialized'
   inf_initial              = 1.0,                     1.0
   inf_sd_initial           = 0.6,                     0.0
   inf_damping              = 0.9,                     0.0
   inf_lower_bound          = 1.0,                     1.0
   inf_upper_bound          = 1000000.0,               1000000.0
   inf_sd_lower_bound       = 0.6,                     0.0
  /

&ensemble_manager_nml
   single_restart_file_in  = .false.       .false. means each ensemble member is in a separate file
   single_restart_file_out = .false.
   perturbation_amplitude  = 0.2           not used if 'single_restart_file_in' is .false.
  /

&assim_tools_nml
   filter_kind             = 1             1 is EAKF, 2 is EnKF ...
   cutoff                  = 0.2           this is your localization - units depend on type of 'location_mod'
  /

&obs_kind_nml
   assimilate_these_obs_types = 'RAW_STATE_VARIABLE'    Again, use a list ...
  /

&model_nml
   assimilation_period_days    = 0                      the assimilation interval is up to you
   assimilation_period_seconds = 3600
  /
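filter_kind = 1 selects the EAKF, and cutoff controls localization. To show what those two settings mean, here is a Python sketch of a serial ensemble adjustment Kalman filter update for ONE scalar observation. It uses the two-step form: observation-space increments, then regression onto a state variable weighted by Gaspari-Cohn localization. It is not DART's assim_tools_mod. Sample (n - 1) statistics are an assumption. Treating cutoff as the Gaspari-Cohn half-width, so the weight reaches zero at twice the cutoff, is our understanding of DART's convention and should be confirmed in the &assim_tools_nml documentation.

```python
# Sketch (not DART code): EAKF update for one scalar observation with
# Gaspari-Cohn localization of the state-space regression.
import math
import statistics


def gaspari_cohn(distance, half_width):
    """Gaspari-Cohn 5th-order weight: 1 at distance 0, 0 beyond 2*half_width."""
    r = abs(distance) / half_width
    if r <= 1.0:
        return -0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3 - (5.0 / 3.0) * r**2 + 1.0
    if r <= 2.0:
        return (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
                - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
    return 0.0


def eakf_obs_increments(prior_obs, obs_value, obs_err_var):
    """Increments moving the prior observation ensemble to the EAKF posterior."""
    prior_mean = statistics.fmean(prior_obs)
    prior_var = statistics.variance(prior_obs)
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_err_var)
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_err_var)
    shrink = math.sqrt(post_var / prior_var)   # deterministic: no perturbed obs
    return [post_mean + shrink * (y - prior_mean) - y for y in prior_obs]


def regress_onto_state(state_values, prior_obs, obs_increments, weight=1.0):
    """Spread observation increments to one state variable by linear
    regression, scaled by a localization weight in [0, 1]."""
    x_mean = statistics.fmean(state_values)
    y_mean = statistics.fmean(prior_obs)
    cov = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(state_values, prior_obs)) / (len(prior_obs) - 1)
    slope = cov / statistics.variance(prior_obs)
    return [x + weight * slope * dy for x, dy in zip(state_values, obs_increments)]
```

A typical use passes weight=gaspari_cohn(distance, cutoff) for each state variable. State variables farther than twice the cutoff from the observation are left untouched.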


With output_interval = 1, the state is written to the diagnostic files at every time for which there are observations. With num_output_state_members = 10, Prior_Diag.nc and Posterior_Diag.nc contain 10 full ensemble members (in addition to the ensemble mean and spread) at each of those times. Once the namelist is set, execute filter to assimilate the observations; the final ensemble state is written to filter_restart. Copy the perfect_model_obs restart file perfect_restart (the 'true state') to perfect_ics, and the filter restart file filter_restart to filter_ics, so that future assimilation experiments can be initialized from these spun-up states.

mpirun ./filter        -OR-

mpirun.lsf ./filter    -OR-

./filter               -OR-

however YOU run filter on your system!
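The copy step can be scripted. This Python sketch is not a DART tool. With single_restart_file_out = .false. there is one file per member, and because the exact member-suffix format is not spelled out here, the sketch copies the base name plus any file whose name starts with the base name followed by a '.', keeping whatever suffix it has. copy_with_prefix() is a hypothetical helper.

```python
# Sketch (not a DART tool): copy spun-up restart files to initial-condition
# names, keeping any per-member suffix unchanged.
import shutil
from pathlib import Path


def copy_with_prefix(run_dir, old_prefix, new_prefix):
    """Copy old_prefix and old_prefix.* to new_prefix and new_prefix.*."""
    run = Path(run_dir)
    copied = []
    for src in sorted(run.iterdir()):       # list is built before copying
        name = src.name
        if name == old_prefix or name.startswith(old_prefix + "."):
            dst = run / (new_prefix + name[len(old_prefix):])
            shutil.copyfile(src, dst)
            copied.append(dst.name)
    if not copied:
        raise FileNotFoundError(f"no {old_prefix}* files in {run}")
    return copied


# Usage, from the run directory:
#   copy_with_prefix(".", "perfect_restart", "perfect_ics")
#   copy_with_prefix(".", "filter_restart", "filter_ics")
```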


4. ASSESS THE PERFORMANCE!


All the concepts of spread, RMSE, and rank histograms that were taught in the DART tutorial and in DART_LAB should be applied now. Try the techniques described in the 'Did my experiment work?' section. The 'big three' state-space diagnostics are repeated here because they are so important. The first two require True_State.nc .


plot_bins.m plots the rank histograms for a set of state variables. This requires you to have all or most of the ensemble members available in the Prior_Diag.nc or Posterior_Diag.nc files.
plot_total_err.m plots the evolution of the error (un-normalized) and ensemble spread of all state variables.
plot_ens_mean_time_series.m plots the evolution of a set of state variables - just the ensemble mean (and the truth, if available). plot_ens_time_series.m is actually a better choice if you can afford to write all or most of the ensemble members to the Prior_Diag.nc and Posterior_Diag.nc files.
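The quantities behind plot_total_err.m and plot_bins.m can be sketched in a few lines of Python to show why both need the truth. This is not a port of the MATLAB scripts. The RMS-over-state-variables definitions of error and spread are assumptions, and ties between the truth and a member are ignored when ranking.

```python
# Sketch (not the DART MATLAB scripts): error vs. spread and a rank
# histogram, both of which need the true state from the OSSE.
import math


def total_error_and_spread(ens_mean, truth, spread):
    """RMS error of the ensemble mean and RMS spread over all state
    variables at one time (definitions are assumptions)."""
    n = len(truth)
    rmse = math.sqrt(sum((m - t) ** 2 for m, t in zip(ens_mean, truth)) / n)
    avg_spread = math.sqrt(sum(s ** 2 for s in spread) / n)
    return rmse, avg_spread


def rank_of_truth(members, truth_value):
    """Number of members strictly below the truth: 0 .. len(members)."""
    return sum(1 for m in members if m < truth_value)


def rank_histogram(ensemble_series, truth_series):
    """ensemble_series[t] holds the members of ONE state variable at time t."""
    counts = [0] * (len(ensemble_series[0]) + 1)
    for members, truth in zip(ensemble_series, truth_series):
        counts[rank_of_truth(members, truth)] += 1
    return counts
```

In a well-tuned experiment, the error and spread should be comparable over time, and the rank histogram should be roughly flat. A U-shaped histogram usually indicates too little spread.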

DON'T FORGET ABOUT THE OBSERVATION-SPACE DIAGNOSTICS!



