PROGRAM filter

DART project logo

Jump to DART Documentation Main Index
version information for this file:
$Id: filter.html 6341 2013-07-31 14:24:51Z nancy $

NAMELIST / MODULES / FILES / REFERENCES / ERRORS / PLANS / TERMS OF USE

Overview

Main program for driving ensemble filter assimilations.

filter is a Fortran 90 program, and provides a large number of options for controlling execution behavior and parameter configuration that are driven from its namelist. See the namelist section below for more details. The number of assimilation steps to be done is controlled by the input observation sequence and by the time-stepping capabilities of the model being used in the assimilation.

This overview includes these subsections:

See the DART web site for more documentation, including a discussion of the capabilities of the assimilation system, a diagram of the entire execution cycle, the options and features.

Overview of Program Flow

The basic execution loop is:

The time of the observations in the input observation sequence file controls the length of execution of filter.

The same source code is used for all applications of filter. The code specific to the types of observations and the interface code for the computational model is configured at compile time. The DART directory structure is arranged slightly differently than usual in that the main code is spread across a dozen directories at the top level, e.g. the filter source code is in DART/filter/filter.f90. Each model has a separate directory under DART/models, and under each model is a work directory where the code is compiled and can be run for testing. Generally when a full-size experiment is done the executables are copied to a different location - e.g. scratch space on a large filesystem - since the data files for 10s to 100s of copies of a model can get very large.

Types of Filters available

The different types of assimilation algorithms (EAKF, ENKF, Kernel filter, Particle filter, etc.) are determined by the &assim_tools_nml:filter_kind entry, described in assim_tools_mod.html. Despite having 'filter' in the name, they are assimilation algorithms and so are implemented in assim_tools_mod.f90.

Getting Started

Running a successful assimilation takes careful diagnostic work and experiment iterations to find the best settings for your specific case. The basic Kalman filter can be coded in only a handful of lines; the hard work is making the right choices to compensate for sampling errors, model bias, observation error, lack of model divergence, variations in observation density in space and time, random correlations, etc. There are tools built into DART to deal with most of these problems but it takes careful work to apply them correctly.

If you are adding a new model or a new observation type, we suggest you assimilate exactly one observation, with no model advance, with inflation turned off, with a large cutoff, and with the outlier threshold off (see below for how to set these namelist items). Run an assimilation. Look at the obs_seq.final file to see what the forward operator computed. Use ncdiff to difference the Prior and Posterior Diag NetCDF files and look at the changes (the "innovations") in the various model fields. Is it in the right location for that observation? Does it have a reasonable value?

Then assimilate a group of observations and check the results carefully. Run the observation diagnostics and look at the total error and spread. Look carefully at the number of observations being assimilated compared to how many are available. Assimilations that are not working can give good looking statistics if they reject all but the few observations that happen to match the current state. The errors should grow as the model advances and then shrink when new observations are assimilated, so a time-plot of the RMSE should show a sawtooth pattern. The initial error might be large but it should decrease and then reach a roughly stable level. The spread should remain constant, at a value around the expected observation error level. If the spread is too small that is ok for the baseline case; several of the DART facilities described below are intended to compensate for ensemble members getting too close to each other. Once you believe you have a working assimilation case, this will be your baseline case. Then one by one enable or tune each of the items below, checking each time to see what is the effect on the results.

Suggestions for the most common namelist settings and features built into DART for running a successful assimilation include:

Free run/Forecast After Assimilation

Separate scripting can be done to support forecasts starting from the analyzed model states. After filter exits, the models can be run freely (with no assimilated data) further forward in time using one or more of the last updated model states from filter. Since all ensemble members are equally likely a member can be selected at random, or a member close to the mean can be chosen. See the closest_member_tool for one way to select a "close" member. The ensemble mean is available to be used, but since it is a combination of all the member states it may not have self-consistent features, so using a single member is usually preferred.

Evaluating Observations Without Assimilation

Filter can be used to evaluate the accuracy of a single model state based on a set of available observations. Convert the model data into a single DART state vector, and either copy or link it so there appear to be 2 separate ensemble members (which are identical). Set the filter namelist ensemble size to 2 by setting ens_size to 2 in the &filter_nml namelist. Turn off the outlier threshold and both Prior and Posterior inflation by setting outlier_threshold to -1, and both the inf_flavor values to 0 in the same &filter_nml namelist. Set all observation types to be 'evaluate-only' and have no types in the 'assimilate' list by listing all types in the evaluate_these_obs_types list in the &obs_kind_nml section of the namelist, and none in the assimilation list. Run filter as usual, including model advances if needed. Run observation diagnostics on the resulting obs_seq.final file to compute the difference between the observed values and the predicted values from this model state.

Verification/Comparison With and Without Assimilation

To compare results of an experiment with and without assimilating data, do one run assimilating the observations. Then do a second run where all the observation types are moved to the evaluate_these_obs_types list in the &obs_kind_nml section of the namelist. Also turn inflation off by setting both inf_flavor values to 0 in the &filter_nml namelist. The forward operators will still be called, but they will have no impact on the model state. Then the two sets of diagnostic state space netcdf files can be compared to evaluate the impact of assimilating the observations, and the observation diagnostic files can also be compared.

DART Quality Control Flag added to Output Observation Sequence File

The filter adds a quality control field with metadata 'DART quality control' to the obs_seq.final file. At present, this field can have the following values:

0: Observation is okay
1: Observation was evaluated only but not used in the assimilation
2: The observation was used but one or more of the posterior forward observation operators failed
3: The observation was evaluated only but not used AND one or more of the posterior forward observation operators failed
4: One or more prior forward observation operators failed so the observation was not used
5: The observation was not used because it was not selected in the namelist to be assimilated or evaluated
6: The prior quality control value was too high so the observation was not used.
7: Outlier test failed (see below)

The outlier test computes the difference between the observation value and the prior ensemble mean. It then computes a standard deviation by taking the square root of the sum of the observation error variance and the prior ensemble variance for the observation. If the difference between the ensemble mean and the observation value is more than the specified number of standard deviations, then the observation is not used and the DART quality control field is set to 7.

Discussion of Inflation Options

There are two choices for the basic type of inflation: observation space or state space. Almost all users use state space inflation and the rest of this discussion applies to this type. (If you are interested in observation space inflation, talk to Jeff first.)

State space inflation changes the spread of a set of ensemble members without changing the mean value. The algorithm computes the mean and standard deviation for each variable in the state vector in turn, and then moves the values away from the mean in such a way that the mean remains unchanged. The resulting standard deviation is (generally) larger than before. It can be applied to the Prior state, before observations are assimilated (the most frequently used case), or it can be applied to the Posterior state, after assimilation. See Anderson (2007), Anderson (2009).

Inflation can be a single value applied to all state space variables over all times. It can be a single value per state space variable, constant in time. And finally, it can vary with time, adapting to different densities of observations in time and space. To enable state space inflation, see the 'flavor' namelist options below. See the 'start_from_restart' options to set a single value verses a value per state space variable. To allow the values to adapt through time in each assimilation window see the 'sd_initial' description. There are additional options to damp inflation through time. In regions where the density of observations varies in time the damping slowly lowers the inflation values in the absence of new observations at those locations. In practice with large geophysical models using damped inflation has been a successful strategy. See the section describing 'inf_damping'.

The following namelist items related to inflation are all found in the input.nml file, in the &filter_nml namelist. The detailed descriptions are in the namelist section below. Here we try to give some basic advice about commonly used values and suggestions for where to start. In the namelist each entry has two values. The first is for Prior inflation and the second is for Posterior inflation. If 'flavor' is 0, all other settings for that column are ignored.

&filter_nml :: flavor
valid values: 0, 1, 2, 3
Set the type of Prior and Posterior inflation applied to the state vector. Values mean:
0: No inflation
1: Observation space inflation
2: Spatially-varying state space inflation
3: Spatially-fixed state space inflation
In practice we recommend starting with no inflation at all (both values 0), and then when first trying out inflation start with type 2 prior inflation and no inflation (0) for posterior.

&filter_nml :: inf_deterministic
valid values: .true. or .false.
Recommend always using .true..

&filter_nml :: inf_initial
valid values: real numbers, usually 1.0 or slightly larger
If not reading in inflation values from a restart file, the initial value to set for the inflation. Generally we recommend starting with just slightly above 1.0, maybe 1.02, for a slight amount of initial inflation.

&filter_nml :: inf_sd_initial
valid values: 0.0 to infinity, or -1 to disable
This namelist setting controls whether the inflation values evolve with time or not. A negative value prevents the inflation values from being updated, so they are constant throughout the run. If positive, the inflation values evolve through time. Even though we talk about a single inflation value, the inflation has a gaussian distribution with a mean and standard deviation. We use the mean value when we inflate, and the standard deviation indicates how sure of the value we are. Larger standard deviation values are less sure and the inflation value will vary more quickly with time. Smaller values are more sure and the time evolution will be slower since we are more confident that the mean is correct. We have had good results setting this and inf_sd_lower_bound to 0.6 for large geophysical models.

&filter_nml :: inf_lower_bound
valid values: real numbers, usually 1.0 or slightly larger
If inflation is time-evolving (see inf_sd_initial namelist item above), then this sets the lowest value the inflation can evolve to. Setting a number less than one allows for deflation but generally in a well-observed system the ensemble needs more spread and not less. We recommend a setting of 1.0.

&filter_nml :: inf_upper_bound
valid values: real numbers, usually 1.0 or slightly larger
If inflation is time-evolving (see inf_sd_initial namelist item above), then this sets the largest value the inflation can evolve to. We recommend a setting of 100.0, although if the inflation values reach those levels there is probably a problem with the assimilation.

&filter_nml :: inf_sd_lower_bound
valid values: 0.0 to infinity, or -1 to disable
If the setting of inf_sd_initial is -1.0 (to disable time evolution of inflation) then set this to the same value. Otherwise, set a lower value that the standard deviation of the inflation cannot fall below. As the width of the inflation distribution changes, this sets a lower bound for the value. Lower values will let the inflation vary more slowly with time; larger values will allow the inflation to adapt in time more quickly. We have had good results setting this and inf_sd_initial to 0.6 for large geophysical models.

&filter_nml :: inf_damping
valid values: 0.0 to 1.0
Applies if inflation is time-evolving. The difference between the current inflation value and 1.0 is multiplied by this factor before the next assimilation cycle. 0.0 turns all inflation off by clamping the inflation value to 1.0. 1.0 turns damping off by leaving the original inflation value unchanged. We have had good results in large geophysical models setting this to a value of 0.9, which damps slowly. Damping appears to particularly help in cases where there are dense clusters of observations at irregular times. Areas that are heavily observed evolve large inflation values to prevent the ensemble members from becoming too close to one another. However once the area is unobserved there was no mechanism to cause the inflation values to drop back down to smaller levels. The damping factor accomplishes this.

&filter_nml :: inf_output_restart
valid values: text string
The name of the file to write the inflation and standard deviation values into. This can be used to let spatially-varying inflation values evolve in a spinup phase, and then be read in and used as fixed values in further runs. Or if a long assimilation run is executed in separate jobs steps and time-varying inflation is used, then the restart file from the previous job step must be supplied as an input file for the next step. This filename sets where the output is going to be written. Note that there is only a single inflation value and a single standard deviation value per state vector variable will be written, so the total file size will be two times the state vector length (times the number of bytes in a real value).

The suggested procedure for testing inflation options is to start without any (both 'flavor' values set to 0). Then enable Prior state space, spatially-varying inflation, with no Posterior inflation (set 'flavor' to [2, 0]). Then try damped inflation (set 'inf_damping' to 0.9 and set 'inf_sd_initial' and 'inf_sd_lower_bound' to 0.6). The inflation values and standard deviation are written out to the Prior_Diag.nc and Posterior_Diag.nc files as the last 2 'copies', so the inflation fields can be plotted (we often use ncview ). Expected inflation values are generally in the 1 to 10 range; if values grow much larger than this it usually indicates a problem with the assimilation.

Detailed Program Execution Flow

The detailed execution flow inside the filter program is:



[top]

NAMELIST

This namelist is read from the file input.nml. Namelists start with an ampersand '&' and terminate with a slash '/'. Character strings that contain a '/' must be enclosed in quotes to prevent them from prematurely terminating the namelist.

&filter_nml
   async                    = 0,
   adv_ens_command          = "./advance_model.csh",
   ens_size                 = 20,
   start_from_restart       = .false.,
   output_restart           = .false.,
   obs_sequence_in_name     = "obs_seq.out",
   obs_sequence_out_name    = "obs_seq.final",
   restart_in_file_name     = "filter_ics",
   restart_out_file_name    = "filter_restart",
   init_time_days           = 0,
   init_time_seconds        = 0,
   first_obs_days           = -1,
   first_obs_seconds        = -1,
   last_obs_days            = -1,
   last_obs_seconds         = -1,
   num_output_state_members = 0,
   num_output_obs_members   = 0,
   output_interval          = 1,
   num_groups               = 1,
   input_qc_threshold       =  3.0,
   outlier_threshold        = -1.0,
   enable_special_outlier_code = .false.,
   output_forward_op_errors = .false.,
   output_restart_mean      = .false.,
   output_timestamps        = .false.,
   output_inflation         = .true.,
   trace_execution          = .false.,
   silence                  = .false.,

   inf_flavor                  = 0,                       0,
   inf_initial_from_restart    = .false.,                 .false.,
   inf_sd_initial_from_restart = .false.,                 .false.,
   inf_output_restart          = .false.,                 .false.,
   inf_deterministic           = .true.,                  .true.,
   inf_in_file_name            = 'not_initialized',       'not_initialized',
   inf_out_file_name           = 'not_initialized',       'not_initialized',
   inf_diag_file_name          = 'not_initialized',       'not_initialized',
   inf_initial                 = 1.0,                     1.0,
   inf_sd_initial              = 0.0,                     0.0,
   inf_damping                 = 1.0,                     1.0,
   inf_lower_bound             = 1.0,                     1.0,
   inf_upper_bound             = 1000000.0,               1000000.0,
   inf_sd_lower_bound          = 0.0,                     0.0
/


Particular options to be aware of are: async, ens_size, cutoff (localization radius), inflation flavor, outlier_threshold, restart filenames (including inflation), obs_sequence_in_name, horiz_dist_only, binary or ascii controls for state vector and observation sequence file formats. Some of these important items are located in other namelists, but all are in the same input.nml file.

The inflation control variables are all dimensioned 2, the first value being for the prior inflation and the second being for the posterior inflation.


Item Type Description
async integer Controls method for advancing model:
  • 0 is subroutine call
  • 2 is shell command
  • 4 is mpi-job script
adv_ens_command character(len=129) Command sent to shell if async is 2.
ens_size integer Size of ensemble.
start_from_restart logical True means start from a restart file, false means perturb a single state vector in restart file.
output_restart logical True means output a restart file.
obs_sequence_in_name character(len=129) File name from which to read an observation sequence.
obs_sequence_out_name character(len=129) File name to which to write output observation sequence.
restart_in_file_name character(len=129) File containing state restart vectors.
restart_out_file_name character(len=129) File to which to write restart state vectors.
init_time_days integer If negative, don't use. If non-negative, override the initial days read from state data restart file.
init_time_seconds integer If negative don't use. If non-negative, override the initial seconds read from state data restart file.
first_obs_days integer If negative, don't use. If non-negative, ignore all observations before this time.
first_obs_seconds integer If negative, don't use. If non-negative, ignore all observations before this time.
last_obs_days integer If negative, don't use. If non-negative, ignore all observations after this time.
last_obs_seconds integer If negative, don't use. If non-negative, ignore all observations after this time.
num_output_state_members integer Number of ensemble members to be included in the state diagnostic output.
num_output_obs_members integer Number of ensemble members to be included in the output observation sequence file.
output_interval integer Output state and observation diagnostics every 'N'th assimilation time, N is output_interval.
num_groups integer Number of groups for hierarchical filter.
outlier_threshold real(r8) Reject observation if prior mean is more than this many standard deviations from observation. Negative means no check.
enable_special_outlier_code logical If true call a subroutine which can be customized by the user to do more elaborate outlier thresholding tests. See failed_outlier() near the bottom of filter.f90 for where to add the custom code, and for commented out examples of possible tests. Turning this flag on and off allows comparisons to be made between the default outlier threshold code and any custom settings without having to recompile the code. To change the outlier behavior you will have to add code in DART/filter/filter.f90 and recompile your executables. As distributed, turning this flag on and off will make no difference in the results.
input_qc_threshold real(r8) Reject observation if incoming QC value exceeds this value. Incoming observations usually have a QC value provided with the dataset, e.g. NCEP obs include an incoming data QC.
output_forward_op_errors logical True means output errors from forward observation operators. This is the 'istatus' error return code from the model interpolate routine. An ascii text file prior_forward_op_errors and/or post_forward_op_errors will be created in the current directory. For each ensemble member which returns a non-zero return code, a line will be written to this file. Each line will have three values listed: the observation number, the ensemble member number, and the istatus return code. Be cautious when turning this option on. The number of lines in this file can be up to the number of observations times the number of ensemble members times the number of assimilation cycles performed. This option is generally most useful when run with a small observation sequence file and a small number of ensemble members to diagnose forward operator problems.
output_restart_mean logical True means output a restart file which contains the ensemble mean. The file name will be the value of the namelist item &filter_nml::restart_out_file_name with the string .mean appended. Even if &ensemble_manager_nml::single_restart_file_out is true the mean data will be written to a separate file.
output_timestamps logical True means write timing information to the log before and after the model advance and the observation assimilation phases.
output_inflation logical True means output inflation values in the prior and posterior diagnostic files. False omits them.
trace_execution logical True means output very detailed messages about what routines are being called in the main filter loop. Useful if a job hangs or otherwise doesn't execute as expected.
silence logical True means output almost no runtime messages. Not recommended for general use, but can speed long runs of the lower order models if the execution time becomes dominated by the volume of output.
All subsequent variables are arrays of length 2.
The first element is for the prior, the second element is for the posterior.
inf_flavor integer array dimension(2) Inflation flavor for [prior, posterior]
  • 0 = none
  • 1 = obs_space
  • 2 = spatially-varying state space
  • 3 = spatially-fixed state space
(See sd_initial below for how to set the time evolution options.)
inf_initial_from_restart logical array dimension(2) If true, get initial mean values for inflation from restart file. If false, use the corresponding namelist value inf_initial.
inf_sd_initial_from_restart logical array dimension(2) If true, get initial standard deviation values for inflation from restart file. If false, use the corresponding namelist value inf_sd_initial.
inf_deterministic logical array dimension(2) True means deterministic inflation, false means stochastic.
inf_output_restart logical array dimension(2) Output an inflation restart file if true.
inf_in_file_name character(len=129) dimension(2) Filename to read inflation restart values from.
inf_out_file_name character(len=129) dimension(2) Filename to write inflation restart values into.
inf_diag_file_name character(len=129) dimension(2) Filename to write output diagnostics for observation space inflation into.
inf_initial real(r8) dimension(2) Initial value of inflation if not read from restart file.
inf_sd_initial real(r8) dimension(2) Initial value of inflation standard deviation if not read from restart file. If negative, do not update the inflation values, so they are time-constant. If positive, the inflation values will adapt through time, so they are time-varying.
inf_damping real(r8) dimension(2) Damping factor for inflation mean values. The difference between the current inflation value and 1.0 is multiplied by this factor before the next assimilation cycle. The value should be between 0.0 and 1.0. Setting a value of 0.0 is full damping, which in fact turns all inflation off by fixing the inflation value at 1.0. A value of 1.0 turns inflation damping off leaving the original inflation value unchanged.
inf_lower_bound real(r8) dimension(2) Lower bound for inflation value.
inf_upper_bound real(r8) dimension(2) Upper bound for inflation value.
inf_sd_lower_bound real(r8) dimension(2) Lower bound for inflation standard deviation. If using a negative value for sd_initial this should also be negative to preserve the setting.


Deprecated or Obsolete namelist variables

The following table contains the deprecated or obsolete namelist variables. If only deprecated, the values will have no effect. If obsolete, it is a FATAL ERROR to have these in the namelist.

Contents Type Description
output_state_ens_mean obsolete The ensemble mean is now always in the state diagnostic output.
output_state_ens_spread obsolete The ensemble spread is now always in the state diagnostic output.
output_obs_ens_mean obsolete The ensemble mean is now always in the output observation sequence file.
output_obs_ens_spread obsolete The ensemble spread is now always in the output observation sequence file.
inf_start_from_restart obsolete The mean and standard deviation now have separate namelist controls.


[top]

MODULES USED

types_mod
obs_sequence_mod
obs_def_mod
time_manager_mod
utilities_mod
assim_model_mod
assim_tools_mod
obs_model_mod
ensemble_manager_mod
adaptive_inflate_mod
mpi_utilities_mod
smoother_mod


[top]

FILES



[top]

REFERENCES



[top]

ERROR CODES and CONDITIONS

RoutineMessageComment
filter_main ens_size in namelist is ###: Must be > 1 Ensemble size must be at least 2.
filter_main inf_flavor= ### Must be 0, 1, 2, 3. Only inflation options 0 to 3 are supported.
filter_main Posterior observation space inflation (type 1) not supported. Posterior observation space inflation doesn't work.
filter_main Number of processes > model size. Number of processes can't exceed model size for now.
filter_generate_copy_meta_data output metadata in filter needs state ensemble size < 10000, not ###. Only up to 10000 ensemble members with state output for now.
filter_generate_copy_meta_data output metadata in filter needs obs ensemble size < 10000, not ###. Only up to 10000 ensemble members with obs space output for now.
filter_setup_obs_sequence input obs_seq file has ### qc fields; must be < 2. Only 0 or 1 qc fields in input obs sequence for now.
get_obs_copy_index Did not find observation copy with metadata observation. Only 0 or 1 qc fields in input obs sequence for now.

KNOWN BUGS

none



[top]

FUTURE PLANS

Many. New assimilation algorithms, support for new observation types, support for additional models, better performance on higher numbers of MPI tasks... The list is long. Send email to dart@ucar.edu if you are interested in additional functionality in DART.



[top]

Terms of Use

DART software - Copyright 2004 - 2013 UCAR.
This open source software is provided by UCAR, "as is",
without charge, subject to all terms of use at
http://www.image.ucar.edu/DAReS/DART/DART_download

Contact: DART core group
Revision: $Revision: 6341 $
Source: $URL: https://svn-dares-dart.cgd.ucar.edu/DART/releases/Lanai/filter/filter.html $
Change Date: $Date: 2013-07-31 08:24:51 -0600 (Wed, 31 Jul 2013) $
Change History: try "svn log" or "svn diff"