Once a model is compatible with the DART facility, all of the
functionality of DART is available. This includes 'perfect model'
experiments (also called Observing System Simulation Experiments, or OSSEs).
Essentially, the model is run forward from a known state and, at predefined times,
an observation forward operator is applied to the model state to harvest
synthetic observations. This model trajectory is known as the 'true state'.
The synthetic observations are then used in an
assimilation experiment. The assimilation performance can then be evaluated
precisely because the true state (of the model) is known. Since the same
forward operator is used to harvest the synthetic observations as well as
during the assimilation, the 'representativeness' error of the assimilation
system is not an issue.
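The Python sketch below illustrates the perfect-model idea using the Lorenz '96 model. It is a toy stand-in, not DART's perfect_model_obs. The forcing F = 8, the RK4 time step of 0.05, the identity forward operator, and the Gaussian observation error are all assumptions made for the sketch.

```python
import random


def l96_tendency(x, forcing=8.0):
    """Lorenz '96 tendency dx_i/dt = (x[i+1] - x[i-2]) * x[i-1] - x[i] + F, cyclic in i."""
    n = len(x)
    # Python's negative indices supply the cyclic wrap for i-1 and i-2.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def rk4_step(x, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = l96_tendency(x)
    k2 = l96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = l96_tendency([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = l96_tendency([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def harvest(x0, obs_indices, obs_error_sd, n_cycles, steps_per_cycle=1, dt=0.05, seed=0):
    """Run the 'true' model from a known state and harvest synthetic observations.

    The forward operator here is an identity lookup of state element i.
    Returns (truth, obs): truth[k] is the state at cycle k, and obs[k] is a list
    of (index, noise_free, noisy) tuples for that cycle.
    """
    rng = random.Random(seed)
    x = list(x0)
    truth, obs = [], []
    for _ in range(n_cycles):
        for _ in range(steps_per_cycle):
            x = rk4_step(x, dt)
        truth.append(list(x))
        cycle = []
        for i in obs_indices:
            noise_free = x[i]                                   # H(x): the 'truth' copy
            noisy = noise_free + rng.gauss(0.0, obs_error_sd)   # the 'observation' copy
            cycle.append((i, noise_free, noisy))
        obs.append(cycle)
    return truth, obs


x0 = [8.0] * 40
x0[19] += 0.01          # small perturbation off the x_i = F fixed point
truth, obs = harvest(x0, obs_indices=range(0, 40, 2), obs_error_sd=1.0, n_cycles=10)
print(len(truth), len(obs[0]))   # 10 20
```

Both the noise-free and noisy values are kept. As a result, the error of any later estimate can be measured exactly against the truth, which is the point of an OSSE.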
There is a set of MATLAB® functions to help explore the assimilation
performance in state space as well as in observation space.
An OSSE is explored in depth in our
Lorenz '96 example.
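There are four fundamental steps to running an OSSE from within DART:
1. Create a 'blueprint' of the observations: what, where, and when you want them (the observation metadata, without any observation values).
2. Run perfect_model_obs to advance the model from a known state and harvest the synthetic observations from the true model state.
3. Assimilate the synthetic observations with filter.
4. Examine the results.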
There are lots of ways to define an observation sequence that DART can use as
input for a perfect model experiment. If you have observations in DART format
already, you can simply use them. If you have observations in one of the
formats already supported by the DART converters (check
DART/observations/observations.html),
convert them to a DART observation sequence. You may need to use the
obs_sequence_tool to combine multiple observation sequence files into a single observation
sequence file for the perfect model experiment.
Any existing observation values and quality control information will be ignored by
perfect_model_obs; only the time and location information are used.
In fact, any and all existing observation and QC values will be removed.
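The hypothetical Python sketch below shows what combining blueprints and discarding old values amounts to conceptually. It is not how obs_sequence_tool or perfect_model_obs are implemented. The ObsRecord layout and the toy values are invented for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ObsRecord:
    """Hypothetical, simplified stand-in for one entry of an observation sequence."""
    time_seconds: int
    location: Tuple[float, ...]
    obs_type: str
    value: Optional[float] = None   # existing value: ignored by the perfect-model step
    qc: Optional[float] = None      # existing QC: ignored by the perfect-model step


def merge_blueprints(*sequences):
    """Merge time-sorted observation lists into one time-sorted blueprint.

    Existing values and QC are dropped; only time, location, and type survive.
    """
    merged = heapq.merge(*sequences, key=lambda ob: ob.time_seconds)
    return [ObsRecord(ob.time_seconds, ob.location, ob.obs_type) for ob in merged]


# Toy inputs: each list must already be sorted by time.
a = [ObsRecord(0, (1.0,), "T", value=280.0, qc=0.0), ObsRecord(7200, (1.0,), "T", value=281.0)]
b = [ObsRecord(3600, (2.0,), "U", value=5.0)]
blueprint = merge_blueprints(a, b)
print([ob.time_seconds for ob in blueprint])   # [0, 3600, 7200]
```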
GENERAL COMMENT ABOUT THE INTERPLAY BETWEEN THE MODEL STOP/START FREQUENCY AND THE
IMPACT ON THE OBSERVATION FREQUENCY: There is usually a very real difference between
the dynamical timestep of the model and when it is safe to stop and restart the model.
The assimilation window is (usually) required to be a multiple of the safe stop/start
frequency. For example, an atmospheric model may have a dynamical timestep of a few
seconds, but may be constrained such that it is only possible to stop/restart every
hour. In this case, the assimilation window is a multiple of 3600 seconds. Trying to get
observations at a finer timescale is not possible, because we only have access to the model
state when the model stops.
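The constraint can be expressed as a small Python sketch, with times in seconds. The function names are hypothetical, and the 3600-second restart interval is the value from the example above.

```python
def is_valid_window(window_seconds, restart_interval_seconds):
    """True if the window is a positive whole multiple of the safe stop/restart interval."""
    return window_seconds > 0 and window_seconds % restart_interval_seconds == 0


def model_stop_times(start_seconds, end_seconds, window_seconds):
    """Times at which the model stops, i.e. the only times its state is available."""
    return list(range(start_seconds, end_seconds + 1, window_seconds))


# Example from the text: stopping/restarting is only possible every hour.
restart_interval = 3600
print(is_valid_window(3 * 3600, restart_interval))  # True
print(is_valid_window(1800, restart_interval))      # False: finer than the restart interval
print(model_stop_times(0, 4 * 3600, 3600))          # [0, 3600, 7200, 10800, 14400]
```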
If you do not have an input observation sequence, it is simple to create one.
Both create_obs_sequence and create_fixed_network_seq interactively prompt you for the information they require. This can be quite tedious if you want a spatially dense set of observations. People have been known to actually write programs to generate the input to create_obs_sequence and simply pipe or redirect the information into the program. There are several examples of these in the models/bgrid_solo directory: column_rand.f90, id_set_def_stdin.f90, ps_id_stdin.f90, and ps_rand_local.f90. Be advised that some observation types have different input requirements, so a 'one size fits all' program is a waste of time.
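In the spirit of the bgrid_solo helper programs, here is a Python sketch that assembles the answers for a dense grid of observations as one block of text. The answer strings are placeholders, not create_obs_sequence's real prompts. Run the program once interactively for your observation type and copy its prompt order before relying on anything like this.

```python
def dense_grid(lons, lats):
    """All (lon, lat) pairs of a regular grid: the 'spatially dense' case."""
    return [(lon, lat) for lat in lats for lon in lons]


def build_answers(header_answers, per_obs_template, footer_answers, locations):
    """Assemble the text that would be typed at an interactive program's prompts.

    per_obs_template holds one string per prompt for a single observation, with
    {lon} and {lat} placeholders. Every answer goes on its own line.
    """
    lines = list(header_answers)
    for lon, lat in locations:
        lines.extend(t.format(lon=lon, lat=lat) for t in per_obs_template)
    lines.extend(footer_answers)
    return "\n".join(lines) + "\n"


# PLACEHOLDER answers: replace them with the real responses that
# create_obs_sequence asks for, in its order, for your observation type.
text = build_answers(
    header_answers=["<header answer>"],
    per_obs_template=["<type answer>", "{lon} {lat} <level answer>",
                      "<time answer>", "<error variance answer>"],
    footer_answers=["<footer answer>"],
    locations=dense_grid(range(0, 360, 10), range(-80, 90, 10)),
)
```

Writing `text` to a file (say obs_answers.txt) allows it to be redirected into the program with ./create_obs_sequence < obs_answers.txt.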
NOTE: only the observation kinds listed in input.nml &obs_kind_nml:assimilate_these_obs_types and &obs_kind_nml:evaluate_these_obs_types will be available to the create_obs_sequence program.
DEVELOPERS TIP: You can specify 'identity' observations as input to perfect_model_obs. Identity observations are the model values AT the exact gridcell location; there is no interpolation at all, just a straight table lookup. This can be useful as you develop your model interfaces: you can test many of the routines and scripts without having a working model_interpolate().
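The toy Python sketch below contrasts the two kinds of forward operator on a one-dimensional grid. The 1-D grid and linear interpolation are assumptions for illustration. DART's own identity-observation convention and its model_interpolate() interface are not reproduced here.

```python
def identity_obs(state, index):
    """Identity observation: the model value AT a state-vector index, no interpolation."""
    return state[index]


def linear_interp_obs(grid, state, x):
    """Interpolated observation on a 1-D grid (the kind of job model_interpolate() does)."""
    for k in range(len(grid) - 1):
        if grid[k] <= x <= grid[k + 1]:
            w = (x - grid[k]) / (grid[k + 1] - grid[k])
            return (1.0 - w) * state[k] + w * state[k + 1]
    raise ValueError("location outside the grid")
```

The identity version needs nothing but the state vector. This is why it can exercise scripts and I/O before interpolation works.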
More information about creating observation sequence files for OSSEs is available in the observation sequence discussion section.
perfect_model_obs reads the blueprint and an initial state and applies the appropriate forward observation operator for each and every observation in the current 'assimilation window'. If necessary, the model is advanced until the next set of observations is desired. When it has run out of observations or reached the stop time defined by the namelist control, the program stops and writes out a restart file, a diagnostic file, the observation sequence file, and a log file. This is fundamentally a single deterministic forecast for 'as long as it takes' to harvest all the observations.
default filename | format | contents |
---|---|---|
perfect_restart | ASCII or binary | The DART model state at the end of the forecast. If the forecast needs to be lengthened, use this as the input. The format of the file is controlled by input.nml &assim_model_nml:write_binary_restart_files. The first record is the valid time of the model; the rest is the model state at that time. |
True_State.nc | netCDF | The DART model state at every assimilation timestep. This file has but one 'copy': the truth. To dump the copy metadata and the time: ncdump -v time,CopyMetaData True_State.nc |
obs_seq.out | ASCII or binary DART-specific linked list | This file has the observations, i.e. the result of the forward observation operator. This observation sequence file has two 'copies' of the observation: the noisy 'copy' and the noise-free 'copy'. The noisy copy is designated as the 'observation'; the noise-free copy is the truth. The observation-space diagnostic program obs_diag has special options for using the true copy instead of the observation copy. See obs_diag.html for details. |
dart_log.out | ASCII | The run-time output of perfect_model_obs. |
Each model may define the assimilation window differently, but conceptually, all the observations plus or minus half the assimilation window are considered to be simultaneous, and a single model state provides the basis for all those observations. For example, if the blueprint requires temperature observations every 30 seconds, the initial model time is noon (12:00), and the assimilation window is 1 hour, then all the observations from 11:30 to 12:30 will use the same state as input for the forward observation operator. Because the blueprint requests observations every 30 seconds, many of those observations may have the same value (if they are in the same location).
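The noon example can be reproduced with a short Python sketch that groups observation times by the model state they share. Times are in seconds. The function names are hypothetical. The rule that an observation exactly on a shared boundary goes to the earlier state time is an assumption; each model may treat that case differently.

```python
from collections import defaultdict


def state_time_for(obs_time, state_times, window):
    """Return the first state time t with t - window/2 <= obs_time <= t + window/2, else None.

    An observation exactly on a shared boundary goes to the earlier state time here;
    that tie-break is an assumption of this sketch.
    """
    half = window / 2
    for t in sorted(state_times):
        if t - half <= obs_time <= t + half:
            return t
    return None


def bin_observations(obs_times, state_times, window):
    """Group observation times by the single model state used for their forward operator."""
    bins = defaultdict(list)
    for ob_time in obs_times:
        t = state_time_for(ob_time, state_times, window)
        if t is not None:
            bins[t].append(ob_time)
    return dict(bins)


# The example from the text: state at noon, 1-hour window,
# temperature observations every 30 seconds from 11:30 to 12:30.
noon = 12 * 3600
obs_times = range(noon - 1800, noon + 1800 + 1, 30)
bins = bin_observations(obs_times, [noon], 3600)
print(list(bins), len(bins[noon]))   # [43200] 121
```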
perfect_model_obs uses the input.nml for its control. A subset of the namelists and variables of particular interest for perfect_model_obs are summarized here. Each namelist is fully described by the corresponding module document.
```
&perfect_model_obs_nml
   ...
   start_from_restart    = .true.               ! usually, but not always
   output_restart        = .true.               ! sure, why not
   init_time_days        = -1                   ! negative means use the time in ...
   init_time_seconds     = -1                   ! the 'restart_in_file_name' file
   first_obs_days        = -1                   ! negative means start at the first time in ...
   first_obs_seconds     = -1                   ! the 'obs_seq_in_file_name' file.
   last_obs_days         = -1                   ! negative means to stop with the last ...
   last_obs_seconds      = -1                   ! observation in the file.
   restart_in_file_name  = "perfect_ics"
   restart_out_file_name = "perfect_restart"
   obs_seq_in_file_name  = "obs_seq.in"
   obs_seq_out_file_name = "obs_seq.out"
   output_interval       = 1
   async                 = 0                    ! totally depends on the model
   adv_ens_command       = "./advance_ens.csh"  ! depends on the model
/

&obs_sequence_nml
   write_binary_obs_sequence = .false.          ! .false. will create ASCII - easy to check.
/

&obs_kind_nml
   ...
   assimilate_these_obs_types = 'RADIOSONDE_TEMPERATURE',
                                ...             ! list all the synthetic observation
                                ...             ! types you want
/

&assim_model_nml
   ...
   write_binary_restart_files = .true.          ! your choice
/

&model_nml
   ...
   time_step_days    = 0       ! some models call this 'assimilation_period_days'
   time_step_seconds = 3600    ! some models call this 'assimilation_period_seconds'
                               ! use whatever value you want
/

&utilities_nml
   ...
   termlevel   = 1                              ! your choice
   logfilename = 'dart_log.out'                 ! your choice
/
```
Since perfect_model_obs generally requires advancing the model, and the model may use MPI or require special ancillary files or forcing files or ..., it is not possible to provide a single example that will cover all possibilities. The subroutine-callable models (i.e. the low-order models) can run perfect_model_obs very simply, by running ./perfect_model_obs in the directory that contains input.nml.
This step is done with the program
filter,
which also uses input.nml for input and run-time control.
A successful assimilation will depend on many things: an appropriate initial ensemble,
monitoring and perhaps correcting the ensemble spread, localization, etc. It is
simply not possible to design a one-size-fits-all system that will work for all cases.
It is critically important to analyze the results of the assimilation
and explore ways of making the assimilation more effective.
The DART tutorial and
the DART_LAB exercises
are an invaluable resource to learn and understand how to determine the effectiveness of,
and improve upon, an assimilation experiment. The concepts learned with the low-order models
are directly applicable to the most complicated models.
It is important to remember that if filter
'terminates normally', it does not necessarily mean the assimilation was effective!
filter
produces two state-space output diagnostic files
(Prior_Diag.nc and Posterior_Diag.nc)
which contain values of the ensemble mean, ensemble spread, perhaps the
inflation values, and (optionally) ensemble members for the duration of
the experiment. filter also creates an observation
sequence file that contains the input observation information as well as the
prior and posterior ensemble mean estimates of that observation,
the prior and posterior ensemble spread for that observation,
and (optionally) the actual prior and posterior ensemble estimates of that observation.
Rather than replicate the observation metadata for each of these, a single
copy of the metadata is shared by all these 'copies' of the observation. See
An overview of the observation sequence
for more detail. filter also produces a run-time
log file that can greatly aid in determining what went wrong if the program
terminates abnormally.
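The Python sketch below illustrates the shared-metadata idea: the metadata is stored once per observation, and each 'copy' is just a named value. The class, the copy labels, and the use of the sample standard deviation (n-1 denominator) for spread are illustrative assumptions. This is not the actual obs_seq.final layout.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ObsWithCopies:
    """Metadata stored once per observation; every 'copy' is just a named value."""
    obs_type: str
    time_seconds: int
    location: Tuple[float, ...]
    error_variance: float
    copies: Dict[str, float] = field(default_factory=dict)


def add_ensemble_copies(ob, stage, member_estimates, keep_members=0):
    """Store mean, spread, and optionally individual members for 'prior' or 'posterior'."""
    ob.copies[f"{stage} ensemble mean"] = statistics.fmean(member_estimates)
    ob.copies[f"{stage} ensemble spread"] = statistics.stdev(member_estimates)  # n-1 denominator
    for k, value in enumerate(member_estimates[:keep_members], start=1):
        ob.copies[f"{stage} ensemble member {k}"] = value


ob = ObsWithCopies("T", 43200, (1.0, 2.0, 500.0), 1.0, {"observation": 2.5})
add_ensemble_copies(ob, "prior", [1.0, 2.0, 3.0], keep_members=2)
print(sorted(ob.copies))
```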
A very short description of some of the most important namelist variables
is presented here. Basically, I am only discussing the settings necessary to
get filter to run. I can guarantee these settings WILL NOT generate the BEST
assimilation. Again, see the module documentation for a full description of
each namelist.
```
&filter_nml
   async                    = 0
   adv_ens_command          = "./advance_model.csh"
   ens_size                 = 40                 ! something ≥ 20, please
   start_from_restart       = .false.            ! .false. requires reading available input files
   output_restart           = .true.
   obs_sequence_in_name     = "obs_seq.out"      ! whatever you called the output from perfect_model_obs
   obs_sequence_out_name    = "obs_seq.final"
   restart_in_file_name     = "filter_ics"       ! the file (or base file name) of your ensemble
   restart_out_file_name    = "filter_restart"
   init_time_days           = -1                 ! the time in the restart file is correct
   init_time_seconds        = -1
   first_obs_days           = -1                 ! same interpretation as with perfect_model_obs
   first_obs_seconds        = -1
   last_obs_days            = -1                 ! same interpretation as with perfect_model_obs
   last_obs_seconds         = -1
   num_output_state_members = 10                 ! number of FULL DART model states to put in state-space output files
   num_output_obs_members   = 40                 ! number of ensemble member 'copies' of observation to save
   output_interval          = 1
   num_groups               = 1
   input_qc_threshold       = 4.0
   outlier_threshold        = 3.0                ! Observation rejection criterion!
   output_forward_op_errors = .false.
   output_timestamps        = .false.
   output_inflation         = .true.

   inf_flavor               = 0,                 0                  ! 0 is 'do not inflate'
   inf_start_from_restart   = .false.,           .false.
   inf_output_restart       = .false.,           .false.
   inf_deterministic        = .true.,            .true.
   inf_in_file_name         = 'not_initialized', 'not_initialized'
   inf_out_file_name        = 'not_initialized', 'not_initialized'
   inf_diag_file_name       = 'not_initialized', 'not_initialized'
   inf_initial              = 1.0,               1.0
   inf_sd_initial           = 0.6,               0.0
   inf_damping              = 0.9,               0.0
   inf_lower_bound          = 1.0,               1.0
   inf_upper_bound          = 1000000.0,         1000000.0
   inf_sd_lower_bound       = 0.6,               0.0
/

&ensemble_manager_nml
   single_restart_file_in  = .false.    ! .false. means each ensemble member is in a separate file
   single_restart_file_out = .false.
   perturbation_amplitude  = 0.2        ! not used if 'single_restart_file_in' is .false.
/

&assim_tools_nml
   filter_kind = 1                      ! 1 is EAKF, 2 is EnKF ...
   cutoff      = 0.2                    ! this is your localization - units depend on type of 'location_mod'
/

&obs_kind_nml
   assimilate_these_obs_types = 'RAW_STATE_VARIABLE'   ! Again, use a list ...
/

&model_nml
   assimilation_period_days    = 0      ! the assimilation interval is up to you
   assimilation_period_seconds = 3600
/
```
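In the namelist above, filter_kind = 1 selects the EAKF, and cutoff sets the localization. The Python sketch below shows textbook scalar forms of both ideas: an observation-space ensemble adjustment update for a single observation, and a Gaspari-Cohn localization weight. Treating cutoff as the Gaspari-Cohn half-width, and using the sample variance, are assumptions of the sketch. This is not DART's assim_tools code, which also regresses the increments onto state variables and handles inflation, QC, and parallelism.

```python
import math


def eakf_obs_increments(prior, obs_value, obs_error_var):
    """Scalar ensemble adjustment (EAKF-style) update in observation space.

    prior: ensemble estimates of the observation (forward operator applied to each member).
    Returns per-member increments that give the posterior the Kalman mean and variance
    while keeping each member's relative position in the ensemble.
    """
    n = len(prior)
    mean_p = sum(prior) / n
    var_p = sum((p - mean_p) ** 2 for p in prior) / (n - 1)
    var_a = 1.0 / (1.0 / var_p + 1.0 / obs_error_var)            # posterior variance
    mean_a = var_a * (mean_p / var_p + obs_value / obs_error_var)  # posterior mean
    shrink = math.sqrt(var_a / var_p)
    return [shrink * (p - mean_p) + mean_a - p for p in prior]


def gaspari_cohn(distance, cutoff):
    """Gaspari-Cohn weight with 'cutoff' taken as the half-width: 0 beyond 2 * cutoff."""
    r = abs(distance) / cutoff
    if r <= 1.0:
        return (((-0.25 * r + 0.5) * r + 0.625) * r - 5.0 / 3.0) * r * r + 1.0
    if r <= 2.0:
        return ((((r / 12.0 - 0.5) * r + 0.625) * r + 5.0 / 3.0) * r - 5.0) * r + 4.0 - 2.0 / (3.0 * r)
    return 0.0


prior = [1.0, 2.0, 3.0]
posterior = [p + d for p, d in zip(prior, eakf_obs_increments(prior, obs_value=2.0, obs_error_var=1.0))]
weight = gaspari_cohn(distance=0.1, cutoff=0.2)
```

In a full filter, the observation increments would be regressed onto each state variable and multiplied by a localization weight such as `weight`, so observations farther than twice the half-width have no effect.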
num_output_state_members sets how many ensemble members have their full state vector
written at every time for which there are observations, so
Posterior_Diag.nc and Prior_Diag.nc
then contain values for that many ensemble members at each of those times.
Once the namelist is set, execute filter to
integrate the ensemble forward for 24,000 steps with the final ensemble
state written to filter_restart.
Copy the perfect_model_obs restart file
perfect_restart (the 'true state') to
perfect_ics, and the
filter restart file
filter_restart to
filter_ics so that future assimilation experiments
can be initialized from these spun-up states.
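The copy step can be scripted. The Python sketch below copies every file whose name starts with the old name, swapping in the new prefix. It also handles the case where each ensemble member is written to its own file (single_restart_file_out = .false.). The assumption that member files share the base name plus a suffix such as filter_restart.0001 should be checked against your own run directory. Note that the prefix match would also catch any unrelated file that happens to start with the same name.

```python
import shutil
from pathlib import Path


def copy_with_new_base(workdir, old_base, new_base):
    """Copy every file whose name starts with old_base, replacing that prefix with new_base."""
    work = Path(workdir)
    copied = []
    for src in sorted(work.glob(old_base + "*")):
        dst = work / (new_base + src.name[len(old_base):])
        shutil.copyfile(src, dst)
        copied.append(dst.name)
    return copied


def save_spun_up_states(workdir="."):
    """perfect_restart -> perfect_ics (the truth); filter_restart* -> filter_ics* (the ensemble)."""
    return (copy_with_new_base(workdir, "perfect_restart", "perfect_ics")
            + copy_with_new_base(workdir, "filter_restart", "filter_ics"))
```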
```
mpirun ./filter
   -OR-
mpirun.lsf ./filter
   -OR-
./filter
   -OR-
however YOU run filter on your system!
```
All the concepts of spread, RMSE, and rank histograms that were taught in the DART tutorial and in DART_LAB should be applied now. Try the techniques described in the 'Did my experiment work?' section. The 'big three' state-space diagnostics are repeated here because they are so important. The first two require True_State.nc.
MATLAB script | purpose |
---|---|
plot_bins.m | plots the rank histograms for a set of state variables. This requires you to have all or most of the ensemble members available in the Prior_Diag.nc or Posterior_Diag.nc files. |
plot_total_err.m | plots the evolution of the error (un-normalized) and ensemble spread of all state variables. |
plot_ens_mean_time_series.m | plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available). plot_ens_time_series.m is actually a better choice if you can afford to write all/most of the ensemble members to the Prior_Diag.nc and Posterior_Diag.nc files. |
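For readers who want to see what the first two diagnostics measure, here is a Python sketch of a rank histogram and of an error-versus-spread comparison. It uses plain lists in place of the netCDF files. Ties are not randomized. The error and spread are RMS values over all state variables; the exact normalization used by plot_total_err.m is not reproduced here.

```python
import math


def rank_histogram(ensembles, truths):
    """Count where the truth falls among the ensemble members (n_members + 1 bins)."""
    n = len(ensembles[0])
    counts = [0] * (n + 1)
    for members, truth in zip(ensembles, truths):
        rank = sum(1 for m in members if m < truth)   # ties are not randomized here
        counts[rank] += 1
    return counts


def total_error_and_spread(ensemble, truth):
    """RMS error of the ensemble mean and RMS ensemble spread over all state variables.

    ensemble: list of members, each a list of state-variable values.
    """
    n_members = len(ensemble)
    n_vars = len(truth)
    sq_err, var_sum = 0.0, 0.0
    for j in range(n_vars):
        values = [member[j] for member in ensemble]
        mean = sum(values) / n_members
        sq_err += (mean - truth[j]) ** 2
        var_sum += sum((v - mean) ** 2 for v in values) / (n_members - 1)
    return math.sqrt(sq_err / n_vars), math.sqrt(var_sum / n_vars)
```

In a well-tuned ensemble, the rank histogram is roughly flat, and the error and spread have comparable magnitudes over time.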