PROGRAM obs_diag (for 1D observations)

DART project logo

Jump to DART Documentation Main Index
version information for this file:
$Id: obs_diag.html 10820 2016-12-20 22:48:50Z $



Main program for observation-space diagnostics for the models with 1D locations. 18 quantities are calculated for each region for each temporal bin specified by user input. The result of the code is a netCDF file that contains the 18 quantities of the prior (aka 'guess') and posterior (aka 'analysis') estimates as a function of time and region as well as all the metadata to create meaningful figures.

Each file contains an observation sequence that has multiple 'copies' of the observation. One copy is the actual observation, another copy is the prior ensemble mean estimate of the observation, one is the spread of the prior ensemble estimate, one may be the prior estimate from ensemble member 1, ... etc. The only observations for the 1D models are the result of a 'perfect model' experiment, so there is an additional copy called the 'truth' - the noise-free expected observation given the true model state. Since this copy does not, in general, exist for the high-order models, all comparisons are made with the copy labelled 'observation'. It may, in some instances be useful to compare against the 'truth', in which case you will have to hand-edit the code and recompile. Caveat emptor.

Each ensemble member applies a forward operator to the state to compute the "expected" value of an observation. Given multiple estimates of the observation, several quantities can be calculated. It is possible to compute the expected observations from the state vector before assimilating (the "guess", "forecast", or "prior") or after the assimilation (the "analysis", or "posterior").

Even with input.nml:filter_nml:num_output_obs_members set to 0; the full [prior,posterior] ensemble mean and [prior,posterior] ensemble spread are preserved in the file. Consequently, the ensemble means and spreads are used to calculate the diagnostics. If the input.nml:filter_nml:num_output_obs_members is set to 80 (for example); the first 80 ensemble members prior and posterior "expected" values of the observation are also included. In this case, the file contains enough information to calculate a rank histograms, verify forecasts, etc. The ensemble means are still used for many other calculations.

obs_diag is designed to explore the effect of the assimilation in two ways; 1) as a function of time for a particular variable (this is the figure on the left), and sometimes 2) in terms of a rank histogram - "Where does the actual observation rank relative to the rest of the ensemble?" (figure on the right). The figures were created by Matlab® scripts that query the file: DART/diagnostics/matlab/plot_evolution.m and plot_rank_histogram.m. Both of these takes as input a file name and a 'quantity' to plot ('rmse', 'spread', 'totalspread', ...) and exhaustively plots the quantity (for every variable, every region) in a single matlab figure window - and creates a series of .ps files with multiple pages for each of the figures. The directory gets cluttered with them.

The observation sequence files contain only the time of the observation, nothing of the assimilation interval, etc. - so it requires user guidance to declare what sort of temporal binning for the temporal evolution plots. I do a 'bunch' of arithmetic on the namelist times to convert them to a series of temporal bin edges that are used when traversing the observation sequence. The actual algorithm is that the user input for the start date and bin width set up a sequence that ends in one of two ways ... the last time is reached or the number of bins has been reached. NOTE: for the purpose of interpretability, the 1D obs_diag routines saves 'dates' as GREGORIAN dates despite the fact these systems have no concept of a calendar.

obs_diag reads files and calculates the following quantities (in no particular order) for an arbitrary number of regions and levels. It is necessary to query the CopyMetaData variable to determine the storage order (i.e. "which copy is what?").

Nposs The number of observations available to be assimilated.
Nused The number of observations that were assimilated.
rmse The root-mean-squared error (the horizontal wind components are also used to calculate the vector wind velocity and its RMS error).
bias The simple sum of forecast - observation. The bias of the horizontal wind speed (not velocity) is also computed.
spread The standard deviation of the univariate obs. DART does not exploit the bivariate nature of U,V winds and so the spread of the horizontal wind is defined as the sum of the spreads of the U and V components.
totalspread    The total standard deviation of the estimate. We pool the ensemble variance of the observation plus the observation error variance and take the square root.
NbadDARTQC    the number of observations that had a DART QC value (> 1 for a prior, > 3 for a posterior)
observation the mean of the observation values
ens_mean the ensemble mean of the model estimates of the observation values
N_DARTqc_0 the number of observations that had a DART QC value of 0
N_DARTqc_1 the number of observations that had a DART QC value of 1
N_DARTqc_2 the number of observations that had a DART QC value of 2
N_DARTqc_3 the number of observations that had a DART QC value of 3
N_DARTqc_4 the number of observations that had a DART QC value of 4
N_DARTqc_5 the number of observations that had a DART QC value of 5
N_DARTqc_6 the number of observations that had a DART QC value of 6
N_DARTqc_7 the number of observations that had a DART QC value of 7
N_trusted the number of 'trusted' observations, regardless of DART QC

Here is a table explaining the DART QC values:

DART QC flag valuemeaning
0observation assimilated
1observation evaluated only
DART QC values higher than this means the prior and posterior are OK, but ...
2assimilated, but the posterior forward operator failed
3Evaluated only, but the posterior forward operator failed
DART QC values higher than this means only the prior is OK, but ...
4prior forward operator failed
5not used because of namelist control
DART QC values higher than this are bad news.
6prior QC rejected
7outlier rejected
8+reserved for future use


What is new in the Lanai Release.

obs_diag has several improvements:

  1. Support for 'trusted' observations. Trusted observation types may be specified in the namelist and all observations of that type will be counted in the statistics despite the DART QC code (as long as the forward observation operator succeeds). See namelist variable trusted_obs.

  2. Support for 'true' observations (i.e. from an OSSE). If the 'truth' copy of an observation is desired for comparison (instead of the default copy) the observation error variance is set to 0.0 and the statistics are calculated relative to the 'truth' copy (as opposed to the normal 'noisy' or 'observation' copy). See namelist variable use_zero_error_obs.

  3. discontinued the use of rat_cri and input_qc_threshold namelist variables. Their functionality was replaced by the DART QC mechanism long ago.

  4. The creation of the rank histogram (if possible) is now namelist-controlled by namelist variable create_rank_histogram.



This namelist is read from the file input.nml. Namelists start with an ampersand '&' and terminate with a slash '/'. Character strings that contain a '/' must be enclosed in quotes to prevent them from prematurely terminating the namelist.

   obs_sequence_name     = '',
   obs_sequence_list     = '',
   bin_width_days        = -1,
   bin_width_seconds     = -1,
   init_skip_days        = 0,
   init_skip_seconds     = 0,
   max_num_bins          = 9999,
   trusted_obs           = 'null',
   Nregions              = 3,
   lonlim1               = 0.00, 0.00, 0.50, -1.0,
   lonlim2               = 1.01, 0.50, 1.01, -1.0,
   reg_names             = 'whole', 'yin', 'yang', 'bogus',
   create_rank_histogram = .true.,
   outliers_in_histogram = .true.,
   use_zero_error_obs    = .false.,
   verbose               = .false.

The allowable ranges for the region boundaries are: lon [0.0, 1.0). The 1D locations are conceived as the distance around a unit sphere. An observation with a location exactly ON a region boundary cannot 'count' for both regions. The logic used to resolve this is:

if((lon ≥ lon1) .and. (lon < lon2)) keeper = .true.

Consequently, if you want to include an observation precisely AT 1.0, (for example), you need to specify something a little larger than 1.0.

You can only specify either obs_sequence_name or obs_sequence_list -- not both. One of them has to be an empty string ... i.e. ''.

Item Type Description
obs_sequence_name character Name of the observation sequence file(s).
This may be a relative or absolute filename. If the filename contains a '/', the filename is considered to be comprised of everything to the right, and a directory structure to the left. The directory structure is then queried to see if it can be incremented to handle a sequence of observation files. The default behavior of obs_diag is to look for additional files to include until the files are exhausted or an file is found that contains observations beyond the timeframe of interest.
e.g. 'obsdir_001/' will cause obs_diag to look for 'obsdir_002/', and so on.
If this is set, obs_sequence_list must be set to a null string (' ').
obs_sequence_list character Name of an ascii text file which contains a list of one or more observation sequence files, one per line. If this is specified, obs_sequence_name must be set to a null string (' '). Can be created by any method, including sending the output of the 'ls' command to a file, a text editor, or another program. If this is set, obs_sequence_name must be set to a null string (' ').
bin_width_days, bin_width_seconds integer Specifies the width of the analysis window. All observations within a window centered at the observation time +/- bin_width_[days,seconds] is used. If both values are 0, half the separation between observation times as defined in the observation sequence file is used for the bin width (i.e. all observations used).
init_skip_days, init_skip_seconds integer Ignore all observations before this time. This allows one to skip the 'spinup' or stabilization period of an assimilation.
max_num_bins integer This provides a way to restrict the number of temporal bins. If max_num_bins is set to '10', only 10 timesteps will be output, provided there are that many.
Nregions integer The number of regions for the unit circle for which you'd like observation-space diagnostics. If 4 is not enough increase MaxRegions in obs_diag.f90 and recompile.
lonlim1 real(r8) array of length(Nregions) starting value of coordinates defining 'regions'. A value of -1 indicates the start of 'no region'.
lonlim2 real(r8) array of length(Nregions) ending value of coordinates defining 'regions'. A value of -1 indicates the end of 'no region'.
reg_names character(len=6), dimension(Nregions) Array of names for each of the regions. The default example has the unit circle as a whole and divided into two equal parts, so there are only three regions.
create_rank_histogram logical Determines whether or not to create the rank histogram IFF the observation sequence file has individual ensemble members. Number of output observation values is set when running filter in the &filter_nml namelist item: &filter_nml:num_output_obs_members
outliers_in_histogram logical Determines whether or not to use observations that have been rejected by the outlier threshhold mechanism in the calculation of the rank histogram.
trusted_obs character(len=32), dimension(5) Array of names for observation TYPES that will be included in the statistics if at all possible (i.e. the forward observation operator succeeds). For 1D observations the only choices in the code as distributed are 'RAW_STATE_VARIABLE' and/or 'RAW_STATE_1D_INTEGRAL'. (Additional 1D observation types can be added by the user.)
use_zero_error_obs logical if .true., the observation copy used for the statistics calculations will be 'truth'. Only 'perfect' observations (from perfect_model_obs) have this copy. The observation error variance will be set to zero.
verbose logical switch controlling amount of run-time output.







Discussion of

Every observation type encountered in the observation sequence file is tracked separately, and aggregated into temporal and 3D spatial bins. There are two main efforts to this program. One is to track the temporal evolution of any of the quantities available in the netCDF file for any possible observation type:

ncdump -v CopyMetaData,ObservationTypes

The other is to explore the vertical profile of a particular observation kind. By default, each observation kind has a 'guess/prior' value and an 'analysis/posterior' value - which shed some insight into the innovations.

temporal evolution

The output file has all the metadata I could think of, as well as separate variables for every observation type in the observation sequence file. Furthermore, there is a separate variable for the 'guess/prior' and 'analysis/posterior' estimate of the observation. To distinguish between the two, a suffix is appended to the variable name. An example seems appropriate:

  char CopyMetaData(copy, stringlength) ;
          CopyMetaData:long_name = "quantity names" ;
  int rank_bins(rank_bins) ;
          rank_bins:long_name = "rank histogram bins" ;
          rank_bins:comment = "position of the observation among the sorted noisy ensemble members" ;
  float RAW_STATE_VARIABLE_guess(time, copy, region) ;
          RAW_STATE_VARIABLE_guess:_FillValue = -888888.f ;
          RAW_STATE_VARIABLE_guess:missing_value = -888888.f ;
  float RAW_STATE_VARIABLE_analy(time, copy, region) ;
          RAW_STATE_VARIABLE_analy:_FillValue = -888888.f ;
          RAW_STATE_VARIABLE_analy:missing_value = -888888.f ;

rank histograms

If it is possible to calculate a rank histogram, there will also be :

  int RAW_STATE_VARIABLE_guess_RankHist(time, rank_bins, region) ;

as well as some global attributes. The attributes reflect the namelist settings and can be used by plotting routines to provide additional annotation for the histogram.

                :DART_QCs_in_histogram = 0, 1, 2, 3, 7 ;
                :outliers_in_histogram = "TRUE" ;

Please note:
netCDF restricts variable names to 40 characters, so '_Rank_Hist' may be truncated.



  1. none


get_last_obs No "last" observation in sequence Must have an incomplete observation sequence file.
get_first_obs No Observations in sequence. Empty observation sequence file.


none at this time








Terms of Use

DART software - Copyright 2004 - 2013 UCAR.
This open source software is provided by UCAR, "as is",
without charge, subject to all terms of use at

Contact: Tim Hoar
Revision: $Revision: 10820 $
Source: $URL: $
Change Date: $Date: 2016-12-20 15:48:50 -0700 (Tue, 20 Dec 2016) $
Change history:  try "svn log" or "svn diff"