Jump to DART Documentation Main Index |
Main program for observation-space diagnostics for the models with 1D locations. 18 quantities are calculated for each region for each temporal bin specified by user input. The result of the code is a netCDF file that contains the 18 quantities of the prior (aka 'guess') and posterior (aka 'analysis') estimates as a function of time and region as well as all the metadata to create meaningful figures.
Each obs_seq.final file contains an observation sequence that has multiple 'copies' of the observation. One copy is the actual observation, another copy is the prior ensemble mean estimate of the observation, one is the spread of the prior ensemble estimate, one may be the prior estimate from ensemble member 1, ... etc. The only observations for the 1D models are the result of a 'perfect model' experiment, so there is an additional copy called the 'truth' - the noise-free expected observation given the true model state. Since this copy does not, in general, exist for the high-order models, all comparisons are made with the copy labelled 'observation'. It may, in some instances be useful to compare against the 'truth', in which case you will have to hand-edit the code and recompile. Caveat emptor.
Each ensemble member applies a forward operator to the state to compute the "expected" value of an observation. Given multiple estimates of the observation, several quantities can be calculated. It is possible to compute the expected observations from the state vector before assimilating (the "guess", "forecast", or "prior") or after the assimilation (the "analysis", or "posterior").
Even with input.nml:filter_nml:num_output_obs_members set to 0; the full [prior,posterior] ensemble mean and [prior,posterior] ensemble spread are preserved in the obs_seq.final file. Consequently, the ensemble means and spreads are used to calculate the diagnostics. If the input.nml:filter_nml:num_output_obs_members is set to 80 (for example); the first 80 ensemble members prior and posterior "expected" values of the observation are also included. In this case, the obs_seq.final file contains enough information to calculate a rank histograms, verify forecasts, etc. The ensemble means are still used for many other calculations.
obs_diag is designed to explore the effect of the assimilation in two ways; 1) as a function of time for a particular variable (this is the figure on the left), and sometimes 2) in terms of a rank histogram - "Where does the actual observation rank relative to the rest of the ensemble?" (figure on the right). The figures were created by Matlab® scripts that query the obs_diag_output.nc file: DART/diagnostics/matlab/plot_evolution.m and plot_rank_histogram.m. Both of these takes as input a file name and a 'quantity' to plot ('rmse', 'spread', 'totalspread', ...) and exhaustively plots the quantity (for every variable, every region) in a single matlab figure window - and creates a series of .ps files with multiple pages for each of the figures. The directory gets cluttered with them.
The observation sequence files contain only the time of the observation, nothing of the assimilation interval, etc. - so it requires user guidance to declare what sort of temporal binning for the temporal evolution plots. I do a 'bunch' of arithmetic on the namelist times to convert them to a series of temporal bin edges that are used when traversing the observation sequence. The actual algorithm is that the user input for the start date and bin width set up a sequence that ends in one of two ways ... the last time is reached or the number of bins has been reached. NOTE: for the purpose of interpretability, the 1D obs_diag routines saves 'dates' as GREGORIAN dates despite the fact these systems have no concept of a calendar.
obs_diag reads obs_seq.final files and calculates the following quantities (in no particular order) for an arbitrary number of regions and levels. It is necessary to query the CopyMetaData variable to determine the storage order (i.e. "which copy is what?").
Nposs | The number of observations available to be assimilated. |
Nused | The number of observations that were assimilated. |
rmse | The root-mean-squared error (the horizontal wind components are also used to calculate the vector wind velocity and its RMS error). |
bias | The simple sum of forecast - observation. The bias of the horizontal wind speed (not velocity) is also computed. |
spread | The standard deviation of the univariate obs. DART does not exploit the bivariate nature of U,V winds and so the spread of the horizontal wind is defined as the sum of the spreads of the U and V components. |
totalspread | The total standard deviation of the estimate. We pool the ensemble variance of the observation plus the observation error variance and take the square root. |
NbadDARTQC | the number of observations that had a DART QC value (> 1 for a prior, > 3 for a posterior) |
observation | the mean of the observation values |
ens_mean | the ensemble mean of the model estimates of the observation values |
N_DARTqc_0 | the number of observations that had a DART QC value of 0 |
N_DARTqc_1 | the number of observations that had a DART QC value of 1 |
N_DARTqc_2 | the number of observations that had a DART QC value of 2 |
N_DARTqc_3 | the number of observations that had a DART QC value of 3 |
N_DARTqc_4 | the number of observations that had a DART QC value of 4 |
N_DARTqc_5 | the number of observations that had a DART QC value of 5 |
N_DARTqc_6 | the number of observations that had a DART QC value of 6 |
N_DARTqc_7 | the number of observations that had a DART QC value of 7 |
N_trusted | the number of 'trusted' observations, regardless of DART QC |
Here is a table explaining the DART QC values:
DART QC flag value | meaning |
---|---|
0 | observation assimilated |
1 | observation evaluated only |
DART QC values higher than this means the prior and posterior are OK, but ... | |
2 | assimilated, but the posterior forward operator failed |
3 | Evaluated only, but the posterior forward operator failed |
DART QC values higher than this means only the prior is OK, but ... | |
4 | prior forward operator failed |
5 | not used because of namelist control |
DART QC values higher than this are bad news. | |
6 | prior QC rejected |
7 | outlier rejected |
8+ | reserved for future use |
obs_diag has several improvements:
This namelist is read from the file input.nml. Namelists start with an ampersand '&' and terminate with a slash '/'. Character strings that contain a '/' must be enclosed in quotes to prevent them from prematurely terminating the namelist.
&obs_diag_nml obs_sequence_name = 'obs_seq.final' obs_sequence_list = '' bin_width_days = -1 bin_width_seconds = -1 init_skip_days = 0 init_skip_seconds = 0 max_num_bins = 9999 trusted_obs = 'null' Nregions = 3 lonlim1 = 0.0, 0.0, 0.5 lonlim2 = 1.0, 0.5, 1.0 reg_names = 'whole', 'yin', 'yang' create_rank_histogram = .true. outliers_in_histogram = .true. use_zero_error_obs = .false. verbose = .false. /
The allowable ranges for the region boundaries are: lon [0.0, 1.0). The 1D locations are conceived as the distance around a unit sphere. An observation with a location exactly ON a region boundary cannot 'count' for both regions. The logic used to resolve this is:
if((lon ≥ lon1) .and. (lon < lon2)) keeper = .true.
Consequently, if you want to include an observation precisely AT 1.0, (for example),
you need to specify something a little larger than 1.0.
You can only specify either
obs_sequence_name
or
obs_sequence_list -- not both.
One of them has to be an empty string ... i.e. ''.
Item | Type | Description |
---|---|---|
obs_sequence_name | character | Name of the observation sequence file(s). This may be a relative or absolute filename. If the filename contains a '/', the filename is considered to be comprised of everything to the right, and a directory structure to the left. The directory structure is then queried to see if it can be incremented to handle a sequence of observation files. The default behavior of obs_diag is to look for additional files to include until the files are exhausted or an obs_seq.final file is found that contains observations beyond the timeframe of interest. e.g. 'obsdir_001/obs_seq.final' will cause obs_diag to look for 'obsdir_002/obs_seq.final', and so on. If this is set, obs_sequence_list must be set to a null string (' '). |
obs_sequence_list | character | Name of an ascii text file which contains a list of one or more observation sequence files, one per line. If this is specified, obs_sequence_name must be set to a null string (' '). Can be created by any method, including sending the output of the 'ls' command to a file, a text editor, or another program. If this is set, obs_sequence_name must be set to a null string (' '). |
bin_width_days, bin_width_seconds | integer | Specifies the width of the analysis window. All observations within a window centered at the observation time +/- bin_width_[days,seconds] is used. If both values are 0, half the separation between observation times as defined in the observation sequence file is used for the bin width (i.e. all observations used). |
init_skip_days, init_skip_seconds | integer | Ignore all observations before this time. This allows one to skip the 'spinup' or stabilization period of an assimilation. |
max_num_bins | integer | This provides a way to restrict the number of temporal bins. If max_num_bins is set to '10', only 10 timesteps will be output, provided there are that many. |
Nregions | integer | The number of regions for the unit circle for which you'd like observation-space diagnostics. If 3 is not enough increase MaxRegions in obs_diag.f90 and recompile. |
lonlim1 | real(r8) array of length(Nregions) | starting value of coordinates defining 'regions'. A value of -1 indicates the start of 'no region'. |
lonlim2 | real(r8) array of length(Nregions) | ending value of coordinates defining 'regions'. A value of -1 indicates the end of 'no region'. |
reg_names | character(len=6), dimension(Nregions) | Array of names for each of the regions. The default example has the unit circle as a whole and divided into two equal parts, so there are only three regions. |
create_rank_histogram | logical | Determines whether or not to create the rank histogram IFF the observation sequence file has individual ensemble members. Number of output observation values is set when running filter in the &filter_nml namelist item: &filter_nml:num_output_obs_members |
outliers_in_histogram | logical | Determines whether or not to use observations that have been rejected by the outlier threshhold mechanism in the calculation of the rank histogram. |
trusted_obs | character(len=32), dimension(5) | Array of names for observation TYPES that will be included in the statistics if at all possible (i.e. the forward observation operator succeeds). For 1D observations the only choices in the code as distributed are 'RAW_STATE_VARIABLE' and/or 'RAW_STATE_1D_INTEGRAL'. (Additional 1D observation types can be added by the user.) |
use_zero_error_obs | logical | if .true., the observation copy used for the statistics calculations will be 'truth'. Only 'perfect' observations (from perfect_model_obs) have this copy. The observation error variance will be set to zero. |
verbose | logical | switch controlling amount of run-time output. |
types_mod obs_sequence_mod obs_def_mod obs_kind_mod location_mod time_manager_mod utilities_mod sort_mod random_seq_mod
assim_model_mod cov_cutoff_mod model_mod null_mpi_utilities_mod
Every observation type encountered in the observation sequence file is tracked separately, and aggregated into temporal and 3D spatial bins. There are two main efforts to this program. One is to track the temporal evolution of any of the quantities available in the netCDF file for any possible observation type:
The other is to explore the vertical profile of a particular observation kind. By default, each observation kind has a 'guess/prior' value and an 'analysis/posterior' value - which shed some insight into the innovations.
The obs_diag_output.nc output file has all the metadata I could think of, as well as separate variables for every observation type in the observation sequence file. Furthermore, there is a separate variable for the 'guess/prior' and 'analysis/posterior' estimate of the observation. To distinguish between the two, a suffix is appended to the variable name. An example seems appropriate:
... char CopyMetaData(copy, stringlength) ; CopyMetaData:long_name = "quantity names" ; ... int rank_bins(rank_bins) ; rank_bins:long_name = "rank histogram bins" ; rank_bins:comment = "position of the observation among the sorted noisy ensemble members" ; float RAW_STATE_VARIABLE_guess(time, copy, region) ; RAW_STATE_VARIABLE_guess:_FillValue = -888888.f ; RAW_STATE_VARIABLE_guess:missing_value = -888888.f ; float RAW_STATE_VARIABLE_analy(time, copy, region) ; RAW_STATE_VARIABLE_analy:_FillValue = -888888.f ; RAW_STATE_VARIABLE_analy:missing_value = -888888.f ; ...
If it is possible to calculate a rank histogram, there will also be :
... int RAW_STATE_VARIABLE_guess_RankHist(time, rank_bins, region) ; ...
as well as some global attributes. The attributes reflect the namelist settings and can be used by plotting routines to provide additional annotation for the histogram.
:DART_QCs_in_histogram = 0, 1, 2, 3, 7 ; :outliers_in_histogram = "TRUE" ;
Please note:
netCDF restricts variable names to 40 characters, so '_Rank_Hist'
may be truncated.
Routine | Message | Comment |
---|---|---|
get_last_obs | No "last" observation in sequence | Must have an incomplete observation sequence file. |
get_first_obs | No Observations in sequence. | Empty observation sequence file. |
none at this time
none.
N/A
DART software - Copyright UCAR. This open source software is provided by UCAR, "as is", without charge, subject to all terms of use at http://www.image.ucar.edu/DAReS/DART/DART_download
Contact: | Tim Hoar |
Revision: | $Revision: 11771 $ |
Source: | $URL: https://svn-dares-dart.cgd.ucar.edu/DART/releases/Manhattan/assimilation_code/programs/obs_diag/oned/obs_diag.html $ |
Change Date: | $Date: 2017-06-27 10:19:58 -0600 (Tue, 27 Jun 2017) $ |
Change history: | try "svn log" or "svn diff" |