Cycling Models


Overview

For simple models which can be advanced by a subroutine call, the filter program is driven by the input observation sequence. It assimilates all observations in the current assimilation window and then advances the model state until the window includes the next available observation. When it runs out of observations, filter exits.

For complex models that are themselves MPI programs, or that require complicated scripting to run, the following are some simplified considerations for scripting an experiment. A "cycling script" would need to:

  1. [if needed] Run the ensemble of models forward to the time of the first observation.
  2. Create (or trim) the input observation sequence file so that it includes only observations in the current window.
  3. Run filter to assimilate all observations in the current window.
  4. Save a copy of the output files and diagnostic files. Often a timestamp is used as part of the filename or subdirectory name to make it unique.
  5. Run the ensemble of models forward in time.
  6. Run filter again.
  7. Repeat until all observations have been assimilated.
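
The steps above can be sketched as a simple driver loop. This is only a minimal illustration, not a definitive implementation: the function name run_cycles and its four callback arguments are hypothetical placeholders for whatever model-specific scripts an experiment actually uses, and the window times are assumed to be supplied as a list of timestamp strings.

```python
from typing import Callable, Iterable, List


def run_cycles(
    window_times: Iterable[str],
    advance_ensemble: Callable[[str], None],
    prepare_obs: Callable[[str], None],
    run_filter: Callable[[str], None],
    archive_outputs: Callable[[str], None],
) -> List[str]:
    """Run one assimilation cycle per window; return the windows completed.

    All four callbacks are hypothetical hooks supplied by the caller.
    """
    completed = []
    for when in window_times:
        advance_ensemble(when)  # steps 1 and 5: bring every member to this window
        prepare_obs(when)       # step 2: create or trim the observation sequence
        run_filter(when)        # steps 3 and 6: assimilate this window
        archive_outputs(when)   # step 4: save output and diagnostic files
        completed.append(when)
    return completed            # step 7: the loop ends when the windows run out
```

Keeping each step behind its own callback means that a failure in any one of them stops the loop before the next window starts, so the archived files always correspond to completed cycles.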

In More Detail

The filter program requires an ensemble of model output files in NetCDF format as input. If the model does not use NetCDF, a translation step from the model's native format to NetCDF is needed. The files are often named with the ensemble number as part of the name, and with a timestamp either in the filename or in the name of a subdirectory that contains all the files for that timestep. Symbolic links can be used to link a common, simpler name to a file with a timestamp in the filename or directory name.

The filter program also requires an input observation sequence file. Often these are named with a timestamp indicating the central time of the observations, e.g. obs_seq.2010-10-04.00:00:00, and a symbolic link with a common name (e.g. obs_seq.out) is used to indicate the right file for input.

If adaptive inflation is being used, the filter program also requires inflation input files. Again, timestamps in the names with a common symbolic link name are often used here.
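
The symbolic-link convention described above can be sketched in a few lines. This is an illustrative helper under stated assumptions, not part of any existing tool: the function name link_common_name is hypothetical, and the script is assumed to run on a filesystem that supports symbolic links.

```python
import os


def link_common_name(target: str, link_name: str) -> None:
    """Point link_name at target, replacing a previous link if one exists.

    Hypothetical helper: refuses to delete anything that is not a symlink.
    """
    if os.path.lexists(link_name):
        if not os.path.islink(link_name):
            raise FileExistsError(f"{link_name} exists and is not a symbolic link")
        os.remove(link_name)  # drop the link left over from the previous cycle
    os.symlink(target, link_name)
```

For example, link_common_name("obs_seq.2010-10-04.00:00:00", "obs_seq.out") makes the common name refer to that window's file. The same helper could be used for model files and inflation files.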

The filter program runs.

The output of the filter program includes updated model files, produced using one of three different workflows:

  1. The filter program directly overwrites the input files.
  2. The script copies the input files to the output names, and the filter program updates the existing files.
  3. The filter program creates new output files from scratch.

The script should also save the obs_seq.final diagnostic file, possibly with a timestamp in the filename or subdirectory name, and the updated inflation files in the case where adaptive inflation is used.
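
One way to save the diagnostics is to copy them into a subdirectory named for the timestamp. The sketch below is a minimal example under assumptions: the function name archive_outputs and the directory layout are hypothetical, and the list of files to save (obs_seq.final plus any inflation files) is passed in by the caller rather than assumed.

```python
import os
import shutil
from typing import Iterable, List


def archive_outputs(run_dir: str, archive_root: str, timestamp: str,
                    filenames: Iterable[str]) -> List[str]:
    """Copy the named files from run_dir into archive_root/timestamp.

    Hypothetical helper; returns the paths of the saved copies.
    """
    dest_dir = os.path.join(archive_root, timestamp)
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in filenames:
        src = os.path.join(run_dir, name)
        if not os.path.isfile(src):
            # Fail loudly so a missing diagnostic file is not silently skipped.
            raise FileNotFoundError(f"expected output file is missing: {src}")
        saved.append(shutil.copy2(src, os.path.join(dest_dir, name)))
    return saved
```

Copying rather than moving leaves the originals in place for the next step of the cycle. A script that needs to reclaim disk space could move the files instead.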

The script can run the ensemble of models forward in time in many ways. A few of the ways we're aware of are:

  1. If a queuing system is available, the ensemble of models can be submitted either as independent jobs or using the batch system's job array syntax. They run as soon as resources are available. The disadvantage is that it can be complicated to know when all the jobs have finished successfully.
  2. On smaller clusters, the ensemble members can be advanced one after the other in a loop. There is no question about when the last member has been advanced, and it requires no more resources than running a single copy of the model. The disadvantage is that this is the slowest way, in wall-clock time, to advance the ensemble.
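
The sequential option can be sketched as a plain loop over ensemble members. This is only an illustration: the function name advance_sequentially is hypothetical, and the command that advances one member is supplied by the caller because it depends entirely on the model.

```python
import subprocess
from typing import Callable, List, Sequence


def advance_sequentially(n_members: int,
                         command_for: Callable[[int], Sequence[str]]) -> List[int]:
    """Advance members 1..n_members one at a time; return those advanced.

    command_for is a hypothetical hook that returns the command line
    used to advance a given member.
    """
    advanced = []
    for member in range(1, n_members + 1):
        # check=True raises CalledProcessError if the model run fails,
        # which stops the loop before filter can run on a partial ensemble.
        subprocess.run(list(command_for(member)), check=True)
        advanced.append(member)
    return advanced
```

Because each call blocks until the model run exits, the loop finishing is itself the signal that every member has been advanced.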