Cycling Models
Overview
For simple models which can be advanced by a subroutine call, the
filter program
is driven by the input observation sequence.
It assimilates all observations in the current assimilation
window and then advances the model state until the window
includes the next available observation. When it runs
out of observations, filter exits.
For complex models that are themselves MPI programs,
or that need complicated scripting to run, here are some
simplified considerations for scripting an experiment.
A "cycling script" would need to:
- [if needed] Run the ensemble of models forward to the
time of the first observation.
- Create (or trim) the input observation sequence file
so that it includes only observations in the current window.
- Run filter to assimilate all
observations in the current window.
- Save a copy of the output files and diagnostic files.
Often a timestamp is used as part of the filename or
subdirectory name to make it unique.
- Run the ensemble of models forward in time.
- Run filter again.
- Repeat until all observations have been assimilated.
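The steps above can be sketched as a small driver loop. This is only an illustration under stated assumptions: the function `run_cycles` and its four callables (`advance_ensemble`, `trim_obs`, `run_filter`, `archive`) are hypothetical hooks, not part of the filter program, and the sketch assumes evenly spaced assimilation windows centered on each analysis time.

```python
from datetime import datetime, timedelta


def run_cycles(first_obs, last_obs, window,
               advance_ensemble, trim_obs, run_filter, archive):
    """Drive an experiment one assimilation window at a time.

    All four callables are hypothetical hooks supplied by the caller;
    each would wrap whatever the real model and site scripts require.
    """
    completed = []
    center = first_obs
    # [if needed] run the ensemble forward to the first observation time
    advance_ensemble(center)
    while center <= last_obs:
        # keep only the observations that fall in the current window
        trim_obs(center - window / 2, center + window / 2)
        run_filter()
        # a timestamp makes the saved output and diagnostics unique
        archive(center.strftime("%Y-%m-%d.%H:%M:%S"))
        completed.append(center)
        center += window
        if center <= last_obs:
            advance_ensemble(center)
    return completed
```

Passing the steps in as callables keeps the loop independent of any particular model or batch system; each hook can be as simple or as elaborate as the site requires.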
In More Detail
The filter program requires an ensemble of model
output files in NetCDF file format as input.
If the model does not use NetCDF, a translation step from
the model's native format to NetCDF is needed. The files are
often named with the ensemble number as part of the name
and also with a timestamp as part of the filename or part of
a subdirectory name which contains all the files for that timestep.
Symbolic links can be used to link a common simpler name to a
file with a timestamp in the filename or directory name.
The filter program also requires an
input observation sequence file.
Often these are named with a timestamp to indicate
the central time of the observations,
e.g. obs_seq.2010-10-04.00:00:00
and then a common name (e.g. obs_seq.out) is used with a
symbolic link to indicate the right file for input.
If adaptive inflation is being used, the filter program
also requires inflation input files. Again, timestamped names with
a common symbolic link name are often used here.
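The symbolic-link convention described above for model files, observation sequence files, and inflation files could be scripted along these lines. This is a minimal sketch: the helper name `link_common_name` is hypothetical, and it assumes a POSIX filesystem where symbolic links are available.

```python
import os


def link_common_name(timestamped_path, common_name):
    """Point common_name at timestamped_path, replacing any older link.

    Hypothetical helper: refuses to clobber a regular file so that a
    real data file is never deleted by mistake.
    """
    if os.path.islink(common_name):
        os.remove(common_name)  # drop the link left by the previous cycle
    elif os.path.exists(common_name):
        raise FileExistsError(f"{common_name} exists and is not a symbolic link")
    os.symlink(os.path.abspath(timestamped_path), common_name)
```

For example, `link_common_name("obs_seq.2010-10-04.00:00:00", "obs_seq.out")` would make the common name refer to the file for the current window.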
The filter program then runs.
Its output includes updated model files,
produced by one of three different workflows:
- The filter program directly overwrites the input files.
  - Advantages: uses the least disk space and minimizes file copying.
  - Disadvantages: if something crashes, the files can be left in
    an indeterminate state, making restarting more complicated.
- The script copies the input files to the output names, and the
  filter program updates the existing files.
  - Advantages: the filter program can easily be
    restarted in case of problems because the original input files are unchanged.
    The output files are immediately available to be used as input to the model.
  - Disadvantages: uses more disk space.
- The filter program creates new output files from scratch.
  - Advantages: the output files are smaller since they only contain the
    state vector and no other grid or auxiliary information. The
    filter program can easily be restarted in case of problems.
  - Disadvantages: generally requires a post-processing step to
    insert the updated state information into full model restart files.
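For the second workflow, the copy step the script performs before running filter might look like the sketch below. The function name `copy_inputs_to_outputs` and the input-to-output name mapping are assumptions made for illustration; the actual file names depend on the model and on how filter is configured.

```python
import shutil
from pathlib import Path


def copy_inputs_to_outputs(name_map):
    """Copy each input file to its output name before filter runs.

    name_map is a hypothetical {input_path: output_path} mapping.
    The inputs are left untouched, so a failed run can be restarted.
    """
    copied = []
    for src, dst in name_map.items():
        dst = Path(dst)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # filter would then update dst in place
        copied.append(dst)
    return copied
```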
The script should also save the obs_seq.final diagnostic
file, possibly with a timestamp in the filename or subdirectory name,
and the updated inflation files in the case where adaptive inflation is used.
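One way to save the diagnostics is to copy them into a subdirectory named for the cycle's timestamp. The sketch below is illustrative only: `archive_cycle` is a hypothetical helper, and the list of files to save (obs_seq.final plus any inflation files) is left to the caller because the inflation file names depend on the configuration.

```python
import shutil
from pathlib import Path


def archive_cycle(files, archive_root, timestamp):
    """Copy diagnostic files into archive_root/timestamp.

    Hypothetical helper: raises FileExistsError if that cycle was
    already archived, rather than silently overwriting it.
    """
    dest = Path(archive_root) / timestamp
    dest.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.copy2(f, dest / Path(f).name)
    return dest
```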
The script can run the ensemble of models forward in time in many ways.
A few of the ways we're aware of are:
- If a queuing system is available, the ensemble of models can
be submitted either as independent jobs or using the batch system's
job array syntax. They run as soon as resources are available.
The disadvantage is that it can be complicated to determine when all the jobs
have finished successfully.
- On smaller clusters the ensemble members can be advanced one after
the other in a loop. There is no question about when the last member
has been advanced and it requires no more resources than running a
single copy of the model. The disadvantage is that this is the slowest
way, in wall-clock time, to advance the ensemble.
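The one-after-the-other approach can be sketched as a plain loop. The names `advance_members_serially` and `build_command` are hypothetical; `build_command` stands in for whatever command line actually advances a single ensemble member of the model in question.

```python
import subprocess


def advance_members_serially(n_members, build_command):
    """Advance ensemble members 1..n_members one at a time.

    build_command(member) is a hypothetical callable returning the
    argument list that advances one member. check=True stops the loop
    at the first failure, so a normal return means every member was
    advanced successfully.
    """
    for member in range(1, n_members + 1):
        subprocess.run(build_command(member), check=True)
    return n_members
```

Because each call blocks until the model run exits, the script knows the ensemble is ready for filter as soon as the function returns.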