
    Welcome to the Data Assimilation Research Testbed - DART


    DART is a community facility for ensemble data assimilation (DA) developed and maintained by the Data Assimilation Research Section (DAReS) at the National Center for Atmospheric Research (NCAR). DART provides modelers, observational scientists, and geophysicists with powerful, flexible DA tools that are easy to implement and use and can be customized to support efficient operational DA applications. DART is a software environment that makes it easy to explore a variety of data assimilation methods and observations with different numerical models. It is designed to facilitate the combination of assimilation algorithms, models, and real (as well as synthetic) observations to allow increased understanding of all three. DART includes extensive documentation, a comprehensive tutorial, and a variety of models and observation sets that can be used to introduce new users or graduate students to ensemble DA. DART also provides a framework for developing, testing, and distributing advances in ensemble DA to a broad community of users by removing the implementation-specific peculiarities of one-off DA systems.

    DART employs a modular programming approach to apply an Ensemble Kalman Filter which nudges the underlying models toward a state that is more consistent with information from a set of observations. Models may be swapped in and out, as can different algorithms in the Ensemble Kalman Filter. The method requires running multiple instances of a model to generate an ensemble of states. A forward operator appropriate for the type of observation being assimilated is applied to each of the states to generate the model's estimate of the observation.
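    The update described above can be illustrated with a toy example. The following Python sketch is not DART code. It assumes a scalar state, a made-up forward operator (forward_operator, which simply doubles the state), and one common deterministic variant of the ensemble filter: the observation estimates are shifted and contracted to match the posterior mean and variance, and the resulting increments are regressed back onto the state. All names and numbers are invented for illustration.

```python
import math
from statistics import mean, variance


def forward_operator(state):
    """Hypothetical forward operator: the instrument sees twice the state."""
    return 2.0 * state


def assimilate_scalar(states, obs_value, obs_error_var):
    """Update an ensemble of scalar states with one scalar observation."""
    # Apply the forward operator to every member to get each member's
    # estimate of the observation.
    estimates = [forward_operator(x) for x in states]

    prior_mean = mean(estimates)
    prior_var = variance(estimates)  # sample variance (n - 1 denominator)

    # Combine the prior estimate and the observation (product of Gaussians).
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_error_var)
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_error_var)

    # Shift and contract the estimates to the posterior mean and variance.
    shrink = math.sqrt(post_var / prior_var)
    updated = [post_mean + shrink * (y - prior_mean) for y in estimates]

    # Regress the observation-space increments back onto the state.
    state_mean = mean(states)
    cov_xy = sum((x - state_mean) * (y - prior_mean)
                 for x, y in zip(states, estimates)) / (len(states) - 1)
    gain = cov_xy / prior_var
    return [x + gain * (y_new - y_old)
            for x, y_new, y_old in zip(states, updated, estimates)]


# Made-up numbers: three members, one observation of 4.0 with error variance 1.0.
prior = [0.5, 1.0, 1.5]
posterior = assimilate_scalar(prior, obs_value=4.0, obs_error_var=1.0)
```

    In this invented case the observation implies a state of 2.0, so the ensemble mean moves from 1.0 to about 1.5 and the ensemble spread shrinks.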

    The DART algorithms are designed so that incorporating new models and new observation types requires minimal coding of a small set of interface routines and does not require modification of the existing model code. Several comprehensive atmosphere and ocean general circulation models (GCMs) have been added to DART by modelers from outside of NCAR, in some cases with less than one person-month of development effort. (Try that with a variational system!) Forward operators for new observation types can be created in a fashion that is nearly independent of the forecast model; many of the standard operators are available 'out of the box' and will work with no additional coding. DART has been through the crucible of many compilers and platforms. It is ready for friendly use and has been used in several field programs requiring real-time forecasting. The DART programs have been compiled with several Fortran 90 compilers and run on Linux compute servers, Linux clusters, OS X laptops and desktops, SGI Altix clusters, and supercomputers running AIX ... a pretty broad range, really.



    If you are using DART: you need to know about the changes of 2 June 2008.

    DART code distributions

    DART is distributed primarily through an anonymous-access, read-only Subversion (SVN) repository. This makes updates and comparisons between your sandbox and the latest, greatest version of the code trivially easy. The same cannot be said of a TAR file. If you are not familiar with svn (the client application of Subversion), you should take a stroll through my svn primer. If you cannot use svn (i.e. you are firewalled out), please let me know and I'll send you a tarfile ... as a last resort.

    Since the DART software is still an area of active research, there are multiple distributions, with more on the way. We'd like to be able to contact people to inform them of any bugs or updates. (This includes local users, BTW!) As you can see by the timetable of distributions, you could expect to get about 2 more emails per year, so PLEASE use a real email address when signing up. I solemnly swear to protect your email address like it is my own!

    download instructions


    version          | date         | instructions              | most notable change(s)
    trunk            | today        | included in distrib.      | varies, see next section below for details
    jamaica          | 12 Apr 2007  | [doc]                     | vertical localization, extensive testing of MPI implementation, full documentation for new algorithms, new tutorial sections; change log
    pre_j            | 02 Oct 2006  | [doc]                     | contains an updated scalable filter algorithm
    post_iceland     | 20 Jun 2006  | [doc]                     | observation-space adaptive inflation, bug fixes, merge_obs_sequence support ...; change log
    iceland          | 23 Nov 2005  | [doc]                     | huge expansion of real observation capability, better namelist processing, PBL_1d available; change log
    pre_iceland      | 20 Oct 2005  | for developers only       | huge expansion of real observation capability
    DA workshop 2005 | 13 June 2005 | docs included in distrib. | tutorial directory in distribution, observation preprocessing
    hawaii           | 28 Feb 2005  | [doc]                     | new filtering algorithms
    pre-hawaii       | 20 Dec 2004  | [doc]                     | new filtering algorithms
    guam             | 12 Aug 2004  | [doc]                     | new observation modules, removing autopromotion
    fiji             | 29 Apr 2004  | [doc]                     | enhanced portability, CAM, WRF
    easter           | 8 March 2004 | [doc]                     | initial release


    User-Visible changes to the TRUNK (circa 2 Jun 2008)

    Some ideas are just too good to postpone till the next major release. The following changes to the DART trunk code may require you to take additional action after an update:

    1. New functionality in the preprocess program means that you now need only obs_def_mod.f90 in your path_names_* files; it will include the code for all other obs_def_* modules automatically. If you have additional obs_def lines for gps, altimeter, or other specialized obs_def files, they must be removed or you will get an error similar to this:
      AMBIGUOUS: Module obs_def_gps_mod is associated with ../../../obs_def/obs_def_gps_mod.f90 as well as ../../../obs_def/obs_def_mod.f90.
      Adding and removing various obs_def files now only involves changing the &preprocess_nml:input_files namelist item in the input.nml file. We will update all path_names_* files in the DART subversion (svn) repository; this only applies to files you have that are not under svn control.

    2. All model work directories now have a quickbuild.csh script instead of workshop_setup.csh. The workshop script still exists in some model directories, but may change in upcoming releases to accommodate DART tutorial workshops. The quickbuild script recompiles all the programs; it does not try to run any of them. By default it builds all small models without MPI, and builds the large models (CAM, WRF, AM2) with MPI. It takes -mpi and -nompi arguments.

    3. The MPI version of filter would sometimes not kill the entire job if one task failed. This resulted in jobs hanging and using up computing resources unnecessarily. We added a new routine to the utilities_mod.f90 called exit_all(). For MPI-enabled jobs it calls MPI_Abort() to inform all other tasks that one has failed. This should cause an MPI job to exit immediately on error. Nothing new happens for programs which do not use MPI. However, this change requires adding the mpi_utilities/null_mpi_utilities_mod.f90 file to many of the path_names_* files where it was not required before. It might also require adding time_manager/time_manager.f90 if it is not already there. We will update all path_names_* files in the DART subversion repository, but you may have programs of your own that use utilities_mod routines - in which case your programs will now need to link with null_mpi_utilities_mod. Without it, the error you see will be similar to this:
      /usr/bin/ld: Undefined symbols:
      _exit_all_

    4. The default mpi interface utility code should now compile as-is for every supported compiler except gfortran. (You should no longer need to go into the code and edit the interface block for the system function.) If you are using the gfortran compiler, in the $DART/mpi_utilities directory you will find a script called fixsystem that you can run to make the file compile without having to edit it by hand.



    Schematic of Ensemble Data Assimilation - from the DAReS Perspective


    This is the DART view of ensemble data assimilation for models that run as separate executables. Starting at the top of the schematic and working clockwise: everything is driven by a Fortran namelist and the presence or absence of observations. A Fortran executable named 'filter' reads a namelist, an initial state for the ensemble, and a file containing observations and goes to work. Given the observations and an initial state, 'filter' assimilates the observations and then determines how far to advance the model (using information from the namelist and the observation file).

    'filter' forks a shell script to the system, and this shell script is responsible for three things: 1) converting the DART state vectors and 'advance_to_time' to the format required by the underlying model, 2) advancing the model, and 3) converting the model output into a form suitable for 'filter'. [The script is responsible for the lower portion of the diagram.] The model advances each ensemble member (either in turn or all at once) and the model output is converted to the input format expected by 'filter'. The shell script finishes and signals 'filter' to continue.

    We are now back at the beginning, and the cycle continues as long as there are observations to assimilate or until the stopping conditions set in the Fortran namelist are met. When that happens, a set of restart files is written (suitable to continue an experiment with more observations) and diagnostic files are written. These diagnostic files allow for the exploration of the assimilation before and after each assimilation step and for exploration of the assimilation in 'observation space'; each real observation is paired with the estimates of the observation from all of the ensemble members (if desired). Minimally, the ensemble mean estimate of the observation and the ensemble spread of the estimates are recorded.
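    The control flow of this cycle can be summarized in a few lines of Python. This is a schematic only, not how 'filter' is implemented: assimilate and advance_model are hypothetical stand-ins for the assimilation step and for the shell script, and the observation times are invented.

```python
def run_cycle(observation_times, model_time, assimilate, advance_model):
    """Alternate between advancing the model and assimilating observations.

    observation_times: times at which observations exist (any order).
    assimilate(time): stand-in for the assimilation step done by 'filter'.
    advance_model(start, stop): stand-in for the shell script that converts
        the state, runs every ensemble member, and converts the output back.
    Returns the model time once no observations remain.
    """
    for obs_time in sorted(set(observation_times)):
        if obs_time > model_time:
            # The observations determine how far the model must be advanced.
            advance_model(model_time, obs_time)
            model_time = obs_time
        assimilate(obs_time)
    # At this point the real system writes restart and diagnostic files.
    return model_time


# Record the order of operations for a made-up set of observation times.
log = []
final_time = run_cycle(
    observation_times=[0, 6, 6, 12],
    model_time=0,
    assimilate=lambda t: log.append(("assimilate", t)),
    advance_model=lambda start, stop: log.append(("advance", start, stop)),
)
```

    The log then alternates between assimilation and model advance, ending with the assimilation at the last observation time.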



    DART tested platforms/models/compilers


    The most current version of DART (Jamaica) has been tested and verified to run correctly on the following:

    hardware                  | O/S                | F90 compiler               | batch manager
    Intel cluster/SMP/32 bit  | Red Hat Enterprise | Intel 9.0 (build 20051201) | none
    Intel cluster/SMP/32 bit  | Red Hat Enterprise | GNU Fortran 95 4.1.0       | none
    Intel Nocona              | SUSE               | PGI 6.0-8                  | LSF
    Intel Nocona              | SUSE               | Intel 9.1 (EM64T)          | LSF
    Intel iMac                | Mac OS X 10.4.6    | gfortran 4.2.0             | Open-MPI
    IBM Power 5+              | AIX 5.3            | XLF 10.1.0                 | LSF
    Intel                     | Fedora Core 4      | Intel 9.0                  | none
    PowerBook G4 (PowerPC)    | Mac OS X 10.4.8    | Absoft Pro Fortran 9.0     | none
    PowerBook G4 (PowerPC)    | Mac OS X 10.3.9    | Absoft Pro Fortran 9.0     | Lam
    PowerBook G4 (PowerPC)    | Mac OS X 10.3.9    | gfortran 4.2.0             | Lam
    Intel cluster (Xeon)      | Fedora Core 2      | PGI 6.0-5                  | PBS
    Intel cluster (Xeon)      | Fedora Core 2      | PGI 5.2-4                  | PBS
    Intel cluster (Xeon)      | Fedora Core 2      | Lahey 6.20c                | PBS
    IBM cluster (of opterons) | SUSE Enterprise 8  | Pathscale 2.4              | LSF
    IBM cluster (of opterons) | SUSE Enterprise 8  | Intel 9.1 (EM64T)          | LSF
    IBM cluster (of opterons) | SUSE Enterprise 8  | PGI 6.2                    | LSF
    SGI Altix                 | SUSE Linux ES 9.3  | Intel 9.1                  | PBS
    SGI MIPS                  | IRIX 6.5           | MIPSpro 7.4.3m             | NQS
    Linux cluster             | Intel              | Intel 9.1                  | PBS



    Requirements to install and run DART


    DART is intended to be highly portable among Unix/Linux operating systems. DART has been run successfully on Windows machines under the cygwin environment. Those instructions are under development - if you would like to be a friendly beta-tester please send me (Tim Hoar) an email and I'll send you the instructions, as long as you promise to provide feedback (good or bad!) so I can improve them. My email is   thoar @ ucar . edu - minus the spaces, naturally.

    Minimally, you will need:

    1. a Fortran90 compiler,
    2. the netCDF libraries built with the F90 interface,
    3. perl (just about any version),
    4. an environment that understands csh or tcsh, and
    5. the old unix standby ... make
    History has shown that it is a very good idea to make sure your run-time environment has the following:

    limit stacksize unlimited
    limit datasize unlimited
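    A quick way to see what limits a process actually inherits is sketched below in Python, using the standard resource module (Unix only). It is offered as a convenience and is not part of DART. The helper name describe_limit is made up. Numeric values are reported in bytes, and the result only reflects your DART environment if the script is started from the same shell.

```python
import resource


def describe_limit(name, which):
    """Return a one-line report of the soft limit for one resource."""
    soft, _hard = resource.getrlimit(which)
    value = "unlimited" if soft == resource.RLIM_INFINITY else str(soft)
    return f"{name}: {value}"


# RLIMIT_STACK and RLIMIT_DATA are the limits that the csh commands
# 'limit stacksize' and 'limit datasize' control.
report = [
    describe_limit("stacksize", resource.RLIMIT_STACK),
    describe_limit("datasize", resource.RLIMIT_DATA),
]
print("\n".join(report))
```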

    Additionally, what has proven to be nice (but not required) is:
    1. ncview: a great visual browser for netCDF files.
    2. the netCDF Operators (NCO): tools to perform operations on netCDF files like concatenating, slicing, and dicing
    3. Some sort of MPI environment. Put another way, DART does not come with MPICH, LAM-MPI, or OpenMPI; but we use them all the time. You should become familiar with the DART MPI introduction in mpi_intro.html.
    4. The DART diagnostic scripts are written for Matlab®. Matlab® R2008B (finally!) has native netCDF support, but the syntax is different enough that I will have to rewrite several hundred (thousand?) calls - which won't happen overnight. Besides, not everyone has that version. If you want to use the diagnostic scripts as they stand, you will need Matlab® along with the third-party netCDF toolboxes: mexnc, snctools, netcdf_toolbox, and the CSIRO netCDF/OPeNDAP interface.


    Requirements: a Fortran90 compiler


    The DART software is predominantly built on several Linux/x86 platforms with several versions of the Intel Fortran Compiler for Linux, which (at one point) is/was free for individual scientific use. It has also been built and successfully run with several versions of each of the following: IBM XL Fortran Compiler, Portland Group Fortran Compiler, Lahey Fortran Compiler, Pathscale Fortran Compiler, GNU Fortran 95 Compiler ("gfortran"), Absoft Fortran 90/95 Compiler (Mac OSX). Since recompiling the code is a necessity to experiment with different models, there are no binaries to distribute.


    Requirements: the netCDF library


    DART uses the netCDF self-describing data format for the results of assimilation experiments. These files have the extension .nc and can be read by a number of standard data analysis tools. In particular, DART also makes use of the F90 interface to the library which is available through the netcdf.mod and typesizes.mod modules. IMPORTANT: different compilers create these modules with different "case" filenames, and sometimes they are not both installed into the expected directory. It is required that both modules be present. The normal place would be in the netcdf/include directory, as opposed to the netcdf/lib directory.

    If the netCDF library does not exist on your system, you must build it (as well as the F90 interface modules). The library and instructions for building the library or installing from an RPM may be found at the netCDF home page: http://www.unidata.ucar.edu/packages/netcdf/

    NOTE: The location of the netCDF library, libnetcdf.a, and the locations of both netcdf.mod and typesizes.mod will be needed later.
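    Because the module file names can differ in case, a case-insensitive check of the include directory can save some head-scratching. The following Python sketch is one way to do that; the function name find_netcdf_modules is made up, and the directory you pass in is whatever your netCDF installation uses.

```python
import os


def find_netcdf_modules(include_dir):
    """Map each required module name to the file found for it (or None).

    The comparison ignores case because compilers differ in how they
    capitalize module file names (for example NETCDF.mod vs netcdf.mod).
    """
    required = ("netcdf.mod", "typesizes.mod")
    present = {name.lower(): name for name in os.listdir(include_dir)}
    return {name: present.get(name) for name in required}
```

    For example, find_netcdf_modules("/usr/local/include") returns a dictionary with an entry for each of the two modules; a value of None means that module was not found in that directory.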


    Requirements: if you have your own model


    If you want to run your own model, all you need is an executable and some scripts to interface with DART - we have templates and examples. If your model can be called as a subroutine, life is good, and the hardest part is usually a routine to parse the model state vector into one whopping array - and back. Again - we have templates, examples, and a document describing the required interfaces. That document exists in the DART code - DART/models/model_mod.html - as does all the most current documentation. Every(?) DART program/module has a matching piece of documentation.

    Starting with the Jamaica release, there is an option to compile with the MPI (Message Passing Interface) libraries in order to run the assimilation step in parallel on hardware with multiple CPUs. Note that this is optional; MPI is not required. If you do want to run in parallel, then we also require a working MPI library and appropriate cluster or SMP hardware. See the MPI intro for more information on running with the MPI option.

    One of the beauties of ensemble data assimilation is that even if (particularly if) your model is single-threaded, you can still run efficiently on parallel machines by dealing out each ensemble member (a unique instance of the model) to a separate processor. If your model cannot run single-threaded, fear not: DART can handle that too, and simply runs each ensemble member one after another using all the processors for each instance of the model.




    Installing DART : Download the distribution.


    The DART source code is now distributed through an anonymous Subversion server. The big advantage is the ability to patch or update existing code trees at your discretion. Subversion (the client-side app is 'svn') allows you to compare your code tree with one on a remote server and selectively update individual files or groups of files. Furthermore, now everyone has access to any version of any file in the project, which is a huge help for developers. I have a brief summary of the svn commands I use most posted at: http://www.image.ucar.edu/~thoar/svn_primer.html

    The resources to develop and support DART come from our ability to demonstrate our growing user base. We ask that you register at our download site http://www.image.ucar.edu/DAReS/DART/DART_download and promise that the information will only be used to notify you of new DART releases and shown to our sponsors in an aggregated form: "Look - we have three users from Tonawanda, NY". After filling in the form, you will be directed to a website that has instructions on how to download the code.

    svn has adopted the strategy that "disk is cheap". In addition to downloading the code, it downloads an additional copy of the code to store locally (in hidden .svn directories) as well as some administration files. This allows svn to perform some commands even when the repository is not available. It does double the size of the code tree ... so the download is something like 480MB -- pretty big. BUT - all future updates are (usually) just the differences, so they happen very quickly.

    If you follow the instructions on the download site, you should wind up with a directory named my_path_to/DART, which we call $DARTHOME. Compiling the code in this tree (as is usually the case) will necessitate much more space.

    If you cannot use svn, just let me know and I will create a tar file for you. svn is so superior to a tar file that a tar file should be considered a last resort.


    Installing DART : document conventions


    All filenames look like this -- (typewriter font, green).
    Program names look like this -- (italicized font, green).
    user input looks like this -- (bold, magenta).

    commands to be typed at the command line are contained in an indented gray box.

    And the contents of a file are enclosed in a box with a border:
    &hypothetical_nml
      obs_seq_in_file_name = "obs_seq.in",
      obs_seq_out_file_name = "obs_seq.out",
      init_time_days = 0,
      init_time_seconds = 0,
      output_interval = 1
    &end


    Installing DART


    The entire installation process is summarized in the following steps:

    1. Determine which F90 compiler is available.
    2. Determine the location of (or build) the netCDF library.
    3. Download the DART software into the expected source tree.
    4. Modify certain DART files to reflect the available F90 compiler and location of the appropriate libraries.
    5. Build the executables.

    The code tree is very "bushy"; there are many directories of support routines, etc. but only a few directories involved with the customization and installation of the DART software. If you can compile and run ONE of the low-order models, you should be able to compile and run ANY of the low-order models. For this reason, we can focus on the Lorenz '63 model. Consequently, the only directories with files to be modified to check the installation are: DART/mkmf and DART/models/lorenz_63/work.

    We have tried to make the code as portable as possible, but we do not have access to all compilers on all platforms, so there are no guarantees. We are interested in your experience building the system, so please send us a note at dart @ ucar .edu


    Customizing the build scripts -- Overview.


    DART executable programs are constructed using two tools: make and mkmf. The make utility is a very common piece of software that requires a user-defined input file that records dependencies between different source files. make then performs a hierarchy of actions when one or more of the source files is modified. mkmf is a perl script that generates a make input file (named Makefile) and an example namelist input.nml.program_default with the default values. The Makefile is designed specifically to work with object-oriented Fortran90 (and other languages) for systems like DART.

    mkmf requires two separate input files. The first is a 'template' file which specifies details of the commands required for a specific Fortran90 compiler and may also contain pointers to directories containing pre-compiled utilities required by the DART system. This template file will need to be modified to reflect your system. The second input file is a 'path_names' file which contains a complete list of the locations (either relative or absolute) of all Fortran90 source files that are required to produce a particular DART program. Each 'path_names' file must contain a path for exactly one Fortran90 file containing a main program, but may contain any number of additional paths pointing to files containing Fortran90 modules. An mkmf command is executed which uses the 'path_names' file and the mkmf template file to produce a Makefile which is subsequently used by the standard make utility.
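    The "exactly one main program" rule can be checked with a simple heuristic. The Python sketch below is not part of mkmf or DART; the function name main_programs is made up, and the regular expression is only an approximation of the Fortran PROGRAM statement, so treat the result as a hint.

```python
import os
import re

# A line that starts with 'program <name>' (any case). Lines starting with
# 'end program' or '!' do not match.
PROGRAM_STATEMENT = re.compile(r"^\s*program\s+\w+", re.IGNORECASE | re.MULTILINE)


def main_programs(path_names_file, dart_root):
    """Return the listed source files that contain a PROGRAM statement."""
    with open(path_names_file) as listing:
        paths = [line.strip() for line in listing if line.strip()]
    found = []
    for path in paths:
        # Relative paths are resolved against dart_root; absolute paths are
        # used as they are.
        with open(os.path.join(dart_root, path)) as source:
            if PROGRAM_STATEMENT.search(source.read()):
                found.append(path)
    return found
```

    A well-formed 'path_names' file should yield a list with exactly one entry.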

    Shell scripts that execute the mkmf command for all standard DART executables are provided as part of the standard DART software. For more information on mkmf see the FMS mkmf description.
    One of the benefits of using mkmf is that it also creates an example namelist file for each program. The example namelist is called input.nml.program_default, so as not to clash with any existing input.nml in that directory.


    Building and Customizing the 'mkmf.template' file


    A series of templates for different compilers/architectures exists in the DART/mkmf/ directory; their names have extensions that identify the compiler, the architecture, or both. This is how you inform the build process of the specifics of your system. Our intent is that you copy one that is similar to your system into mkmf.template and customize it. You can also create a soft link:

    rm mkmf.template
    ln -s mkmf.template.xxxx.yyyy mkmf.template

    For the discussion that follows, knowledge of the contents of one of these templates (e.g. mkmf.template.intel.linux) is needed. Note that only the LAST lines are shown here; the head of the file is just a big comment (worth reading, btw).


    ...
    MPIFC = mpif90
    MPILD = mpif90
    FC = ifort
    LD = ifort
    NETCDF = /usr/local
    INCS = -I${NETCDF}/include
    LIBS = -L${NETCDF}/lib -lnetcdf
    FFLAGS = -O2 $(INCS)
    LDFLAGS = $(FFLAGS) $(LIBS)

    Essentially, each of the lines defines some part of the resulting Makefile. Since make is particularly good at sorting out dependencies, the order of these lines really doesn't make any difference. The FC = ifort line ultimately defines the Fortran90 compiler to use, etc. The lines which are most likely to need site-specific changes start with FC, LD, and NETCDF.

    If you have MPI installed on your system, MPIFC and MPILD dictate which compiler will be used in that instance. If you do not have MPI, these variables are of no consequence.

    variable | value
    FC       | the Fortran compiler
    LD       | the name of the loader; typically the same as the Fortran compiler
    NETCDF   | the location of your netCDF installation containing netcdf.mod and typesizes.mod. Note that the value of the NETCDF variable will be used by the FFLAGS, LIBS, and LDFLAGS variables.
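    To see how the NETCDF setting propagates into the compile and link flags, the template lines shown earlier can be expanded by hand or with a small script. The Python sketch below mimics only the simplest part of make's variable expansion ('NAME = value' lines with $(NAME) or ${NAME} references). It ignores everything else make can do, and the function name expand_template is made up.

```python
import re


def expand_template(text):
    """Parse 'NAME = value' lines and expand $(NAME) / ${NAME} references."""
    raw = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]          # drop make-style comments
        if "=" in line:
            name, value = line.split("=", 1)
            raw[name.strip()] = value.strip()

    reference = re.compile(r"\$[({](\w+)[)}]")

    def expand(value, depth=0):
        if depth > 10:                        # guard against circular definitions
            return value
        # Undefined names expand to an empty string, as they do in make.
        return reference.sub(
            lambda match: expand(raw.get(match.group(1), ""), depth + 1), value)

    return {name: expand(value) for name, value in raw.items()}


# The tail of the example template shown above.
TEMPLATE = """
FC = ifort
LD = ifort
NETCDF = /usr/local
INCS = -I${NETCDF}/include
LIBS = -L${NETCDF}/lib -lnetcdf
FFLAGS = -O2 $(INCS)
LDFLAGS = $(FFLAGS) $(LIBS)
"""

settings = expand_template(TEMPLATE)
```

    With the example values, settings["LDFLAGS"] comes out as "-O2 -I/usr/local/include -L/usr/local/lib -lnetcdf", which shows why changing NETCDF alone is usually enough to point the build at a different netCDF installation.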

    Customizing the 'path_names_*' file


    Several path_names_* files are provided in the work directory for each specific model, in this case: DART/models/lorenz_63/work. Since each model comes with its own set of files, the path_names_* files need no customization.


    Building the Lorenz_63 DART project.


    Currently, DART executables are constructed in a work subdirectory under the directory containing code for the given model. In the top-level DART directory, change to the L63 work directory and list the contents:


    cd DART/models/lorenz_63/work
    ls

    With the result:


    dart:~/<1>models/lorenz_63/work > ls
    filter_ics                     obs_seq.in                           path_names_perfect_model_obs
    filter_restart                 obs_seq.out                          path_names_preprocess
    input.nml                      obs_seq.out.average                  path_names_wakeup_filter
    mkmf_create_fixed_network_seq  obs_seq.out.x                        perfect_ics
    mkmf_create_obs_sequence       obs_seq.out.xy                       perfect_restart
    mkmf_filter                    obs_seq.out.xyz                      Posterior_Diag.nc
    mkmf_obs_diag                  obs_seq.out.z                        Prior_Diag.nc
    mkmf_obs_sequence_tool         path_names_create_fixed_network_seq  quickbuild.csh
    mkmf_perfect_model_obs         path_names_create_obs_sequence       set_def.out
    mkmf_preprocess                path_names_filter                    True_State.nc
    mkmf_wakeup_filter             path_names_obs_diag                  workshop_setup.csh
    obs_seq.final                  path_names_obs_sequence_tool
    0[537] dart:~/<1>models/lorenz_63/work >
    

    There are eight mkmf_xxxxxx files for the programs

    1. preprocess,
    2. create_obs_sequence,
    3. create_fixed_network_seq,
    4. perfect_model_obs,
    5. filter,
    6. wakeup_filter,
    7. merge_obs_seq, and
    8. obs_diag,

    along with the corresponding path_names_xxxxxx files. There are also files that contain initial conditions, netCDF output, and several observation sequence files, all of which will be discussed later. You can examine the contents of one of the path_names_xxxxxx files, for instance path_names_filter, to see a list of the relative paths of all files that contain Fortran90 modules required for the program filter for the L63 model. All of these paths are relative to your DART directory. The first path is the main program (filter.f90) and is followed by all the Fortran90 modules used by this program (after preprocessing).

    The mkmf_xxxxxx scripts are cryptic but should not need to be modified -- as long as you do not restructure the code tree (by moving directories, for example). The only function of the mkmf_xxxxxx script is to generate a Makefile and an input.nml.program_default file. It is not supposed to compile anything -- make does that:


    csh mkmf_preprocess
    make

    The first command generates an appropriate Makefile and the input.nml.preprocess_default file. The second command results in the compilation of a series of Fortran90 modules which ultimately produces an executable file: preprocess. Should you need to make any changes to the DART/mkmf/mkmf.template, you will need to regenerate the Makefile.

    The preprocess program actually builds source code to be used by all the remaining modules. It is imperative to actually run preprocess before building the remaining executables. This is how the same code can assimilate state vector 'observations' for the Lorenz_63 model and real radar reflectivities for WRF without needing to specify a set of radar operators for the Lorenz_63 model!

    preprocess reads the &preprocess_nml namelist to determine what observations and operators to incorporate. For this exercise, we will use the values in input.nml. preprocess is designed to abort if the files it is supposed to build already exist. For this reason, it is necessary to remove a couple of files (if they exist) before you run the preprocessor. It is just a good habit to develop.


    \rm -f ../../../obs_def/obs_def_mod.f90
    \rm -f ../../../obs_kind/obs_kind_mod.f90
    ./preprocess
    ls -l ../../../obs_def/obs_def_mod.f90
    ls -l ../../../obs_kind/obs_kind_mod.f90

    This created ../../../obs_def/obs_def_mod.f90 from ../../../obs_kind/DEFAULT_obs_kind_mod.F90 and several other modules. ../../../obs_kind/obs_kind_mod.f90 was created similarly. Now we can build the rest of the project.

    A series of object files for each module compiled will also be left in the work directory, as some of these are undoubtedly needed by the build of the other DART components. You can proceed to create the other programs needed to work with L63 in DART as follows:


    csh mkmf_create_obs_sequence
    make
    csh mkmf_create_fixed_network_seq
    make
    csh mkmf_perfect_model_obs
    make
    csh mkmf_filter
    make
    csh mkmf_obs_diag
    make

    The result (hopefully) is that six executables now reside in your work directory. The most common problem is that the netCDF libraries and include files (particularly typesizes.mod) are not found. Find them, edit the DART/mkmf/mkmf.template to point to their location, recreate the Makefile, and try again. The next most common problem is from the gfortran compiler complaining about "undefined reference to `system_'" which will be covered in the Platform-specific notes section.


    program                  | purpose
    preprocess               | creates custom source code for just the observations of interest
    create_obs_sequence      | specifies a set of observation characteristics taken by a particular instrument (or set of instruments)
    create_fixed_network_seq | specifies the temporal attributes of the observation sets
    perfect_model_obs        | spinup; generates the "true state" for synthetic observation experiments, ...
    filter                   | performs experiments
    obs_diag                 | creates observation-space diagnostic files to be explored by the Matlab® scripts
    merge_obs_sequence       | manipulates observation sequence files. It is not generally needed (particularly for low-order models) but can be used to combine observation sequences or convert from ASCII to binary or vice-versa. Since this is a specialty routine, we will not cover its use in this document.
    wakeup_filter            | is only needed for MPI applications. We're starting at the beginning here, so we're going to ignore this one, too.

    Checking the build.

    The DART/tutorial documents are an excellent way to kick the tires on DART and learn about ensemble data assimilation. If you've been able to build the Lorenz 63 model, you have correctly configured your mkmf.template and you can run anything in the tutorial.




    Platform-specific notes.


    Most of the platform-specific notes are in the appropriate mkmf.template.xxxx.yyyy file. There are very few situations that require making additional changes.


    gfortran

    For some reason, the gfortran compiler does not require an interface to the system() routine, while all the other compilers we have tested do need the interface. This makes it impossible to have a module that is compiler-independent. The interface is needed in null_mpi_utilities_mod.f90 and/or mpi_utilities_mod.f90. The problem surfaces at link time:

    
    null_mpi_utilities_mod.o(.text+0x160): In function `__mpi_utilities_mod__shell_execute':
    : undefined reference to `system_'
    null_mpi_utilities_mod.o(.text+0x7c8): In function `__mpi_utilities_mod__destroy_pipe':
    : undefined reference to `system_'
    null_mpi_utilities_mod.o(.text+0xbb9): In function `__mpi_utilities_mod__make_pipe':
    : undefined reference to `system_'
    collect2: ld returned 1 exit status
    make: *** [preprocess] Error 1
    
    
    There is a script to facilitate making the appropriate change to null_mpi_utilities_mod.f90 and mpi_utilities_mod.f90. Run the shell script DART/mpi_utilities/fixsystem with no arguments to simply 'flip' the state of these files (i.e. if the system block is defined, it will undefine the block by commenting it out; if the block is commented out, it will define it by uncommenting the block). If you want to hand-edit null_mpi_utilities_mod.f90 and mpi_utilities_mod.f90 - look for the comment block that starts ! BUILD TIP and follow the directions in the comment block.


    module mismatch errors

    Compilers create modules in their own particular manner ... a module built by one compiler may not (and usually will not) be usable by another compiler. Sometimes the Fortran90 modules for the netCDF interface were compiled by compiler A and compiler B is trying to use them. This generally results in an error message like:

    
    Fatal Error: File 'netcdf.mod' opened at (1) is not a <pick_your_compiler> module file
    make: *** [utilities_mod.o] Error 1
    
    
    The only solution here is to make sure the mkmf.template file is referencing the appropriate netCDF installation.


    endian-ness errors

    The endian-ness of the binary files is specific to the chipset, not the compiler or the code (normally). There are some models that require a specific endian binary file. Most compilers have some ability to read and/or write binary files of a specific (or non-native) endianness when given the right compile flags. It is generally an 'all-or-nothing' approach, in that trying to micromanage which files are opened with native endianness and which files are opened with non-native endianness is generally too time-consuming and error-prone to be of much use. If the compile flags exist and are known to us, we try to include them in the comment section of the individual mkmf.template.xxxx.yyyy file.

    These errors most often manifest themselves as 'time' errors in the DART execution. The restart/initial conditions files have the valid time of the ensuing model state as the first bit of information in the header, and if these files are 'wrong'-endian, the encoded times are nonsensical.
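
    To see why a 'wrong'-endian header produces nonsensical times, here is a small Python sketch. It is not DART code: the header layout (two 32-bit integers holding seconds and days) and the values are assumptions made purely for illustration. The same bytes are decoded once with the byte order they were written in and once with the opposite byte order.

```python
import struct

# Hypothetical header: (seconds, days) stored as two 32-bit integers.
# This layout is an assumption for illustration, not DART's file format.
seconds, days = 43200, 149

# Write the header the way a big-endian machine would.
big_endian_bytes = struct.pack(">ii", seconds, days)

# Decoding with the matching byte order recovers the original values.
right = struct.unpack(">ii", big_endian_bytes)

# Decoding the same bytes as little-endian scrambles every integer.
wrong = struct.unpack("<ii", big_endian_bytes)

print("matching byte order:", right)
print("opposite byte order:", wrong)
```

    If the first time DART reports looks like the second line of output rather than the first, suspect the endianness of the input files before suspecting the model.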


    MPI

    If you want to use MPI and are interested in testing something simple before total immersion: try running the MPI test routines in the DART/doc/mpi directory. This directory contains some small test programs which use both MPI and the netCDF libraries. It may be simpler to debug any build problems here, and if you need to submit a problem report to your system admin people these single executables are much simpler than the entire DART build tree.




    Was the (simple, nonMPI) Install successful?


    In keeping with the 'start simple' philosophy - we will test the installation with the lorenz_63 model. This section is not intended to provide any details of why we are doing what we are doing - this is sort of a 'black-box' test.

    In the DART/models/lorenz_63/work directory, there is a shell script named workshop_setup.csh. It will build all the executables for this model and run a simple perfect model experiment. The initial conditions files and observations sequences are in ASCII, so there is no portability issue, but there may be some roundoff error in the conversion from ASCII to machine binary. With such a highly nonlinear model, small differences in the initial conditions will result in a different model trajectory. Your results should start out looking VERY SIMILAR and may diverge with time.

    The canned experiment starts from a single known state, advances the model to the times defined in obs_seq.in, and applies a forward operator to calculate the model's estimate of the observation. Noise with specified characteristics is added to this value. Both the noise-free and noisy versions of the observation are recorded in obs_seq.out. There are 200 time steps in this experiment - the temporal evolution of the model state (the model 'truth') is recorded in a netCDF file.

    The second part of this experiment is the actual assimilation. The ensemble of initial states is contained in filter_ics. There are initial states for 80 ensemble members in this file (our canned example will only use the first 20). Each model state is advanced until the time of one of the observations in obs_seq.out and the observation is assimilated. Each ensemble member has its own estimate of what it thinks the observation should be - these estimates are recorded in obs_seq.final. As the ensemble members advance, all of the state information is recorded in a pair of netCDF files.
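
    The first half of the experiment (generating a 'truth' and noisy observations of it) can be sketched in a few lines of Python. This is only an illustration, not what perfect_model_obs actually does. The model parameters are the ones recorded in the True_State.nc attributes, but the two-stage Runge-Kutta integrator, the identity forward operator, observing every variable at every step, the initial state, and the observation error standard deviation are all assumptions.

```python
import random

# Lorenz 63 parameters as recorded in the True_State.nc global attributes.
SIGMA, R, B, DELTAT = 10.0, 28.0, 8.0 / 3.0, 0.01


def tendency(state):
    """Time derivative of the three Lorenz 63 variables."""
    x, y, z = state
    return (SIGMA * (y - x), x * (R - z) - y, x * y - B * z)


def advance(state, dt=DELTAT):
    """Advance one step with a two-stage Runge-Kutta scheme (assumed choice)."""
    k1 = tendency(state)
    predictor = tuple(s + dt * k for s, k in zip(state, k1))
    k2 = tendency(predictor)
    return tuple(s + 0.5 * dt * (a + b) for s, a, b in zip(state, k1, k2))


def perfect_model_obs(initial_state, nsteps, obs_error_sd, seed=0):
    """Return (truth, noisy_obs), each a list of nsteps 3-tuples."""
    rng = random.Random(seed)
    truth, noisy_obs = [], []
    state = initial_state
    for _ in range(nsteps):
        state = advance(state)
        truth.append(state)
        # Identity forward operator: each state variable is observed directly,
        # then Gaussian noise with the given standard deviation is added.
        noisy_obs.append(tuple(v + rng.gauss(0.0, obs_error_sd) for v in state))
    return truth, noisy_obs


# Hypothetical initial state and observation error.
truth, obs = perfect_model_obs((1.0, 1.0, 1.0), nsteps=200, obs_error_sd=1.0)
print(len(truth), "true states;", len(obs), "noisy observation sets")
```

    The assimilation half then works only from the noisy values; the truth is kept aside so the result can be judged afterwards.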

    cd DART/models/lorenz_63/work
    ./workshop_setup.csh

    Here is a list of the files that are created by workshop_setup.csh:


    from executable "perfect_model_obs"
    obs_seq.out the observations at some predefined times and locations
    True_State.nc a netCDF file containing the model trajectory
    perfect_restart the final state of the model - in ASCII
     
    from executable "filter"
    Prior_Diag.nc the model states right before assimilation
    Posterior_Diag.nc the model states immediately after assimilation
    obs_seq.final the model estimates of the observations (an integral part of the data assimilation process)
    filter_restart the ensemble of final model states
     
    from both
    dart_log.out the 'important' run-time output (this grows with each execution)
    dart_log.nml the input parameters used for an experiment

    If you have Matlab with netCDF support


    The simplest way to determine if the installation is successful is to run some of the functions we have available in DART/matlab/. Usually, we launch Matlab from the DART/models/lorenz_63/work directory and use the Matlab addpath command to make the DART/matlab/ functions available. In this case, we know the true state of the model that is consistent with the observations. The following Matlab scripts compare the ensemble members with the truth and can calculate an error.

    cd DART/models/lorenz_63/work
    matlab
    ... (lots of startup messages I'm skipping)...
    >> addpath ../../../matlab
    >> plot_total_err % no input arguments needed
    ... (some output I'm skipping) ...
    >> plot_ens_time_series % again - no input arguments needed

    [Figures: output of plot_total_err; output of plot_ens_time_series]
    From the plot_ens_time_series graphic, you can see the individual green ensemble members getting more constrained as time evolves. If your figures look similar to these, that's pretty much what you're looking for and you should feel pretty confident that everything is working.


    If you have ncview


    It is possible to get a glimpse of the evolution of the experiment by using ncview, but it is not really made for comparing the contents of two netCDF files against one another (as is done in the Matlab scripts). All you can really do is to check that the system is actually evolving as one would expect.


    ncview Prior_Diag.nc
    This opens what I call the navigation window.
    Click on 'state' - this will bring up an image of the state. With the mouse, drift over the lower-left portion of the image and notice that the 'Current:' portion of the navigation window tracks your cursor. For what it's worth - in this view the bottom row is the ensemble mean, the next row up is the ensemble spread, the third row is ensemble member 1 ... From left to right, there are three values - one for each model variable. You can think of them as X-Y-Z.
    Position the cursor on the lower-left portion until you get 'Current(:i=0,j=0) #### (x=1,y=1)' and click. This should spawn a timeseries in another window.
    Repeat until you get 'Current(:i=1,j=0) #### (x=2,y=1)' and click - another timeseries is superimposed on the figure. If you got x=2, this is the second of the 3 state variables of the Lorenz '63 system.
    Repeat until you get 'Current(:i=2,j=0) #### (x=3,y=1)' and click.
    If you like, you can 'Close' that window and navigate to 'Current(:i=0,j=1) #### (x=1,y=2)' and click to check the spread of the ensemble.

    If you have ... neither of those ...


    You are going to have to plot the values from the netCDF on your own. If you are not familiar with the netCDF format - now's the time. It's a wonderfully self-describing format. What you need to know is that the True_State.nc file contains exactly 1 'copy' - the true state of the model as it was used to generate the observations. There are many 'copies' in the Prior_Diag.nc and Posterior_Diag.nc files - all of the ensemble members, the ensemble mean, ensemble spread, and a couple more that pertain to parameters associated with the assimilation. The netCDF files have all the information in them to decode the information about each 'copy'.


    models/lorenz_63/work > ncdump -v CopyMetaData True_State.nc
    netcdf True_State {
    dimensions:
            metadatalength = 10 ;
            locationrank = 1 ;
            copy = 1 ;
            time = UNLIMITED ; // (200 currently)
            NMLlinelen = 129 ;
            NMLnlines = 187 ;
            StateVariable = 3 ;
    variables:
            int copy(copy) ;
                    copy:long_name = "ensemble member or copy" ;
                    copy:units = "nondimensional" ;
                    copy:valid_range = 1, 1 ;
            char CopyMetaData(copy, metadatalength) ;
                    CopyMetaData:long_name = "Metadata for each copy/member" ;
            char inputnml(NMLnlines, NMLlinelen) ;
                    inputnml:long_name = "input.nml contents" ;
            double time(time) ;
                    time:long_name = "time" ;
                    time:axis = "T" ;
                    time:cartesian_axis = "T" ;
                    time:calendar = "no calendar" ;
                    time:units = "days since 0000-00-00 00:00:00" ;
            double loc1d(StateVariable) ;
                    loc1d:long_name = "location on unit circle" ;
                    loc1d:dimension = 1 ;
                    loc1d:units = "nondimensional" ;
                    loc1d:valid_range = 0., 1. ;
            int StateVariable(StateVariable) ;
                    StateVariable:long_name = "State Variable ID" ;
                    StateVariable:units = "indexical" ;
                    StateVariable:valid_range = 1, 3 ;
            double state(time, copy, StateVariable) ;
                    state:long_name = "model state or fcopy" ;
    
    // global attributes:
                    :title = "true state from control" ;
                    :assim_model_source = "$URL: /DAReS/DART/trunk/assim_model/assim_model_mod.f90 $" ;
                    :assim_model_revision = "$Revision: 3766 $" ;
                    :assim_model_revdate = "$Date: 2009-02-06 13:13:09 -0700 (Fri, 06 Feb 2009) $" ;
                    :creation_date = "YYYY MM DD HH MM SS = 2009 02 18 13 27 58" ;
                    :model_source = "$URL: /DAReS/DART/trunk/models/lorenz_63/model_mod.f90 $" ;
                    :model_revision = "$Revision: 3440 $" ;
                    :model_revdate = "$Date: 2008-07-01 17:07:15 -0600 (Tue, 01 Jul 2008) $" ;
                    :model = "Lorenz_63" ;
                    :model_r = 28. ;
                    :model_b = 2.6666666666667 ;
                    :model_sigma = 10. ;
                    :model_deltat = 0.01 ;
    data:
    
     CopyMetaData =
      "true state" ;
    }

    The only real difference between Prior_Diag.nc (or Posterior_Diag.nc) and True_State.nc is that the former has more 'copies'. Take a look at the shape of the 'state' variable in the preceding ncdump: state(time, copy, StateVariable). 200 timesteps - 1 copy - 3 state variables. The other two netCDF files have more copies, so you need to know how to index them to retrieve the copy of interest. Simply dump the 'CopyMetaData' variable; the copies are listed in the order they appear in the netCDF file:


    models/lorenz_63/work > ncdump -v CopyMetaData Posterior_Diag.nc
    netcdf Posterior_Diag {
    ...
     CopyMetaData = 
      "ensemble mean        ",
      "ensemble spread      ",
      "ensemble member      1",
      "ensemble member      2",
      "ensemble member      3",
      "ensemble member      4",
      "ensemble member      5",
      "ensemble member      6",
      "ensemble member      7",
      "ensemble member      8",
      "ensemble member      9",
      "ensemble member     10",
      "ensemble member     11",
      "ensemble member     12",
      "ensemble member     13",
      "ensemble member     14",
      "ensemble member     15",
      "ensemble member     16",
      "ensemble member     17",
      "ensemble member     18",
      "ensemble member     19",
      "ensemble member     20",
      "inflation mean        ",
      "inflation sd          " ;
    }

    The 22nd copy is ensemble member 20, for example. So - using pseudo-syntax that assumes you start counting with '1': state(:,22,:) is a 200-by-3 matrix for ensemble member 20. Each row is a timestep, each column is a state variable. This is a 3 variable model. Want to know the time indices? There is a 'time' variable - complete with units.
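
    The lookup described above can be sketched in plain Python. Both function names and the tiny made-up 'state' array are hypothetical; in practice the labels and the data would come from whatever netCDF reader you have. The labels are compared after collapsing blanks, because CopyMetaData entries are padded.

```python
def find_copy_index(copy_metadata, wanted):
    """Return the 1-based position of the copy labelled `wanted`.

    Runs of blanks are collapsed before comparing, because the labels in
    CopyMetaData are padded to a fixed width.
    """
    target = " ".join(wanted.split())
    for index, label in enumerate(copy_metadata, start=1):
        if " ".join(label.split()) == target:
            return index
    raise ValueError(f"no copy labelled {wanted!r}")


def extract_copy(state, copy_index):
    """Pull one copy out of state[time][copy][variable] (nested lists).

    `copy_index` is 1-based, matching the pseudo-syntax state(:,22,:).
    """
    return [time_slice[copy_index - 1] for time_slice in state]


# Labels as listed by ncdump for Posterior_Diag.nc.
copy_metadata = (
    ["ensemble mean        ", "ensemble spread      "]
    + [f"ensemble member {n:6d}" for n in range(1, 21)]
    + ["inflation mean        ", "inflation sd          "]
)

# Tiny stand-in for the real 'state' variable: 2 timesteps, 24 copies,
# 3 state variables. The values are made up so each entry encodes its indices.
state = [
    [[100 * t + c + 0.1 * v for v in range(3)] for c in range(24)]
    for t in range(2)
]

member20 = find_copy_index(copy_metadata, "ensemble member 20")
series = extract_copy(state, member20)
print(member20, len(series), len(series[0]))
```

    Looking the copy up by its label is safer than hard-coding 22, since the position of a given member changes with the ensemble size.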





    Did my experiment work?

    A.K.A. - it didn't obviously bomb, now what?


    Under construction ...

    Check for deficient ensemble spread -
    Check for large errors compared to the observations -
    Check against the Prior (not the Posterior) -
    Check the magnitude of the innovations -
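
    Until this section is filled in, here is a generic sketch of the first two checks for a single time. It is not one of the DART diagnostics: the function names are hypothetical, 'spread' is taken to be the sample standard deviation across members, and 'error' is taken to be the root-mean-square difference between the ensemble mean and the truth. Feed it the Prior copies if you want to follow the third item.

```python
import math
import statistics


def ensemble_mean_and_spread(members):
    """Return per-variable mean and spread for one time.

    `members` is a list of state vectors, one per ensemble member.
    Spread is the sample standard deviation (an assumed definition).
    """
    columns = list(zip(*members))
    mean = [statistics.fmean(column) for column in columns]
    spread = [statistics.stdev(column) for column in columns]
    return mean, spread


def rmse(estimate, truth):
    """Root-mean-square difference between two state vectors."""
    squared = [(e - t) ** 2 for e, t in zip(estimate, truth)]
    return math.sqrt(statistics.fmean(squared))


# Made-up numbers: a 2-member ensemble of a 3-variable model.
members = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
truth = [2.0, 3.0, 7.0]

mean, spread = ensemble_mean_and_spread(members)
print("mean:", mean)
print("spread:", spread)
print("rmse of the mean:", rmse(mean, truth))
```

    A spread that is persistently much smaller than the error of the ensemble mean is the usual sign of a deficient ensemble.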


    MPI: Check to make sure there are no orphans, lingering processes.



    If you have Matlab with netCDF support


    Under construction ...

    the state-space-diagnostics

    plot_total_err
    plot_ens_spread
    plot_bins
    ...

    the observation-space-diagnostics

    obs_diag

    What you can do with NCO, ncview


    Under construction.





    DART_adding_a_model.html

    Adding a model to DART - Overview

    DART is designed to work with many models without modifications to the DART routines or the model source code. DART can 'wrap around' your model in two ways. One can be used if your model can be called as a subroutine, the other is for models that are separate executables. Either way, there are some steps that are common to both paths.

    Please be aware that several of the high-order models (CAM and WRF, in particular) have been used for years and their scripts have incorporated failsafe procedures and restart capabilities that have proven to be useful but make the scripts complex - more complex than need be for the initial attempts. Truly, some of the complexity is no longer required for available platforms. Then again, we're not running one instance of a highly complicated computer model, we're running N of them.

    The basic steps to include your model in DART

    are these (much more detail is provided later):

    1. Copy the template directory and files to your own DART model directory.
    2. Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient.
    3. [optional step] Modify the Matlab routines to know about the specifics of the netCDF files produced by your model (sensible defaults, for the most part.)


    4. If your model is not subroutine-callable, there is extra work to be done. There are several examples to raid, but it helps to know which existing model has a strategy that is most similar to yours. More on that later.

    5. Modify shell_scripts/advance_model.csh to: collect all the input files needed to advance the model into a clean, temporary directory, convert the state vector file into input to your model, run your model, and convert your model output to the expected format for another assimilation by DART. We have examples - some that use the following support routines.
      1. Create a routine or set of routines to take a DART array and create input files for your model. This is frequently done by a program called trans_sv_pv.f90, but you can do it any way you like. It is strongly suggested that you use the DART read mechanism for the restart file - namely assim_model_mod.f90:aread_state_restart()
      2. Modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. It may be convenient to do this in trans_sv_pv.f90.
      3. Run the model (you may need to watch the MPI syntax)
      4. Create a routine or set of routines to take your model output files and create a DART restart file. This is frequently done by a program called trans_pv_sv.f90. It is strongly suggested that you use the DART write mechanism for the restart file - namely assim_model_mod.f90:awrite_state_restart()

    6. If a single instance of your model needs to advance using all the MPI tasks, there is one more script that needs to work - run_filter.csh.

    7. Modify shell_scripts/run_filter.csh to: do everything under the sun and then some

    Test ...

    Generally, it is a good strategy to use DART to create a synthetic observation sequence with ONE observation location - and ONE observation type - for several assimilation periods. With that, it is possible to run perfect_model_obs and then filter without having to debug too much stuff at once. A separate document will address how to test your model with DART.

    Programming style

    #1 Don't shoot the messenger. We have a lot of experience trying to write portable/reproducible code and offer these suggestions. All of these suggestions are for the standalone DART components. We are not asking you to rewrite your model. If your model is a separate executable, leaving it untouched is fine. Writing portable code for the DART components will allow us to include your model in the nightly builds and reduces the risk of us making changes that adversely affect the integration with your model. There are some routines that have to play with the core DART routines, these are the ones we are asking you to write using these few simple guidelines.

    • Use explicit typing, do not throw the 'autopromote' flag on your compiler.
    • Use the intent() attribute.
    • Use the use xxx_mod, only : bob, sally statements for routines from other modules. This really helps us track down things and ensures you're using what you think you're using.
    • Use Fortran namelists for I/O if possible.
    • Check out the existing parameters/routines in common/types_mod.f90, utilities/utilities_mod.f90, and time_manager/time_manager_mod.f90. You are free to use these and are encouraged to do so. No point reinventing the wheel and these routines have been tested extensively.
    Hopefully, you have no idea how difficult it is to build each model with 'unique' compile options on N different platforms. Fortran90 provides a nice mechanism to specify the type of variable, please do not use vendor-specific extensions. (To globally autopromote 32bit reals to 64bit reals, for example. That is a horrible thing to do, since vendors are not consistent about what happens to explicitly-typed variables. Trust me. They lie. It also defeats the generic procedure interfaces that are designed to use a single interface as a front-end to multiple 'type-specific' routines.) Compilers do abide by the standard, however, so DART code looks like:

    character(len=8) :: crdate
    integer, dimension(8) :: values
    ...
    real(r4) :: a,b
    real(r8) :: bob
    integer :: istatus, itype
    ...
    real(r8), intent(in) :: x(:)
    type(location_type), intent(in) :: location
    integer, intent(in) :: itype
    integer, intent(out) :: istatus
    real(r8), intent(out) :: obs_val

    depending on the use. The r4 and r8 types are explicitly defined in DART/common/types_mod.f90 to accurately represent what we have come to expect from 32bit and 64bit floating point real variables, respectively. If you like, you can redefine r8 to be the same as r4 to shrink your memory requirement. The people who run with WRF frequently do this. Do not redefine the digits12 parameter, that one must provide 64bit precision, and is used in precious few places.

    Adding a model to DART - Specifics

    If your model is a separate executable, there is some flexibility to provide the required interfaces and it would be wise to look at the heavily commented template script DART/models/template/shell_scripts/advance_model.csh and then a few higher-order models to see how they do it. Become familiar with DART/doc/html/mpi_intro.html (DART's use of MPI), DART/doc/html/filter_async_modes.html, and the filter namelist parameter async - in filter.html.

    1. Copying the template directory

    A little explanation/motivation is warranted. If the model uses the standard layout, it is much easier to include the model in the nightly builds and testing. For this reason alone, please try to use the recommended directory layout. Simply looking at the DART/models directory should give you a pretty good idea of how things should be laid out. Copy the template directory and its contents. It is best to remove the (hidden) subversion files to keep the directory 'clean'. The point of copying this directory is to get a model_mod.f90 that works as-is and you can modify/debug the routines one at a time.

    ~/DART/models % cp -r template mymodel
    ~/DART/models % find mymodel -name .svn -print
    mymodel/.svn
    mymodel/shell_scripts/.svn
    mymodel/work/.svn
    ~/DART/models % rm -rf `find mymodel -name .svn -print`
    ~/DART/models % find mymodel -name .svn -print
    ~/DART/models %

    The destination directory (your model directory) should be in the DART/models directory to keep life simple. Moving it elsewhere will cause problems for the work/mkmf_xxxxx configuration files. Each model directory should have work and shell_scripts directories, and may have a matlab directory, a src directory, or anything else you may find convenient.

    Now, you must change all the work/path_names_xxx file contents to reflect the location of your model_mod.f90.

    2. model_mod.f90

    We have templates, examples, and a document describing the required interfaces in the DART code tree - DART/models/model_mod.html. Every(?) user-visible DART program/module has a matching piece of documentation that is distributed along with the code. The DART code tree always has the most current documentation.

    Check out time_manager_mod.f90 and utilities_mod.f90 for general-purpose routines ...

    Use Fortran namelists for I/O if possible.

    Modify the model_mod.f90 file to return specifics about your model. This module MUST contain all the required interfaces (no surprise) but it can also contain many more interfaces as is convenient. This module should be written with the understanding that print statements and error terminations will be executed by multiple processors/tasks. To restrict print statements to be written once (by the master task), it is necessary to preface the print as in this example:
    if (do_output()) write(*,*)'model_mod:namelist cal_NML',startDate_1,startDate_2

    Required Interfaces in model_mod.f90

    No matter the complexity of the model, the DART software requires a few interface routines in a model-specific Fortran90 module model_mod.f90 file. The models/template/model_mod.f90 file has extended comment blocks at the head of each of these routines that go into much more detail about what is to be provided. You cannot change the types or number of required arguments to any of the required interface routines. You can add optional arguments, but you cannot go back through the DART tree to change the gazillion calls to the mandatory routines. It is absolutely appropriate to look at existing models to get ideas about how to implement the interfaces. Finding a model implementation that is functionally close to yours always helps.

    As of December 2008, the table of the mandatory interfaces and programming degree-of-difficulty is:

    subroutine callable | separate executable | routine | description
    easy | easy | get_model_size | This function returns the size of all the model variables (prognostic or diagnosed or ...) that are packed into the 1D DART state vector. That is, it returns the length of the DART state vector as a single scalar integer.
    depends | trivial | adv_1step | For subroutine-callable models, this routine is the one to actually advance the model 1 timestep (see models/bgrid_solo/model_mod.f90 for an example). For non-subroutine-callable models, this is a NULL interface. Easy.
    depends | depends | get_state_meta_data | This routine takes as input an integer into the DART state vector and returns the associated location and (optionally) variable type from obs_kind/obs_kind_mod.f90. (See models/*/model_mod.f90 for examples.) This generally requires knowledge of how the model state vector is packed into the DART array, so it can be as complicated as the packing.
    depends | hard | model_interpolate | This is one of the more difficult routines. Given a DART state vector, a location, and a desired generic 'kind' (like KIND_SURFACE_PRESSURE, KIND_TEMPERATURE, KIND_SPECIFIC_HUMIDITY, KIND_PRESSURE, ... ); return the desired scalar quantity and set the return status accordingly. This is what enables the model to use observation-specific 'forward operators' that are part of the common DART code.
    easy | easy | get_model_time_step | This routine returns the smallest increment in time (in seconds) that the model is capable of advancing the state in a given implementation. For example, the dynamical timestep of a model is 20 minutes, but there are reasons you don't want to (or cannot) restart at this interval and would like to restart AT MOST every 6 hours. For this case, get_model_time_step should return 21600, ie 6*60*60. This is also interpreted as the nominal assimilation period. This interface is required for all applications.
    easy | easy | end_model | Performs any shutdown and cleanup needed. Good form would dictate that you should deallocate any storage allocated when you instantiated the model (from static_init_model, for example).
    depends | depends | static_init_model | Called to do one-time initialization of the model. This generally includes setting the grid information, calendar, etc.
    trivial | trivial | init_time | Returns a time that is somehow appropriate for starting up a long integration of the model IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs. If this option is not to be used in perfect_model_obs, this can be a NULL interface.
    easy | easy | init_conditions | Companion interface to init_time. Returns a model state vector that is somehow appropriate for starting up a long integration of the model. Only needed IFF the namelist parameter start_from_restart = .false. for the program perfect_model_obs.
    trivial-difficult | trivial-difficult | nc_write_model_atts | This routine is used to write the model-specific attributes to the netCDF files containing the prior and posterior states of the assimilation. The subroutine in the models/template/model_mod.f90 WILL WORK for new models but does not know anything about prognostic variables or geometry or ... Still, it is enough to get started without doing anything. More meaningful coordinate variables etc. are needed to supplant the default template. This can be as complicated as you like - see existing models for examples.
    trivial-difficult | trivial-difficult | nc_write_model_vars | This routine is responsible for writing the DART state vector -or- the prognostic model variables to the output netCDF files. If the namelist parameter output_state_vector == .false. this routine is responsible for partitioning the DART state vector into the appropriate netCDF pieces (i.e. the prognostic model variables). The default routine will simply blast out the entire DART state vector into a netCDF variable called 'state'.
    depends | trivial-difficult | pert_model_state | This routine is used to generate initial ensembles. This may be a NULL interface if you can tolerate the default perturbation strategy of adding noise to every state element or if you generate your own ensembles outside the DART framework. There are other ways of generating ensembles ... climatological distributions, bred singular vectors, voodoo ...
    trivial | trivial | get_close_maxdist_init | This routine performs the initialization for the table-lookup routines that accelerate the distance calculations. This routine is closely tied to the type of location module used by the model and is frequently (universally?) simply a 'pass-through' routine to a routine of the same name in the location module. There is generally no coding that needs to be done, but the interface must exist in model_mod.f90
    trivial | trivial | get_close_obs_init | This routine performs the initialization for the get_close accelerator that depends on the particular observation. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module.
    trivial | trivial | get_close_obs | This is the routine that takes a single location and a list of other locations, returns the indices of all the locations close to the single one along with the number of these and the distances for the close ones. Again, this is generally a 'pass-through' routine to a routine of the same name in the location module.
    easy | easy | ens_mean_for_model | This routine simply stores a copy of the ensemble mean of the state vector within the model_mod. The ensemble mean may be needed for some calculations (like converting model sigma levels to the units of the observation - pressure levels, for example).

    3. providing matlab support

    Since this is an optional step, it will be covered in a separate document.

    If your model is subroutine-callable - you're done!




    The Big Picture for models advanced as separate executables.

    The normal sequence of events is that DART reads in its own restart file (do not worry about where this comes from right now) and eventually determines it needs to advance the model. DART needs to be able to take its internal representation of each model state vector, the valid time of that state, and the amount of time to advance the state - and communicate that to the model. When the model has advanced the state to the requested time, the output must be ingested by DART and the cycle begins again. DART is entirely responsible for reading the observations and there are several programs for creating and manipulating the observation sequence files.

    There are a couple of ways to exploit parallel architectures with DART, and these have an immediate bearing on the design of the script(s) that control how the model instances (each model copy) are advanced. Perhaps the conceptually simplest method is when each model instance is advanced by a single processor element. DART calls this async = 2. It is generally efficient to relate the ensemble size to the number of processors being used.

    The alternative is to advance every model instance one after another using all available processors for each instance of the model. DART calls this async = 4, and requires an additional script. For portability reasons, DART uses the same processor set for both the assimilation and the model advances. For example, if you advance the model with 96 processors, all 96 processors will be employed to assimilate.




    4. advance_model.csh

    This script is invoked in one of two ways: 1) if filter uses a system() call, or 2) if run_filter.csh makes the call. Either way there are three arguments.

    1. the process number of the caller. This could be the master task ID (zero) or (especially if async = 2) a process id that gets related to the copy. When multiple copies are being advanced simultaneously, each of the advances happens in its own run-time directory.
    2. the number of state copies belonging to that process
    3. the name of the (ASCII) filter_control_file for that process. The filter_control_file contains the following information (one item per line): the ensemble member, the name of the input file (containing the DART state vector), and the name of the output file from the model containing the new DART state vector. For example,
      1
      assim_model_state_ic.0001
      assim_model_state_ud.0001
      2
      assim_model_state_ic.0002
      assim_model_state_ud.0002
      ...
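
    The three-lines-per-member layout shown above is easy to read back in a script. The following is a minimal sketch, assuming the file contains nothing but those repeating groups of three lines; the names Advance and parse_control_lines are made up for this example and are not part of DART.

```python
from typing import List, NamedTuple


class Advance(NamedTuple):
    """One ensemble member's entry (hypothetical container for illustration)."""
    member: int
    input_file: str
    output_file: str


def parse_control_lines(lines: List[str]) -> List[Advance]:
    """Group non-empty lines into (member, input file, output file) triples."""
    entries = [ln.strip() for ln in lines if ln.strip()]
    if len(entries) % 3 != 0:
        raise ValueError("expected three lines per ensemble member")
    return [Advance(int(entries[i]), entries[i + 1], entries[i + 2])
            for i in range(0, len(entries), 3)]


example = """1
assim_model_state_ic.0001
assim_model_state_ud.0001
2
assim_model_state_ic.0002
assim_model_state_ud.0002
"""

advances = parse_control_lines(example.splitlines())
for a in advances:
    print(a.member, a.input_file, "->", a.output_file)
```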

    async = 2 ... advancing many copies at the same time

    Modify shell_scripts/advance_model.csh to:

    1. Collect all the input files needed to advance the model into a clean, temporary directory.
    2. Determine how many tasks you have, and how many ensemble members you have. Determine how many 'batches' of ensemble members must be done to advance all of them. With 20 tasks and 80 ensemble members, you will need to loop 4 times, for example.
    3. Loop over the following three steps; each pass through the loop advances one ensemble member:
       a. convert the DART state vector file into input for your model,
       b. run your model, and
       c. convert your model output to the file with the expected format for another assimilation by DART.

    During this initial phase, it may be useful to _leave_ the temporary directory in place rather than removing it.
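
    The convert, run, convert loop described above can be outlined as follows. This is only a structural sketch in Python rather than csh: the three callables stand in for whatever converters and model launcher you write, the keep_tmp flag is a made-up name for the option of leaving the temporary directory behind, and the step of collecting the model's input files is left out.

```python
import shutil
import tempfile
from pathlib import Path


def advance_members(advances, to_model_input, run_model, to_dart_state,
                    keep_tmp=False):
    """Advance each member in turn inside a fresh temporary directory.

    advances: iterable of (member, dart_input_name, dart_output_name).
    The three callables are placeholders supplied by the caller.
    """
    workdir = Path(tempfile.mkdtemp(prefix="advance_"))
    done = []
    try:
        for member, dart_in, dart_out in advances:
            to_model_input(workdir, dart_in)   # DART state -> model input
            run_model(workdir)                 # integrate the model
            to_dart_state(workdir, dart_out)   # model output -> DART state
            done.append(member)
    finally:
        if not keep_tmp:
            # Set keep_tmp=True while debugging to inspect the directory.
            shutil.rmtree(workdir, ignore_errors=True)
    return workdir, done


# Stand-in steps that only record what would have been run.
log = []
workdir, done = advance_members(
    [(1, "assim_model_state_ic.0001", "assim_model_state_ud.0001")],
    to_model_input=lambda d, name: log.append(("convert_in", name)),
    run_model=lambda d: log.append(("run",)),
    to_dart_state=lambda d, name: log.append(("convert_out", name)),
)
print(done, [step[0] for step in log])
```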

    async = 4 ... advancing each copy one at a time

    In addition to modifying shell_scripts/advance_model.csh as described above, you must also modify shell_scripts/run_filter.csh in the following way: THIS PART NEEDS TO BE FILLED IN

    5. Converting DART output to input for your model.

    Create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. Frequently this is also done by trans_sv_pv.f90. The DART array has a header that contains the 'advance-to' time as well as the 'valid' time of the DART array. The times are encoded in DART's time representation. Interpretation of these times by your model will be necessary and you will need to feed these times to your model in an automated fashion.

    You will need to create and modify a mkmf_trans_sv_pv and a path_names_trans_sv_pv ...

    6. Modify the start/stop control of your model run.

    Create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. Frequently this is also done by trans_sv_pv.f90. The DART array has a header that contains the 'advance-to' time as well as the 'valid' time of the DART array. The times are encoded in DART's time representation. Interpretation of these times by your model will be necessary and you will need to feed these times to your model in an automated fashion.
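
    As an illustration of turning the two header times into something a model can use, here is a small sketch. The text above does not spell out DART's time representation, so the details here are assumptions: each time is treated as a (days, seconds) pair counted from a reference date, and the reference date, function names, and sample values are chosen purely for the example. Check them against your DART installation before relying on them.

```python
from datetime import datetime, timedelta

# Assumed reference date for the day count; verify for your calendar setup.
BASE_DATE = datetime(1601, 1, 1)


def to_datetime(days, seconds):
    """Convert an assumed (days, seconds) pair to a calendar date."""
    return BASE_DATE + timedelta(days=days, seconds=seconds)


def advance_seconds(valid, advance_to):
    """Length of the requested model advance, in seconds."""
    start = timedelta(days=valid[0], seconds=valid[1])
    stop = timedelta(days=advance_to[0], seconds=advance_to[1])
    if stop < start:
        raise ValueError("advance-to time precedes the valid time")
    return int((stop - start).total_seconds())


# Made-up sample header values: advance the state by six hours.
valid_time = (148000, 0)
advance_to_time = (148000, 21600)

run_length = advance_seconds(valid_time, advance_to_time)
print("start:", to_datetime(*valid_time).isoformat())
print("run length (s):", run_length)
```

    The run length (or the equivalent start and stop dates) is what you would then write into your model's namelist or control file in whatever form the model expects.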

    7. Convert your model output to DART input.

    Create a routine or set of routines to modify the input to your model communicating the run-time settings necessary to integrate your model from one time to another arbitrary time in the future. Frequently this is also done by trans_sv_pv.f90. The DART array has a header that contains the 'advance-to' time as well as the 'valid' time of the DART array. The times are encoded in DART's time representation. Interpretation of these times by your model will be necessary and you will need to feed these times to your model in an automated fashion.


    Adding observations

    Under Construction.
    ...


    Configuring Matlab to work for DART

    Section Under Construction.


    Matlab R2008b is the first version of Matlab to have native netCDF support. Since the historical third-party support for netCDF consisted of a layer of 'high-level' functions wrapped around a layer of 'low-level' functions, it is possible to change the 'low-level' functions without changing the user-visible 'high-level' functions. Indeed, the mexcdf group at SourceForge has done exactly this. You simply need to know which version of Matlab you are using to download and install the proper set of high-level functions. Just to take the mystery out of it, the mexnc download is what I'm calling the 'low-level' functions, and while you can call them directly, the snctools and netcdf_toolbox downloads provide functions that look and feel like those available from the C and Fortran library interfaces familiar to netCDF programmers.

    Find your version of Matlab (type 'ver' at the Matlab prompt) and visit http://mexcdf.sourceforge.net/downloads to get the right combination of mexnc, snctools, and the NetCDF Toolbox. You will also need the CSIRO set of functions containing 'getnc'.

    The Matlab netcdf_toolbox subset of functions has been deprecated by its developers, who are now supporting the snctools set of functions. The Matlab function read_obs_netcdf uses the supported snctools toolbox. A slow migration away from the existing DART use of the netcdf_toolbox and the CSIRO toolbox matlab_netCDF_OPeNDAP (i.e. the 'getnc' function) is underway. In the end, this will greatly ease the installation of the Matlab netCDF widgets and allow the same code to operate using either Matlab's native low-level interface or the historical third-party interface (mexnc, mexcdf, mexcdf53).

    You will need the 'normal' DART/matlab functions available to Matlab, so be sure your MATLABPATH is set such that you have access to get_copy_index as well as nc_varget and getnc and ...
    This generally means your MATLABPATH should look something like:

    addpath('replace_this_with_the_real_path_to/DART/matlab')
    addpath('replace_this_with_the_real_path_to/DART/diagnostics/matlab')
    addpath('some_other_netcdf_install_dir/mexnc','-BEGIN')
    addpath('some_other_netcdf_install_dir/snctools')
    addpath('some_other_netcdf_install_dir/netcdf_toolbox/netcdf')
    addpath('some_other_netcdf_install_dir/netcdf_toolbox/netcdf/nctype')
    addpath('some_other_netcdf_install_dir/netcdf_toolbox/netcdf/ncutility')
    addpath('some_other_CSIRO_install_dir/matlab_netCDF_OPeNDAP')
    
    which is precisely why I'm trying to shorten it. On my systems, I've bundled the last 6 commands into a function called ncstartup.m, which runs every time I start Matlab (because I execute it in my ~/matlab/startup.m).




    Adding Matlab support for your own model

    Under Construction.
    ...


    Generating those first initial conditions and observations.

    Under Construction.
    You need this to get started ... (generate perfect_model ics), maybe from Matlab ...


    Adding your code/observations/wisdom to DART.

    Under Construction.
    models, observations, routines, widgets ...


    non-Matlab diagnostics

    Under Construction.
    models, observations, routines, widgets ...


    Optional Software

    Under Construction.

    1. ncview -- a visual browser for netCDF files, great for looking at innovations, etc.
    2. NCO
    3. ncl cam spaghetti plots
    4. some MPI installation
    5. maybe even a queueing system


    DART examples

    1. models lorenz_63 examples
    2. tutorial stuff, perhaps
    3. diagnostic plot gallery
    4. L63 diverging
    5. L63 performing well
    vertical profile of rmse and bias for tropics
    CAM ... sep 2006, base case ...



    Suggestions for the DART facility ...


    There are a large number of software enhancements, simplifications, and supporting widgets that need to be made -- the length of the 'to_do' list is a constant source of simultaneous amusement and dismay for Tim and Nancy. If you would like to share an idea on how to improve DART, we're all ears. Long requests should be sent to the dart@ucar.edu email address. Short ones can be entered here:



    If you provide an email address, we may contact you to either ask for more information or let you know that "it's done".