
    Requirements to install and run DART


    DART is intended to be highly portable among Unix/Linux operating systems. DART has also been run successfully on Windows machines under the cygwin environment. Those instructions are still under development; if you would like to be a friendly beta-tester, please send me (Tim Hoar) an email and I'll send you the instructions, as long as you promise to provide feedback (good or bad!) so I can improve them. My email is thoar @ ucar . edu (minus the spaces, naturally).

    Minimally, you will need:

    1. a Fortran90 compiler,
    2. the netCDF libraries built with the F90 interface,
    3. perl (just about any version),
    4. an environment that understands csh or tcsh, and
    5. the old unix standby ... make
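
    As a quick sanity check of the list above, here is a minimal sketch (written for plain sh, not csh) that reports which prerequisites are missing from your PATH. The function names and the list of compiler commands (ifort, gfortran, pgf90, xlf90) are illustrative assumptions, so edit them to match your system. It does not check the netCDF libraries.

```shell
#!/bin/sh
# Sketch only: function names and the compiler command list are assumptions.

have() {
    # succeed if the named command can be found on PATH
    command -v "$1" >/dev/null 2>&1
}

first_available() {
    # print the first of the given commands that exists; fail if none do
    for cmd in "$@"; do
        if have "$cmd"; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

check_dart_prereqs() {
    # print one line per missing prerequisite; return nonzero if any is missing
    missing=0
    first_available ifort gfortran pgf90 xlf90 >/dev/null ||
        { echo "missing: Fortran90 compiler"; missing=1; }
    first_available csh tcsh >/dev/null ||
        { echo "missing: csh or tcsh"; missing=1; }
    for cmd in perl make; do
        have "$cmd" || { echo "missing: $cmd"; missing=1; }
    done
    return "$missing"
}
```

    Calling check_dart_prereqs with no arguments prints nothing and returns 0 when everything was found.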

    The DART diagnostic scripts are written for Matlab®. Matlab® R2008B (finally!) has native netCDF support. If you want to use the DART diagnostic scripts, you will need Matlab® along with the mexnc and snctools toolboxes (appropriate for your version of Matlab®).



    Requirements: a Fortran90 compiler


    The DART software is predominantly built on several Linux/x86 platforms with several versions of the Intel Fortran Compiler for Linux, which (at one point) was free for individual scientific use. It has also been built and successfully run with several versions of each of the following: IBM XL Fortran Compiler, Portland Group Fortran Compiler, Lahey Fortran Compiler, Pathscale Fortran Compiler, GNU Fortran 95 Compiler ("gfortran"), Absoft Fortran 90/95 Compiler (Mac OSX). Because the code must be recompiled to experiment with different models, there are no binaries to distribute.


    Requirements: the netCDF library


    DART uses the netCDF self-describing data format for the results of assimilation experiments. These files have the extension .nc and can be read by a number of standard data analysis tools. DART also uses the F90 interface to the library, which is available through the netcdf.mod and typesizes.mod modules. IMPORTANT: different compilers create these modules with different "case" filenames, and sometimes they are not both installed into the expected directory. Both modules must be present. They are normally found in the netcdf/include directory, as opposed to the netcdf/lib directory.

    If the netCDF library does not exist on your system, you must build it (as well as the F90 interface modules). The library and instructions for building the library or installing from an RPM may be found at the netCDF home page: http://www.unidata.ucar.edu/packages/netcdf/

    NOTE: The location of the netCDF library, libnetcdf.a, and the locations of both netcdf.mod and typesizes.mod will be needed later.
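
    Because the module filenames can differ in case between compilers, a case-insensitive search is a convenient way to locate them. The sketch below is a hypothetical sh helper (the name find_netcdf_modules is not part of DART) and assumes your find supports the -iname option, which GNU and BSD find do.

```shell
# find_netcdf_modules PREFIX
# Look under PREFIX for netcdf.mod and typesizes.mod, ignoring filename case.
# Hypothetical helper, not part of DART.
find_netcdf_modules() {
    prefix=$1
    status=0
    for mod in netcdf.mod typesizes.mod; do
        # -iname matches NETCDF.mod, netcdf.mod, Netcdf.MOD, ...
        hit=$(find "$prefix" -type f -iname "$mod" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            echo "found $mod: $hit"
        else
            echo "missing $mod under $prefix"
            status=1
        fi
    done
    # nonzero if either module was not found
    return "$status"
}
```

    For example, find_netcdf_modules /usr/local would search a netCDF installation under /usr/local; substitute your own installation prefix.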





    Installing DART : Download the distribution.


    The DART source code is distributed through a Subversion server. Subversion (the client-side app is 'svn') allows you to compare your code tree with one on a remote server and selectively update individual files or groups of files - without losing any local modifications. I have a brief summary of the svn commands I use most posted at: http://www.image.ucar.edu/~thoar/svn_primer.html

    The DART download site is: http://www.image.ucar.edu/DAReS/DART/DART_download.

    svn has adopted the strategy that "disk is cheap". In addition to downloading the code, it downloads an additional copy of the code to store locally (in hidden .svn directories) as well as some administration files. This allows svn to perform some commands even when the repository is not available. It does double the size of the code tree ... so the download is something like 480MB -- pretty big. BUT - all future updates are (usually) just the differences, so they happen very quickly.

    If you follow the instructions on the download site, you should wind up with a directory named my_path_to/DART, which we call $DARTHOME. Compiling the code in this tree (as is usually the case) will necessitate much more space.

    If you cannot use svn, just let me know and I will create a tar file for you. svn is so superior that a tar file should be considered a last resort.


    Installing DART : document conventions


    All filenames look like this -- (typewriter font, green).
    Program names look like this -- (italicized font, green).
    user input looks like this -- (bold, magenta).

    commands to be typed at the command line are contained in an indented gray box.

    And the contents of a file are enclosed in a box with a border:
    &hypothetical_nml
      obs_seq_in_file_name = "obs_seq.in",
      obs_seq_out_file_name = "obs_seq.out",
      init_time_days = 0,
      init_time_seconds = 0,
      output_interval = 1
    &end


    Installing DART


    The entire installation process is summarized in the following steps:

    1. Determine which F90 compiler is available.
    2. Determine the location of (or build) the netCDF library.
    3. Download the DART software into the expected source tree.
    4. Modify certain DART files to reflect the available F90 compiler and location of the appropriate libraries.
    5. Build the executables.

    If you can compile and run ONE of the low-order models, you should be able to compile and run ANY of the low-order models. For this reason, we can focus on the Lorenz '63 model. Consequently, the only directories with files to be modified to check the installation are usually: DART/mkmf and DART/models/lorenz_63/work.

    We have tried to make the code as portable as possible, but we do not have access to all compilers on all platforms, so there are no guarantees. We are interested in your experience building the system, so please send us a note at dart @ ucar .edu


    Customizing the build scripts -- Overview.


    DART executable programs are constructed using two tools: mkmf, and make.

    mkmf requires two separate input files. The first is a 'template' file, which specifies details of the commands required for a specific Fortran90 compiler and may also contain pointers to directories containing pre-compiled utilities required by the DART system. This template file will need to be modified to reflect your system. The second input file is a 'path_names' file, which is supplied by DART and can be used without modification. The mkmf command uses the 'path_names' file and the mkmf template file to produce a Makefile, which is subsequently used by the standard make utility.


    Building and Customizing the 'mkmf.template' file


    A series of templates for different compilers/architectures exists in the DART/mkmf/ directory. Their names have extensions that identify the compiler, the architecture, or both. This is how you inform the build process of the specifics of your system. Our intent is that you copy one that is similar to your system into DART/mkmf/mkmf.template and customize it.
    For the discussion that follows, knowledge of the contents of one of these templates (e.g. DART/mkmf/mkmf.template.intel.linux) is needed. Note that only the LAST lines are shown here; the head of the file is just a big comment (worth reading, btw).


    ...
    MPIFC = mpif90
    MPILD = mpif90
    FC = ifort
    LD = ifort
    NETCDF = /usr/local
    INCS = -I${NETCDF}/include
    LIBS = -L${NETCDF}/lib -lnetcdf
    FFLAGS = -O2 $(INCS)
    LDFLAGS = $(FFLAGS) $(LIBS)

    variable    value
    FC          the Fortran compiler
    LD          the name of the loader; typically, the same as the Fortran compiler
    NETCDF      the location of your netCDF installation containing netcdf.mod and typesizes.mod. Note that the value of the NETCDF variable will be used by the FFLAGS, LIBS, and LDFLAGS variables.
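
    If you prefer to script the customization, the following sh sketch rewrites the NETCDF line of a template in place. The helper name set_netcdf_prefix and the path /opt/netcdf are hypothetical, and the sed expression assumes the prefix contains no '|' or '&' characters.

```shell
# set_netcdf_prefix TEMPLATE PREFIX
# Rewrite the "NETCDF = ..." line of an mkmf template to point at PREFIX.
# Hypothetical helper, not part of DART.
set_netcdf_prefix() {
    template=$1
    prefix=$2
    tmp="$template.tmp.$$"
    # '|' is the sed delimiter so that '/' in the path needs no escaping;
    # only a line that starts with NETCDF is changed
    sed "s|^NETCDF[[:space:]]*=.*|NETCDF = $prefix|" "$template" > "$tmp" &&
        mv "$tmp" "$template"
}

# Example use (the /opt/netcdf path is an assumption):
#   cd DART/mkmf
#   cp mkmf.template.intel.linux mkmf.template
#   set_netcdf_prefix mkmf.template /opt/netcdf
```

    The INCS and LIBS lines are left alone because they refer to ${NETCDF} rather than to a literal path.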

    Building the Lorenz_63 DART project.


    Currently, DART executables are built in a work subdirectory under the directory containing code for the given model. There are eight mkmf_xxxxxx files for the following programs:

    program                     purpose
    preprocess                  creates custom source code for just the observations of interest
    create_obs_sequence         specifies a (set of) observation characteristics taken by a particular (set of) instruments
    create_fixed_network_seq    specifies the temporal attributes of the observation sets
    perfect_model_obs           spinup, generate "true state" for synthetic observation experiments, ...
    filter                      performs experiments
    obs_diag                    creates observation-space diagnostic files to be explored by the Matlab® scripts
    obs_sequence_tool           manipulates observation sequence files. It is not generally needed (particularly for low-order models) but can be used to combine observation sequences or convert from ASCII to binary or vice-versa. Since this is a specialty routine, we will not cover its use in this document.
    wakeup_filter               is only needed for MPI applications. We're starting at the beginning here, so we're going to ignore this one, too.

    quickbuild.csh is a script that builds every executable in the directory. An optional argument additionally builds the MPI-enabled versions, which is not the intent of this set of instructions.



    cd DART/models/lorenz_63/work
    ./quickbuild.csh -nompi

    The result (hopefully) is that eight executables now reside in your work directory. The most common problem is that the netCDF libraries and include files (particularly typesizes.mod) are not found. Find them, edit the DART/mkmf/mkmf.template to point to their location, recreate the Makefile, and try again. The next most common problem is from the gfortran compiler complaining about "undefined reference to `system_'" which is covered in the Platform-specific notes section.
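
    One way to confirm the build is to check for the eight programs named in the table above. This is a minimal sh sketch; the helper name list_missing_executables is hypothetical and not part of DART.

```shell
# list_missing_executables WORKDIR
# Print the name of each expected program that is absent or not executable.
# Hypothetical helper, not part of DART.
list_missing_executables() {
    workdir=$1
    status=0
    for prog in preprocess create_obs_sequence create_fixed_network_seq \
                perfect_model_obs filter obs_diag obs_sequence_tool \
                wakeup_filter; do
        if [ ! -x "$workdir/$prog" ]; then
            echo "$prog"
            status=1
        fi
    done
    # nonzero if anything was missing
    return "$status"
}
```

    Run from the work directory as list_missing_executables . and expect no output if the build succeeded.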


    Checking the build -- running something.


    This section is not intended to provide any details of why we are doing what we are doing - this is sort of a 'black-box' test. The DART/models/lorenz_63/work directory is distributed with input files ready to run a simple experiment: use 20 ensemble members to assimilate observations 'every 6 hours' for 50 days. Simply run the programs perfect_model_obs and filter to generate the results to compare against known results.

    The initial conditions files and observations sequences are in ASCII, so there is no portability issue, but there may be some roundoff error in the conversion from ASCII to machine binary. With such a highly nonlinear model, small differences in the initial conditions will result in a different model trajectory. Your results should start out looking VERY SIMILAR and may diverge with time.


    ./perfect_model_obs
    ./filter

    There should now be the following output files:

    from executable "perfect_model_obs"
    True_State.nc        a netCDF file containing the model trajectory ... the 'truth'
    obs_seq.out          The observations (harvested as the true model was advanced) that were assimilated.
    perfect_restart      the final state of the model - in ASCII. The (true) model state at the 'end' of the experiment.

    from executable "filter"
    Prior_Diag.nc        A netCDF file of the ensemble model states just before assimilation.
    Posterior_Diag.nc    A netCDF file of the ensemble model states just after assimilation.
    obs_seq.final        The observations that were assimilated as well as the ensemble mean estimates of the 'observations' - for comparison.
    filter_restart       The model states of the ensemble members at the 'end' of the experiment.

    from both
    dart_log.out         The run-time log of the experiment. (this grows with each execution)
    dart_log.nml         A record of the input settings of the experiment.
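
    A simple first check is that every file listed above exists and is not empty. The sh sketch below does only that and says nothing about whether the contents are correct; the helper name check_run_outputs is hypothetical.

```shell
# check_run_outputs WORKDIR
# Report any expected output file that is missing or empty.
# Hypothetical helper, not part of DART.
check_run_outputs() {
    workdir=$1
    status=0
    for f in True_State.nc obs_seq.out perfect_restart \
             Prior_Diag.nc Posterior_Diag.nc obs_seq.final filter_restart \
             dart_log.out dart_log.nml; do
        # -s is true only for a file that exists and has nonzero size
        if [ ! -s "$workdir/$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return "$status"
}
```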

    Note that Prior_Diag.nc and Posterior_Diag.nc contain values of the ensemble mean, the ensemble spread, the individual ensemble members, and the inflation mean and standard deviation. The simplest way to check the results is with the Matlab® scripts distributed with DART.

    The DART/tutorial documents are an excellent way to kick the tires on DART and learn about ensemble data assimilation. If you've been able to build the Lorenz 63 model, you have correctly configured your mkmf.template and you can run anything in the tutorial.





    Configuring Matlab® to read netCDF files.


    Find your version of Matlab® (type 'ver' at the Matlab® prompt) and visit http://mexcdf.sourceforge.net/downloads to get the right combination of mexnc and snctools for your version of Matlab®. Follow their installation instructions. You can test if the install went well by trying to read a variable from any netCDF file (it doesn't have to be one created by DART -- see 'help nc_varget', for example).

    Be sure your MATLABPATH is set such that you have access to nc_varget. This generally means you will have to do something like the following at the Matlab® prompt :


    addpath('wherever_you_installed_mexcdf/snctools')
    addpath('wherever_you_installed_mexcdf/mexnc','-BEGIN')
    

    It's very convenient to put these in your ~/matlab/startup.m so they get run every time Matlab® starts up.
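
    If you would rather add those lines from the shell, the sh sketch below appends a line to a startup file only when it is not already present, so it is safe to run more than once. The helper name add_to_startup is hypothetical, and the wherever_you_installed_mexcdf placeholder is carried over unchanged from the commands above.

```shell
# add_to_startup STARTUP_FILE LINE
# Append LINE to STARTUP_FILE unless an identical line is already there.
# Hypothetical helper, not part of DART or Matlab.
add_to_startup() {
    startup=$1
    line=$2
    mkdir -p "$(dirname "$startup")"
    touch "$startup"
    # -F: fixed string, -x: match the whole line, -q: quiet
    grep -Fxq -e "$line" "$startup" || printf '%s\n' "$line" >> "$startup"
}

# Example use (replace the placeholder path with your own):
#   add_to_startup "$HOME/matlab/startup.m" "addpath('wherever_you_installed_mexcdf/snctools')"
#   add_to_startup "$HOME/matlab/startup.m" "addpath('wherever_you_installed_mexcdf/mexnc','-BEGIN')"
```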





    Are the results correct? (requires Matlab® with netCDF support)



    The simplest way to determine if the installation is successful is to run some of the functions we have available in DART/matlab/. Usually, we launch Matlab from the DART/models/lorenz_63/work directory and use the Matlab addpath command to make the DART/matlab/ functions available. In this case, we know the true state of the model that is consistent with the observations. The following Matlab scripts compare the ensemble members with the truth and can calculate an error.


    cd DART/models/lorenz_63/work
    matlab
    ... (lots of startup messages I'm skipping)...
    >> addpath ../../../matlab
    >> plot_total_err
    
    Input name of True State file; <cr> for True_State.nc
    True_State.nc
    Input name of prior or posterior diagnostics file;
    <cr> for Prior_Diag.nc
    Prior_Diag.nc
    Comparing True_State.nc and
              Prior_Diag.nc
    
    pinfo = 
    
                     model: 'Lorenz_63'
                   def_var: 'state'
            num_state_vars: 3
           num_ens_members: 22
        time_series_length: 200
             min_state_var: 1
             max_state_var: 3
               min_ens_mem: 1
               max_ens_mem: 22
            def_state_vars: [1 2 3]
                truth_file: 'True_State.nc'
                diagn_file: 'Prior_Diag.nc'
                truth_time: [1 200]
                diagn_time: [1 200]
    
    true state is copy   1
    ensemble mean is copy   1
    ensemble spread is copy   2
    >> plot_ens_time_series
    


    From the plot_ens_time_series graphic, you can see the individual green ensemble members getting more constrained as time evolves. If your figures look similar to these, that's pretty much what you're looking for and you should feel pretty confident that everything is working.





    Running the examples used in the DART workshops.




    Use DART to run a 'perfect model' experiment.


    Once a model is compatible with the DART facility, all of the functionality of DART is available. This includes 'perfect model' experiments (also called Observing System Simulation Experiments - OSSEs). Essentially, the model is run forward from some state and, at predefined times, the observation forward operator is applied to the model state to harvest synthetic observations. This model trajectory is known as the 'true state'. The synthetic observations are then used in an assimilation experiment. The assimilation performance can then be evaluated precisely because the true state (of the model) is known.

    The basic steps to running an OSSE from within DART are:

    1. Run create_obs_sequence to generate the type of observation (and observation error) desired.
    2. Run create_fixed_network_seq to define the temporal distribution of the desired observations.
    3. Run perfect_model_obs to advance the model from a known initial condition - and harvest the 'observations' (with error) from the (known) true state of the model.
    4. Run filter to assimilate the 'observations'. Since the true model state is known, it is possible to evaluate the performance of the assimilation.
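
    The four steps above can be chained so that a failure stops the sequence. This is a minimal sh sketch; the helper name run_osse is hypothetical. Each program inherits the terminal, so any input a program asks for must still be supplied by you.

```shell
# run_osse WORKDIR
# Run the four OSSE programs in order, stopping at the first failure.
# Hypothetical helper, not part of DART.
run_osse() {
    workdir=$1
    for step in create_obs_sequence create_fixed_network_seq \
                perfect_model_obs filter; do
        echo "running $step"
        # the subshell keeps the caller's working directory unchanged
        if ! (cd "$workdir" && "./$step"); then
            echo "$step failed; stopping" >&2
            return 1
        fi
    done
}
```

    For the Lorenz 63 example this would be called as run_osse DART/models/lorenz_63/work.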

    An OSSE is explored in our Lorenz '96 example.

    More information about creating observation sequence files for OSSE's is available in the observation sequence discussion section.

    There is a set of Matlab® functions in the DART/matlab directory to help explore the assimilation performance in state-space. Once you fire up Matlab® and have the netCDF support sorted out, you will essentially follow the same procedure as that outlined in the "Are the results correct?" section. The most common functions are listed below. Each has a help document, available by issuing (for example) the help plot_bins command at the Matlab® prompt.


    plot_bins.m                    plots the rank histograms for a set of state variables.
    plot_total_err.m               plots the evolution of the error (un-normalized) and ensemble spread of all state variables.
    plot_ens_mean_time_series.m    plots the evolution of a set of state variables - just the ensemble mean (and Truth, if available).



    The Data Assimilation Research Testbed - DART Tutorial


    DART comes with an extensive set of tutorial materials, working models of several different levels of complexity, and data to be assimilated. It has been used in several multi-day workshops and can be used as the basis to teach a section on Data Assimilation. Download the DART software distribution and look in the tutorial subdirectory for the pdf and framemaker source for each of the 22 tutorial sections. The most recent versions of the tutorial are always provided below.

    Browsing the tutorial is worth the effort.
    Taking the tutorial is FAR better!


    1. Section 1 [pdf] Filtering For a One Variable System.
    2. Section 2 [pdf] The DART Directory Tree.
    3. Section 3 [pdf] DART Runtime Control and Documentation.
    4. Section 4 [pdf] How should observations of a state variable impact an unobserved state variable? Multivariate assimilation.
    5. Section 5 [pdf] Comprehensive Filtering Theory: Non-Identity Observations and the Joint Phase Space.
    6. Section 6 [pdf] Other Updates for An Observed Variable.
    7. Section 7 [pdf] Some Additional Low-Order Models.
    8. Section 8 [pdf] Dealing with Sampling Error.
    9. Section 9 [pdf] More on Dealing with Error; Inflation.
    10. Section 10 [pdf] Regression and Non-linear Effects.
    11. Section 11 [pdf] Creating DART Executables.
    12. Section 12 [pdf] Adaptive Inflation.
    13. Section 13 [pdf] Hierarchical Group Filters and Localization.
    14. Section 14 [pdf] DART Observation Quality Control.
    15. Section 15 [pdf] DART Experiments: Control and Design.
    16. Section 16 [pdf] Diagnostic Output.
    17. Section 17 [pdf] Creating Observation Sequences.
    18. Section 18 [pdf] Lost in Phase Space: The Challenge of Not Knowing the Truth.
    19. Section 19 [pdf] DART-Compliant Models and Making Models Compliant.
    20. Section 20 [pdf] Model Parameter Estimation.
    21. Section 21 [pdf] Observation Types and Observing System Design.
    22. Section 22 [pdf] Parallel Algorithm Implementation.
    23. Carbon Tutorial [pdf] A Simple 1D Advection Model.




    Suggestions for the DART facility ...


    There are a large number of software enhancements, simplifications, and supporting widgets that need to be made -- the length of the 'to_do' list is a constant source of simultaneous amusement and dismay for Tim and Nancy. If you would like to share an idea on how to improve DART, we're all ears. Requests can be sent to the dart@ucar.edu email address. If you provide an email address, we may contact you to either ask for more information or let you know that "it's done".

