The entire installation process is summarized in the following steps:
If you can compile and run ONE of the low-order models,
you should be able to compile and run ANY of the low-order models.
For this reason, we can focus on the Lorenz '63 model.
Consequently, the only directories with files to be modified to check
the installation are usually:
DART/build_templates and
DART/models/lorenz_63/work.
We have tried to make the code as portable as possible, but we
do not have access to all compilers on all platforms, so there are no
guarantees. We are interested in your experience building the system,
so please send us a note at dart@ucar.edu.
DART is intended to be highly portable among Unix/Linux operating systems.
At this point we have no plans to port DART to Windows machines.
Minimally, you will need a Fortran 90 compiler, the netCDF libraries built with
the F90 interface, perl (used by mkmf), a shell that can run csh scripts,
the standard make utility, and roughly 1 GB of disk space for the distribution.
History has shown that it is a very good idea to make sure your
run-time environment does not impose restrictive limits on stack or data size.
Additionally, what has proven to be nice (but not required) is an MPI environment
for parallel assimilation, MATLAB® for the diagnostic scripts, and a graphical
netCDF browser such as ncview.
The DART software is written in standard Fortran 90, with no compiler-specific extensions. It has been compiled and run with several versions of each of the following compilers: GNU Fortran Compiler ("gfortran") (free), Intel Fortran Compiler for Linux and OS X, IBM XL Fortran Compiler, Portland Group Fortran Compiler, and Lahey Fortran Compiler. Since recompiling the code is necessary to experiment with different models, there are no binaries to distribute.
DART uses the
netCDF
self-describing data format for the results of assimilation
experiments. These files have the extension .nc
and can be read by a number of standard data analysis tools.
In particular, DART makes use of the F90 interface to the netCDF
library, which is available through the netcdf.mod
and typesizes.mod modules.
IMPORTANT: different compilers create these
modules with different "case" filenames, and sometimes they are
not both installed into the expected directory.
Both modules are required, and they normally reside in the
netcdf/include directory (not the netcdf/lib directory).
If the netCDF library does not exist on your system, you must
build it (as well as the F90 interface modules). The library and
instructions for building the library or installing from an RPM
may be found at the netCDF home page:
https://www.unidata.ucar.edu/software/netcdf/
NOTE: The location of the netCDF library, libnetcdf.a,
and the locations of both netcdf.mod and
typesizes.mod will be needed later.
Depending on the version of netCDF and the build options selected,
the fortran interface routines may be in a separate library named
libnetcdff.a (note the 2 F's). In this case both
libraries are required to build executables.
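Before editing any build templates, it can save time to confirm that the modules and libraries are where you expect them. The following is a minimal sketch: the /usr/local/netcdf path is only an example, and the nf-config query applies only if your netCDF-Fortran installation provides that utility.

```csh
# Example location only -- substitute the root of your netCDF installation.
setenv NETCDF /usr/local/netcdf

# Both Fortran module files must be present (the case of the filenames
# can vary with the compiler that built the library):
ls ${NETCDF}/include/netcdf.mod ${NETCDF}/include/typesizes.mod

# The library itself, and possibly a separate Fortran library (libnetcdff.a):
ls ${NETCDF}/lib/libnetcdf.a ${NETCDF}/lib/libnetcdff.a

# If available, nf-config reports the compile and link flags to use:
nf-config --fflags --flibs
```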
If you want to run your own model,
all you need is an executable and some scripts to interface with DART -
we have templates and examples.
If your model can be called as a subroutine, life is good, and
the hardest part is usually a routine to parse the model state vector into
one whopping array - and back. Again - we have templates, examples, and a
document describing the required interfaces. That document exists in the
DART code - DART/models/model_mod.html - as does all the most
current documentation.
Almost every DART program/module has a matching piece of documentation.
Starting with the Jamaica release, there is an option to compile with the MPI
(Message Passing Interface) libraries in order to run the assimilation
step in parallel on hardware with multiple CPUs. Note that this is
optional; MPI is not required. If you do want to run in parallel,
then we also require a working MPI library and appropriate cluster
or SMP hardware.
See the MPI intro
for more information on running with the MPI option.
One of the beauties of ensemble data assimilation is that even if
(particularly if) your model is single-threaded, you can still run
efficiently on parallel machines by dealing out each ensemble member
(a unique instance of the model) to a separate processor. If your
model cannot run single-threaded, fear not, DART can do that too,
and simply runs each ensemble member one after another using all
the processors for each instance of the model.
The DART source code is distributed through a Subversion server.
Subversion (the client-side app is 'svn') allows
you to compare your code tree with one on a remote server and
selectively update individual files or groups of files - without losing
any local modifications. I have a brief summary of the svn commands
I use most posted at:
http://www.image.ucar.edu/~thoar/svn_primer.html
The DART download site is:
http://www.image.ucar.edu/DAReS/DART/DART_download.
svn has adopted the strategy that "disk is cheap". In addition to downloading
the code, it downloads an additional copy of the code to store locally (in
hidden .svn directories) as well as some administration files. This allows
svn to perform some commands even when the repository is not available.
It does double the size of the code tree ... so the download is just over
1 GB -- pretty big. BUT - all future updates are (usually) just the
differences, so they happen very quickly. We are working to move the large
example datasets into a separate .tar file to reduce the overhead.
If you follow the instructions on the download site, you should wind up with
a directory named my_path_to/DART, which
we call $DARTHOME.
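As a rough sketch of that step, a checkout looks like the following; the repository URL here is a placeholder, so use the one given on the download page.

```csh
# Placeholder URL -- use the repository address from the DART_download page.
svn checkout <repository_URL_from_the_download_page> my_path_to/DART
cd my_path_to/DART     # this directory is what we refer to as $DARTHOME
```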
Compiling the code in this tree (as you normally will) requires
considerably more space.
If you cannot use svn, just let me know and I will create a tar file for you.
svn is so superior that a tar file should be considered a last resort.
Document conventions: filenames appear in typewriter font (green),
program names in italics (green), and user input in bold (magenta).
The contents of a file are shown enclosed in a box with a border.
DART executable programs are constructed using two tools:
mkmf, and
make.
The make utility is a very common
piece of software that requires a user-defined input file that records
dependencies between different source files. make
then performs a hierarchy of actions when one or more of the
source files is modified. mkmf is a perl
script that generates a make input file
(named Makefile) and an example namelist
input.nml.program_default
with the default values. The Makefile is designed
specifically to work with object-oriented Fortran90 (and other languages)
for systems like DART.
mkmf (think "make makefile")
requires two separate input files.
The first is a `template' file which specifies details of the commands
required for a specific Fortran90 compiler and may also contain
pointers to directories containing pre-compiled utilities required by
the DART system. This template file will need to be modified to
reflect your system. The second input file is a `path_names'
file, which is supplied by DART and can be used without modification.
The mkmf command uses the 'path_names' file and the mkmf template file
to produce a Makefile, which is subsequently used by the
standard make utility.
Shell scripts that execute the mkmf command for all standard
DART executables are provided as part of the standard DART software.
For more information on the FMS
mkmf please see the
mkmf documentation.
Be aware that we have slightly modified mkmf such that it also
creates an example namelist file for each program. The example namelist is
called
input.nml.program_default,
so as not to clash with any
existing input.nml that may already be in that directory.
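For reference, each wrapper script boils down to a single mkmf invocation followed by make. The sketch below shows the general shape of that invocation for preprocess; the relative paths are illustrative and the actual arguments are hard-coded in each mkmf_xxxxxx script.

```csh
# Illustrative only -- the real arguments live inside mkmf_preprocess.
# mkmf combines the compiler template with the path_names file and writes
# 'Makefile' plus the example namelist 'input.nml.preprocess_default'.
../../../build_templates/mkmf -p preprocess \
      -t ../../../build_templates/mkmf.template \
      path_names_preprocess
make
```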
A series of templates for different compilers/architectures exists in the DART/build_templates directory; their names carry extensions that identify the compiler, the architecture, or both. This is how you inform the build process of the specifics of your system. Our intent is that you copy one that is similar to your system into DART/build_templates/mkmf.template and customize it. For the discussion that follows, knowledge of the contents of one of these templates (e.g. DART/build_templates/mkmf.template.intel.linux) is needed. Only the LAST lines of the file need attention; the head of the file is just a big comment (worth reading).
variable | value |
---|---|
FC | the Fortran compiler |
LD | the name of the loader; typically, the same as the Fortran compiler |
NETCDF | the location of your netCDF installation containing netcdf.mod and typesizes.mod. Note that the value of the NETCDF variable will be used by the FFLAGS, LIBS, and LDFLAGS variables. |
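Because the tail of the template is not reproduced above, the fragment below is an illustrative sketch of what those last lines typically look like; the paths and flags in your own mkmf.template will differ.

```make
# Illustrative sketch only -- edit to match your compiler and netCDF install.
FC = ifort
LD = ifort
NETCDF = /usr/local/netcdf
INCS = -I$(NETCDF)/include
FFLAGS = -O2 $(INCS)
# -lnetcdff is needed only if your netCDF build puts the Fortran interface
# in a separate library (see the note earlier in this document).
LIBS = -L$(NETCDF)/lib -lnetcdff -lnetcdf
LDFLAGS = $(FFLAGS) $(LIBS)
```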
Several path_names_* files are provided in the work directory for each specific model, in this case: DART/models/lorenz_63/work. Since each model comes with its own set of files, the path_names_* files need no customization.
All DART programs are compiled the same way. Each model directory has a directory called work that has the components to build the executables. As an example, consider building two programs for the lorenz_63 model: preprocess and obs_diag. preprocess needs to be built and run to create the source codes that support observations - which are (potentially) different for every model. Once that has been done, any other DART program may be built in the same way as obs_diag; the commands are sketched after the table of programs below.
Currently, DART executables are built in a
work subdirectory under the directory containing
code for the given model.
The Lorenz_63 model has seven mkmf_xxxxxx
files (some models have many more) for the following programs:
Program | Purpose |
---|---|
preprocess | creates custom source code for just the observations of interest |
create_obs_sequence | specify a (set of) observation characteristics taken by a particular (set of) instruments |
create_fixed_network_seq | specify the temporal attributes of the observation sets |
perfect_model_obs | spinup, generate "true state" for synthetic observation experiments, ... |
filter | perform experiments |
obs_diag | creates observation-space diagnostic files to be explored by the MATLAB® scripts. |
obs_sequence_tool | manipulates observation sequence files. It is not generally needed (particularly for low-order models) but can be used to combine observation sequences or convert from ASCII to binary or vice-versa. Since this is a specialty routine - we will not cover its use in this document. |
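Here is the build sequence referred to above, as a sketch: build and run preprocess first, then build obs_diag (or any other program) the same way.

```csh
cd DART/models/lorenz_63/work

# preprocess must be built and run first; it generates the
# observation-related source files the other programs depend on.
csh mkmf_preprocess
make
./preprocess

# Any other DART program is built the same way, e.g. obs_diag:
csh mkmf_obs_diag
make
```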
quickbuild.csh is a script that will build every executable in the directory. There is an optional argument that will additionally build the MPI-enabled versions, which is not the intent of this set of instructions.
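So the whole check can be reduced to a couple of commands; this sketch simply runs the script and lists the seven executables named above.

```csh
cd DART/models/lorenz_63/work
./quickbuild.csh       # builds every executable in this directory

# The seven programs listed in the table above should now exist:
ls preprocess create_obs_sequence create_fixed_network_seq \
   perfect_model_obs filter obs_diag obs_sequence_tool
```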
The result (hopefully) is that seven executables now reside in your work directory. The most common problem is that the netCDF libraries and include files (particularly typesizes.mod) are not found. Find them, edit the DART/build_templates/mkmf.template to point to their location, recreate the Makefile, and try again. The next most common problem is from the gfortran compiler complaining about "undefined reference to `system_'" which is covered in the Platform-specific notes section.
This section is not intended to provide any details of why
we are doing what we are doing - this is sort of a 'black-box' test.
The DART/models/lorenz_63/work directory
is distributed with input files ready to run a simple experiment:
use 20 ensemble members to assimilate observations 'every 6 hours'
for 50 days.
Simply run the programs perfect_model_obs and
filter and compare the output against the known results.
The initial conditions files and observations sequences are in ASCII,
so there is no portability issue, but there may be some roundoff
error in the conversion from ASCII to machine binary. With such a
highly nonlinear model, small differences in the initial conditions
will result in a different model trajectory. Your results should start
out looking VERY SIMILAR and may diverge with time.
The Manhattan release uses netCDF files for the input file format.
Creating the netCDF files from their ASCII representation is a trivial
operation - simply running a command that comes with any netCDF
installation: ncgen. After these files are built
(this only needs to be done once), simply running
perfect_model_obs
and filter is easy:
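A sketch of that sequence follows. The .cdl file names are assumptions about what a typical Manhattan work directory contains; use whatever ASCII (.cdl) files are actually distributed in DART/models/lorenz_63/work.

```csh
cd DART/models/lorenz_63/work

# One-time conversion of the ASCII (CDL) files to netCDF.
# The file names below are illustrative -- use the ones in your work directory.
ncgen -o perfect_input.nc perfect_input.cdl
ncgen -o filter_input.nc  filter_input.cdl

# Generate the synthetic truth and observations, then assimilate them.
./perfect_model_obs
./filter
```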
There should now be the following output files:
from executable "perfect_model_obs" | |
---|---|
perfect_output.nc | a netCDF file containing the model trajectory ... the 'truth' |
obs_seq.out | The observations (harvested as the true model was advanced) that were assimilated. |
from executable "filter" | |
preassim.nc | A netCDF file of the ensemble model states just before assimilation. This is the prior. |
filter_output.nc | A netCDF file of the ensemble model states after assimilation. |
obs_seq.final | The observations that were assimilated as well as the ensemble mean estimates of the 'observations' - for comparison. |
from both | |
dart_log.out | The run-time log of the experiment. This grows with each execution and may safely be deleted at any time. |
dart_log.nml | A record of the input settings of the experiment. This file may safely be deleted at any time. |
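If you want a quick look before turning to MATLAB®, the standard netCDF tools can summarize the new files; for example:

```csh
# Print the header (dimensions, variables, attributes) of the truth file ...
ncdump -h perfect_output.nc

# ... and of the posterior ensemble written by filter.
ncdump -h filter_output.nc
```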
Note that if you change the input.nml namelist values controlling inflation
and file output, several (perhaps many) more files are created.
The DART/documentation/tutorial
documents are an excellent way
to kick the tires on DART and learn about ensemble data assimilation.
If you've been able to build the Lorenz 63 model, you have correctly
configured your mkmf.template and you can run
anything in the tutorial.
The Manhattan release of DART uses native MATLAB® netCDF support and no longer requires any third-party netCDF toolboxes; no additional MATLAB toolboxes are required either. To allow your environment to seamlessly use the DART MATLAB functions, your MATLABPATH must be set so that you have access to a couple of DART directories. Do something like the following at the MATLAB® prompt, using the real path to your DART installation:
>> addpath('path_to_dart/diagnostics/matlab','-BEGIN')
>> addpath('path_to_dart/documentation/DART_LAB/matlab','-BEGIN')
It's very convenient to put these commands in your ~/matlab/startup.m so they get run every time MATLAB® starts up. DART provides an example diagnostics/matlab/startup.m that you can use; it is internally documented.
The simplest way to determine if the installation is successful
is to run some of the functions we have available in
DART/diagnostics/matlab/. Usually, we launch MATLAB®
from the DART/models/lorenz_63/work
directory and use the MATLAB® addpath command
to make the DART/matlab/ functions available.
In this case, we know the true state of the model that is consistent
with the observations. The following MATLAB® scripts compare the
ensemble members with the truth and can calculate an error.
From the plot_ens_time_series graphic, you can see the individual green ensemble members getting more constrained as time evolves. If your figures look similar to these, that's pretty much what you're looking for and you should feel pretty confident that everything is working.