We work to keep the DART code highly portable: we avoid compiler-specific constructs, require no system-specific functions, and try to make the code easy to build on new platforms.
DART has been compiled and run on Apple laptops and workstations, Linux clusters small and large, SGI Altix systems, IBM Power systems, Intel-based IBM systems, and Cray systems.
DART has been compiled with compilers from Intel, PGI, Cray, GNU, IBM, and Pathscale.
MPI versions of DART have run under batch systems including LSF, PBS, Moab/Torque, and Sun Grid Engine.
Most of the platform-specific notes are in the appropriate mkmf.template.xxxx.yyyy file. There are very few situations that require making additional changes.
For some reason, the gfortran compiler does not require an interface to the system() routine, while all the other compilers we have tested do need the interface. This makes it impossible to have a single module that is compiler-independent. The interface is needed in null_mpi_utilities_mod.f90 and/or mpi_utilities_mod.f90. The problem surfaces at link time:
null_mpi_utilities_mod.o(.text+0x160): In function `__mpi_utilities_mod__shell_execute':
: undefined reference to `system_'
null_mpi_utilities_mod.o(.text+0x7c8): In function `__mpi_utilities_mod__destroy_pipe':
: undefined reference to `system_'
null_mpi_utilities_mod.o(.text+0xbb9): In function `__mpi_utilities_mod__make_pipe':
: undefined reference to `system_'
collect2: ld returned 1 exit status
make: *** [preprocess] Error 1
There is a script to facilitate making the appropriate change to null_mpi_utilities_mod.f90 and mpi_utilities_mod.f90. Run the shell script DART/mpi_utilities/fixsystem with no arguments to simply 'flip' the state of these files (i.e. if the system block is defined, it will undefine the block by commenting it out; if the block is commented out, it will define it by uncommenting the block). If you want to hand-edit null_mpi_utilities_mod.f90 and mpi_utilities_mod.f90 - look for the comment block that starts ! BUILD TIP and follow the directions in the comment block.
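For reference, the block that fixsystem toggles looks roughly like the following sketch; the exact text is in the source files below the ! BUILD TIP comment:

    ! an interface block like this is what the non-gfortran compilers
    ! need in order to get a return code back from the system() routine
    interface
      function system(string)
        character(len=*) :: string
        integer          :: system
      end function system
    end interface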
Each compiler creates Fortran module files in its own particular format, so a module built by one compiler usually cannot be used by another. A common problem is when the Fortran90 modules for the netCDF interface were compiled by compiler A but are being used with compiler B. This generally results in an error message like:
Fatal Error: File 'netcdf.mod' opened at (1) is not a <pick_your_compiler> module file
make: *** [utilities_mod.o] Error 1
The only solution here is to make sure the mkmf.template file is referencing the appropriate netCDF installation.
The endian-ness of the binary files is
specific to the chipset, not the compiler or the code (normally).
There are some models that require binary files of a
specific endianness. Most compilers can
read and/or write binary files of a
specific (non-native) endianness given the right compile
flags. This is generally an 'all-or-nothing' approach: trying
to micromanage which files are opened with native endianness and
which files are opened with non-native endianness is
too time-consuming and error-prone to be of much use.
If the compile flags exist and are known to us, we try to include
them in the comment section of the individual
mkmf.template.xxxx.yyyy file.
These errors most often manifest themselves as 'time' errors
in the DART execution. The restart/initial conditions files have
the valid time of the ensuing model state as the first bit of information
in the header, and if these files are 'wrong'-endian, the encoded times
are nonsensical.
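As a concrete example of such flags, two common compilers accept the options below (verify against your compiler's documentation and the comments in your mkmf.template file; the rest of the FFLAGS line here is illustrative):

    FFLAGS = -O2 $(INCS) -convert big_endian       # Intel ifort
    FFLAGS = -O2 $(INCS) -fconvert=big-endian      # gfortran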
If you want to use MPI and are interested in testing something simple before total immersion: try running the MPI test routines in the DART/doc/mpi directory. This directory contains some small test programs which use both MPI and the netCDF libraries. It may be simpler to debug any build problems here, and if you need to submit a problem report to your system admin people these single executables are much simpler than the entire DART build tree.
There are two main techniques for doing data assimilation: variational and ensemble methods. DART uses a variety of ensemble Kalman filter techniques.
We distribute the full source code for the system so you're free to edit
anything you please. However, the system was designed so that you should
be able to add code in a few specific places to add a new model, work
with new observation types, or change the assimilation algorithm.
To add a new model you should be able to add a new
DART/models/XXX/model_mod.f90 file to interface
between your model and DART. We expect that you should not
have to alter any code in your model to make it work with DART.
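As a sketch of the shape of that file, here are two of the required routines. Names follow the Lanai-era interface; DART/models/template/model_mod.f90 in your checkout is the authoritative list, which also includes routines such as model_interpolate, get_state_meta_data, and adv_1step:

    module model_mod

    use types_mod, only : r8   ! DART's real kind definitions

    implicit none
    private
    public :: static_init_model, get_model_size

    integer, parameter :: model_size = 40   ! illustrative state vector length

    contains

    subroutine static_init_model()
    ! one-time setup: read namelists, set up grid and state metadata
    end subroutine static_init_model

    function get_model_size()
    integer :: get_model_size
    get_model_size = model_size
    end function get_model_size

    end module model_mod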
To add new observation types you should be able to add a
new DART/obs_def/obs_def_XXX_mod.f90 file. If there is not
already a converter for this observation type you can add a
converter in DART/observations/XXX.
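The preprocess program builds the observation-handling code by scanning obs_def files for special comment blocks. A new file starts with something like this sketch (the type name here is hypothetical; copy an existing DART/obs_def/obs_def_*_mod.f90 file for the complete set of blocks):

    ! BEGIN DART PREPROCESS KIND LIST
    ! MY_NEW_OBS_TYPE,  KIND_TEMPERATURE,  COMMON_CODE
    ! END DART PREPROCESS KIND LIST

Here COMMON_CODE indicates the type can use the standard interpolation code and needs no special forward-operator module.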
If you are doing data assimilation algorithm research
you may be altering some of the core DART routines in the
DART/assim_tools/assim_tools_mod.f90 or
DART/filter/filter.f90 files.
Please feel free to email DART support (dart at ucar.edu) for
help with how to do these modifications so they work with
the parallel version of DART correctly.
If you add support for a new observation type, a new model,
or a new filter kind, we'd love for you to send a copy of it back to us
for inclusion in the DART distribution.
We run on almost any Unix- or Linux-based system, including laptops, clusters,
and supercomputers from IBM, Cray, and SGI, as well as Macs. We discourage
trying to use Windows, but it has been done using the Cygwin package.
We require a Fortran 90 compiler. Common ones in use are from GNU
(gfortran), Intel, PGI, PathScale, IBM, and g95.
We need a compatible netCDF library, which means one compiled with the same
compiler you build DART with, and built with the Fortran interfaces.
You can run DART as a single program without any additional software.
To run in parallel on a cluster or other multicore platform
you will need a working MPI library and runtime system.
If one doesn't already come with your system, OpenMPI is a
good open-source option.
Our diagnostic routines are scripts for MATLAB®, a
commercial math/visualization package. Some users
use IDL, NCL, or R, but they have to adapt our scripts themselves.
Go to the extensive
DART web pages where there
are detailed instructions on checking the source out of our subversion
server, compiling, running the tutorials, and examples of other users'
applications of DART.
If you really hate reading instructions, you can try looking at
the README in the top-level directory. But if you run into problems,
please read the
full setup instructions before contacting us for help;
our first suggestion will be to read those web pages anyway.
The MPI compiler commands are usually scripts or programs which add
additional arguments and then call the standard Fortran compiler.
If there is more than one type of compiler on a system you must find
the version of MPI which was compiled to wrap around the compiler you
are using.
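One way to check which compiler a given MPI wrapper invokes (flag names vary between MPI distributions):

    mpif90 --version    # many wrappers pass this through to the underlying compiler
    mpif90 -show        # MPICH-derived wrappers: print the underlying compile command
    mpif90 -showme      # OpenMPI wrappers: the equivalent option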
In the DART/developer_tests/mpi_utilities/tests directory are some small programs
which can be used to test compiling and running with MPI.
If you are using version 1.10.0 of OpenMPI and getting
compiler errors about being unable to find a matching
routine for calls to MPI_Get() and/or MPI_Reduce(),
please update to version 1.10.1 or later.
There were missing interfaces in the
1.10.0 release which are fixed in the 1.10.1 release.
If you are behind a firewall, it may not allow the ports needed to talk to the subversion server. If you can, try from a machine outside the firewall, or talk to your system support people about how to access a subversion server. Sometimes there are machines which allow subversion access and which share filesystems with machines that are not allowed. It's always better if you can go back later and update your copy of DART with subversion to keep it in sync with the server; making a tar file of the checked-out source and moving it to another machine won't let you do this.
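A typical checkout and later update look like this (the repository URL below is only a placeholder; the real one is on the DART web pages):

    svn checkout https://svn.example.org/DART/releases/Lanai DART
    cd DART
    # ... later, to sync your copy with the server:
    svn update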
Any application that uses the netCDF data libraries must be compiled with exactly the same compiler as the libraries were built with. On systems which have either multiple compilers, or multiple versions of the same compiler, there is the possibility that the libraries don't match the compiler you're using to compile DART.
If you believe you are using the right version of the compiler, then check to see if the Fortran interfaces have been compiled into a single library with the C code, or if there are two libraries, libnetcdf.a and libnetcdff.a (note the 2 f's in the second library). The library lines in your mkmf.template must reference either one or both libraries, depending on what exists. This is a choice that is made by the person who built the NetCDF libraries and cannot be predicted beforehand.
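In practice this means the library lines in your mkmf.template look like one of the following (the paths here are illustrative):

    NETCDF = /usr/local/netcdf
    INCS   = -I$(NETCDF)/include
    # netCDF built as two libraries (note the order: -lnetcdff before -lnetcdf):
    LIBS   = -L$(NETCDF)/lib -lnetcdff -lnetcdf
    # or, netCDF built as a single library containing the Fortran interfaces:
    # LIBS = -L$(NETCDF)/lib -lnetcdf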
If you're running the Lanai release or code from the trunk
later than 2013, the DART Makefiles should automatically call a script
in the DART/mpi_utilities directory named fixsystem.
This script tries to alter the MPI source code in that directory
to work with your compiler. If you still get a compiler error
look at this script and see if you have to add a case for the name
of your compiler.
If you're running the Kodiak release or earlier, you have to
run fixsystem yourself before compiling. We distributed
the code so it would work without change for the gfortran
compiler, but all other compilers require that you run
fixsystem before trying to compile.
The netCDF libraries need to be built by the same version of the same compiler as you are building DART. If your system has more than one compiler on it (e.g. Intel ifort and gfortran) or multiple versions of the same compiler (e.g. gfortran 4.1 and 4.5), you must have a version of the netCDF libraries which was built with the same version of the same compiler you're using to build DART.
There are several important options when the netCDF libraries are built that change which libraries you get and whether you have what you need.
Bottom line: What you need to set for the library list in your DART/mkmf/mkmf.template file depends on how your netCDF was built.
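If your netCDF installation provides the nc-config or nf-config utility, it can report how the libraries were built; for example (available options vary with netCDF version):

    nf-config --fc       # which Fortran compiler built the Fortran interfaces
    nf-config --flibs    # the linker flags to put on the LIBS line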
We recommend that you run an assimilation with Fortran double precision reals (e.g. all real values are real*8 or 64 bits). However if your model is compiled in single precision (real*4 or 32 bits) then there is an option to build DART the same way. Edit DART/common/types_mod.f90 and change the definition of R8 to equal R4 (comment out the existing line and comment in the following line). Rebuild all DART executables and it will run with single precision reals. We declare every real variable inside DART with an explicit size, so we do not recommend using compiler flags to try to change the default real variable precision because it will not affect the DART code.
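The relevant lines in types_mod.f90 look approximately like this sketch; see the comments in the file itself for the exact edit:

    integer, parameter :: r4 = SELECTED_REAL_KIND(6,30)  ! 4 byte (32 bit) reals
    integer, parameter :: r8 = SELECTED_REAL_KIND(12)    ! 8 byte reals (the default)
    !integer, parameter :: r8 = r4   ! uncomment this line (and comment out the
                                     ! one above) to run in single precision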
Look in the log or in the standard output for the message 'initialize_mpi_utilities: Running with N MPI processes.' If instead you see 'initialize_mpi_utilities: Running single process', then you have NOT successfully compiled with MPI; you are running N duplicate copies of a single-task program. Rerun the quickbuild.csh script with the -mpi flag to force it to build filter with mpif90, or whatever the MPI compiler wrapper is called on your system.
If you are running one of the "low-order" models (e.g. one of the Lorenz models,
the null model, or the pe2lyr model), the easiest way to run is to let DART
control advancing the model when necessary. You run the "filter" executable
and it runs both the assimilation and model advances until all observations
in the input observation sequence file have been assimilated.
See the "async" setting in the
filter namelist documentation for more information.
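The setting lives in the filter namelist; roughly like this sketch (see filter.html for the values supported by your version):

    &filter_nml
       async = 0,    ! 0: advance the model by subroutine call inside filter
                     ! 2: advance each member with a shell script
                     ! 4: one script advances all members (for MPI models)
    /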
If you are running a large model with a complicated configuration and/or
run script, you will probably want to run the assimilation separately from
the model advances. To do this, you will need to script the execution,
and break up the observations into single timestep chunks per file.
The scripting will need to create filter input files from the model files,
link the current observation file to the input filename in the namelist,
copy or rename any inflation files from the previous assimilation step,
run filter, convert the filter output to model input files, and then run
the model. There are example scripts which do this in the WRF shell_scripts
directory, also the MPAS shell_scripts directory. These scripts are both
highly model-dependent as well as computing system dependent.
If you are running any of the CESM models (e.g. CAM, POP, CLM) then the
scripts to set up a CESM case with assimilation are provided in the
DART distribution. At run time, the run script provided by CESM is used.
After the model advance a DART script is called to do the assimilation.
The "multi-instance" capability of CESM is used to manage the multiple
copies of the components which are needed for assimilation, and to run
them all as part of a single job.
One of the assumptions of the Kalman filter is that the model
states and the observation values have Gaussian distributions.
The assimilation can work successfully even if this is not
actually true, but there are certain cases where it leads
to problems.
If any of the model state values must remain bounded, for example
values which must remain positive, or must remain between 0 and 1,
you may have to add some additional code to ensure the posterior
values obey these constraints. It is not an indication of an error
if after the assimilation some values are outside the required range.
Most users deal with this
successfully by letting the assimilation update the values
as it will, and then changing any out-of-range values during
the step where the model data is converted from DART format
to the model's native format. For example,
the WRF model has a namelist item in the &model_nml namelist
which can be set at run-time to list which variables have
minimum and/or maximum values and
the conversion code will enforce the given limits.
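In the WRF case that namelist item looks something like the following sketch (from memory; the WRF model_mod documentation has the authoritative format and the list of allowed actions):

    &model_nml
       wrf_state_bounds = 'QVAPOR', '0.0', 'NULL', 'CLAMP',
    /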
Generally this works successfully, but if the observations or
the model are biased and the assimilation is continuously
trying to move model state out of range, the distribution can
become seriously unbalanced. In this case another solution,
which requires more coding, might be to convert the values to
a log scale on import to DART, do the assimilation with the log
of the observation values, and then convert back to the original
scale at export time. This ensures the values stay positive,
which is a common requirement for legal values.
This is a common problem, especially when adding a new observation type or trying to assimilate with a new model. But it can happen at any time, and it can be confusing to determine why nothing is changing. See this web page for a list of common causes of the assimilation output state being the same as the input state, and how to determine which one is responsible.
Each module in DART has an html web page which describes the namelists in detail. Start with DART/index.html and follow the links to all the other modules and namelists in the system. If you want help with setting up an experiment the DART/filter/filter.html page has some introductory advice for some of the more important namelist settings.
If your job is getting killed for no discernible reason, usually
during the computation of the prior or posterior forward operators
or while writing the diagnostics files, the problem may be caused
by the MPI timeout limit. This usually happens only
when the number of MPI tasks is much larger than the number
of ensemble members, and there are very slow forward operator
computations or very large states to write into the diagnostics
files. In the standard DART distribution only the first
N tasks (where N is the number of ensemble members) are doing work
during the forward operators, or only 1 task for writing diagnostic files.
All the other tasks will be waiting at an MPI barrier. If they wait there
long enough they reach the timeout threshold, at which point the MPI runtime
assumes that one or more of the other tasks have failed, and so they exit.
The solutions are either to set an environment variable that lengthens
the timeout threshold, to run with fewer MPI tasks, or to ask the
DART team about becoming a beta user of a newer version of
DART which does not have such large time differentials between
different MPI tasks.
If filter finishes running, including the final timestamp message to the log file, but then the MPI job does not exit (the next line in the job script is not reached), and you have set the MPI timeout to be large to avoid the job being killed by MPI timeouts, then you have run into a bug we have also seen. We believe this to be an MPI library bug which only happens under a specific set of circumstances. We can reproduce it but cannot find a solution. The apparent bug happens more frequently with larger processor counts (usually larger than about 4000 MPI tasks), so if you run into this situation try running with a smaller MPI task count if possible, and not setting the MPI debug flags. We have seen this happen on the NCAR supercomputer Yellowstone with both the MPICH2 and PEMPI MPI libraries.
Most users with large WRF domains run a single cycle of filter to
do assimilation, and then advance each ensemble member of WRF from
a script, possibly submitting them in a batch to the job queues.
For smaller WRF runs, if WRF can be compiled without MPI (the 'serial'
configuration) then filter can cycle inside the same program, advancing
multiple ensemble members in parallel. See the WRF documentation pages
for more details.
If you are using the advance_model.csh script that
is distributed with DART, it will take care of converting the filter
output files back to the WRF input files for the next model advance.
If you are setting up a free run or doing something different than
what the basic script supports, read on to see what must be done.
When you finish running DART it will have created a set of
sssss.#### restart files,
where the sssss part of the filename comes from the setting
of &filter_nml :: restart_out_file_name
(and is frequently filter_restart).
The .#### is a 4 digit number appended
by filter based on the ensemble number.
These files contain the WRF state vector data that was used in the
assimilation, which is usually a subset of all the fields in a
wrfinput_d01 file.
dart_to_wrf is the standard utility to insert the DART
state information into a WRF input file, e.g. wrfinput_d01.
For multiple WRF domains, a single run of the converter program
will update the _d02, _d03,
..., files at the same time as the _d01 file.
In the input.nml file, set the following:
&dart_to_wrf_nml
   model_advance_file = .false.,
   dart_restart_name  = 'filter_restart.####',
/
where '####' is the ensemble member number. There is no option to alter the
input/output WRF filename.
Run dart_to_wrf.
Remember to preserve each wrfinput_d01 file or you will
simply keep overwriting the information in the same output file.
Repeat for each ensemble member and you will be ready to run WRF to make
ensemble forecasts.
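A minimal csh loop over the members might look like this (a sketch only; the filenames, the template copy, and the ensemble size are illustrative):

    set n = 1
    while ( $n <= 20 )                          # 20 = example ensemble size
       set num = `printf "%04d" $n`
       cp wrfinput_d01.template wrfinput_d01    # start from a pristine file each time
       # edit input.nml so dart_restart_name = 'filter_restart.'$num (e.g. with sed)
       ./dart_to_wrf
       mv wrfinput_d01 wrfinput_d01.$num        # preserve this member's updated file
       @ n++
    end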
If filter is advancing the WRF model, and
you want to spawn forecasts from intermediate assimilation steps:
Use the assim_model_state_ic.#### files instead
of the filter_restart.#### files, and set the
model_advance_file namelist item to .true.
The CESM climate model comes with its own configuration, build, run, and archive scripts. The DART distribution supplies a 'setup' script that calls the CESM scripts to build a new case, and then adds a section to the CESM run script so that the DART code runs after each CESM model advance. The DART setup scripts are needed only when building a new case; at run time the CESM run scripts are used to start the job. The CESM "multi-instance" capability is used to run multiple ensemble members as part of a single job.
We use the CESM framework to execute the CESM model components, and then call the DART assimilation via an addition to the standard CESM run script. We provide a set of setup scripts in our DART/models/XXX/shell_scripts directories, where XXX is currently one of: cam, POP, clm, or CESM. Start with the shell script, set the options you want there, and then run the script. It calls the standard CESM 'build_case' scripts, and stages the files that will be needed for assimilation. See comments in the appropriate setup script for more details of how to proceed.
Certain versions of CESM (including CESM1_5_alpha02d) won't run with 3 instances (ensemble members); 4 instances work fine. We are unsure what other instance counts fail. The error message is about box rearranging, from box_rearrange.F90. This is a problem in CESM and should be reported via their Bugzilla process.
If you are trying to assimilate with POP and you get this error:
The most likely cause is that the POP-DART model interface code is trying to read the POP grid information, and the default file is the wrong kind of binary for this system (big-endian rather than little-endian). At this point the easiest solution is to rebuild the DART executables with a flag that swaps the bytes as binary files are read. For the Intel compiler, see the comments at the top of the mkmf.template file about adding '-convert big_endian' to the FFLAGS line.
If you have the rm (remove) command aliased to require you to
confirm removing files, the CESM build process will stop and wait for
you to confirm removing the files. You should reply yes when prompted.
If you have questions about the DART setup scripts and how they interact
with CESM it is a good idea to set up a standalone CESM case without
any DART scripts or commands to be sure you have a good CESM environment
before trying to add DART. The DART setup script uses CESM scripts
and commands and cannot change how those scripts behave in your environment.
DART only uses the plain NetCDF libraries for I/O. CESM can be configured to use several versions of NetCDF including PIO, parallel netCDF, and plain netCDF. Be sure you have the correct modules loaded before you build CESM. If there are questions, try setting up a CESM case without DART and resolve any build errors or warnings there before using the DART scripts.