WRF/DART materials for the Manhattan release.


Getting started running your own WRF/DART system

The materials above were developed with earlier versions of DART, so not all aspects are still relevant. Still, we suggest working through the tutorial materials (especially the Intro) before trying to build your own WRF/DART analysis system with the current tools. The following tutorial is designed to help you get going with a real-data, retrospective, cycled analysis. A specific case set is provided so that you can check that things are working as expected. Because computing environments vary widely, we cannot make this tutorial equally simple to run for all users. Further, setting up a WRF/DART system of your own will require additional effort.

Specifically, this tutorial was assembled to be compatible with ~ WRF V3.9.1 and the DART Manhattan release, to be run on NCAR's Cheyenne supercomputer. Contact us if you need help adapting this guidance to your system.

DISCLAIMER: We do not claim that this is a turnkey or blackbox system. Be mentally prepared to invest a reasonable amount of time on the learning curve. There are many outstanding research issues which have no easy answers here. This is not a one week/grad student/naive user system. Even after you get the code up and running, you have to be able to interpret the results, which requires developing specific skills. There are a lot of ways to alter how the system works: localization, inflation, which variables and observations are assimilated, the assimilation window time, the model resolution, etc. This is both good and bad: you have many ways of improving your results, but you have to take care with how you set all of these inputs. Getting a set of scripts that runs doesn't mean the system is running well, or producing useful results. Let the adventure begin!

If you are a visual learner, this diagram may be helpful to you in understanding the overall flow of a cycled data assimilation system.

Step 1: Setup

There are several dependencies for the executables and scripting components. On Cheyenne, users have reported success building WRF, WPS, WRFDA, and DART with the default module environment, including the Intel compilers, MPT, and netCDF 4. In addition, you'll need to load the NCO and NCL modules to run the script set that accompanies the tutorial.

Compile all of the components needed. This will include:

Create a work directory (e.g., on /scratch2/user/WORKDIR) and place this very large tar file in it. CAUTION ~ 15 GB file - you might be better off using 'wget' to download the file directly to your local system, e.g.:

wget http://www.image.ucar.edu/wrfdart/tutorial/wrf_dart_tutorial_23May2018.tar.gz
tar -xzvf wrf_dart_tutorial_23May2018.tar.gz

Then, you should see the following directories: icbc, obs_diag, rundir, scripts, output, perts, and template. The directory names (case sensitive) are important, as the scripts rely on these local paths and file names.
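Because the scripts depend on these exact directory names, it can be worth confirming the extraction before going further. The following is a minimal Python sketch, not part of the tutorial script set; the function name missing_dirs is hypothetical, and the directory names are the ones listed above.

```python
from pathlib import Path

# Directory names listed in the tutorial text (case sensitive).
EXPECTED_DIRS = ("icbc", "obs_diag", "rundir", "scripts", "output", "perts", "template")


def missing_dirs(workdir, expected=EXPECTED_DIRS):
    """Return the expected subdirectory names that are absent from workdir."""
    root = Path(workdir)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling missing_dirs on your work directory should return an empty list when all seven directories are present; any names it returns point to an incomplete download or extraction.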

  1. Copy the contents of $DART/models/wrf/shell_scripts to the WORKDIR/scripts directory.
  2. Change your path to the rundir.
  3. Copy the WRF executables and support files (except for namelist.input) into rundir/WRF_RUN/.
    1. Build a serial version of WRF and copy real.exe to the WRF_RUN directory, renaming it real.serial.exe.
    2. From your WRFDA build, also copy the da_wrfvar.exe executable to the rundir/WRF_RUN/ directory.
    3. Copy WRFDA/var/run/be.dat.cv3 to rundir/WRF_RUN/be.dat.
  4. Copy the needed DART executables into rundir/.
    1. The executables from DART that you need are: advance_time, filter, pert_wrf_bc (no helper page), obs_diag, obs_sequence_tool, obs_seq_to_netcdf, and wrf_dart_obs_preprocess (if making your own obs).
    2. We will use a few 'advanced' features here as well, so copy the sampling_error_correction_table.nc to your work/rundir from the $DART/assimilation_code/programs/gen_sampling_err_table/work/ directory if it is not already there.
You will need a list of the file names for input and restart files as well. The files with the list of filenames (input_list_d01.txt, output_list_d01.txt) should already be present, but if needed the following script should get the job done (run in rundir):


set num_ens = 50 
set input_file_name  = "input_list_d01.txt" 
set input_file_path  = "./advance_temp" 

set output_file_name = "output_list_d01.txt" 

set n = 1 

if ( -e $input_file_name )  rm $input_file_name 
if ( -e $output_file_name ) rm $output_file_name 

while ($n <= $num_ens) 

   set     ensstring = `printf %04d $n`
   set  in_file_name = ${input_file_path}${n}"/wrfinput_d01" 
   set out_file_name = "filter_restart_d01."$ensstring 

   echo $in_file_name  >> $input_file_name
   echo $out_file_name >> $output_file_name

   @ n++

end

So far, your rundir should contain (correct as needed):

executables: advance_time, filter, obs_diag, obs_seq_to_netcdf, obs_sequence_tool, pert_wrf_bc, wrf_dart_obs_preprocess
scripts: new_advance_model.csh, add_bank_perts.ncl
directories: WRFIN (empty), WRFOUT (empty), WRF_RUN (wrf executables and support files, except namelist.input)
support data and files: input_list_d01.txt, output_list_d01.txt, sampling_error_correction_table.nc
namelists: input.nml, namelist.input
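The checklist above can also be verified mechanically. This is a minimal Python sketch under the assumption that the list above is complete for your setup; the function name missing_rundir_items is hypothetical, and you should adjust the names if your rundir legitimately differs.

```python
from pathlib import Path

# Expected rundir contents, copied from the tutorial's list above.
EXPECTED_FILES = (
    # executables
    "advance_time", "filter", "obs_diag", "obs_seq_to_netcdf",
    "obs_sequence_tool", "pert_wrf_bc", "wrf_dart_obs_preprocess",
    # scripts
    "new_advance_model.csh", "add_bank_perts.ncl",
    # support data and files
    "input_list_d01.txt", "output_list_d01.txt",
    "sampling_error_correction_table.nc",
    # namelists
    "input.nml", "namelist.input",
)
EXPECTED_DIRS = ("WRFIN", "WRFOUT", "WRF_RUN")


def missing_rundir_items(rundir):
    """Return names of expected files and directories not found in rundir.

    Directory names are returned with a trailing slash to tell them apart.
    """
    root = Path(rundir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty result means every item in the list was found; it does not check that the executables were built correctly or that the namelists have sensible values.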

For this tutorial, we are providing you with a specified WRF domain. To make your own, you would need to define your own WPS namelist and use WPS to make your own geogrid files. See the WRF site for help with building and running those tools as needed. You would also need to get the appropriate grib files to generate initial and boundary condition files for the full period you plan to cycle. In this tutorial we have provided you with geogrid files, a small set of grib files, and a namelist to generate a series of analyses for several days covering a North American region.

Let's now look inside the scripts directory. You should find:

The primary script for running the cycled analysis system is the driver.csh. The param.csh is home to most of the key settings and paths for running the system. The scripts assim_advance.csh, assimilate.csh, first_advance.csh, prep_ic.csh, and diagnostics_obs.csh are templates for submitted jobs or helper scripts for specific tasks that need to be done during the cycling. Finally, the gen_retro_icbc.csh is used to generate template files and boundary conditions, while init_ensemble_var.csh is used to get an initial ensemble set ready for cycled DA. You will need to edit these scripts to provide the paths to where you are running the experiment and to connect up files. Look for 'set this appropriately #%%%#' for locations that you need to edit.

Next, move to the perts directory. Inside, unzip and extract the perturbation files (there should be 100 files). For your own case, you will need to create a perturbation bank of your own. A brief description for running the script $DART/models/wrf/shell_scripts/gen_pert_bank.csh is provided with that script.

The icbc directory contains a geo_em_d01.nc file (geo information for our test domain), and grib files that will be used to generate the initial and boundary condition files.

The template directory contains namelists for WRF, WPS, and filter, along with a wrfinput file that matches what will be the analysis domain.

Finally, the output directory contains one directory per analysis date, each holding the observations for that date. Template files will be placed here once created (done below), and as we get into the cycling the output will go in these directories.


Step 2: Initial conditions

To get an initial set of ensemble fields, depending on the size of your ensemble and the data available to you, you might have options to initialize the ensemble from, say, a global ensemble set of states. Here, we develop a set of flow-dependent errors by starting with random perturbations and conducting a short forecast. We will use the WRFDA random CV option 3 to provide an initial set of random errors, and since this is already available in the perturbation bank developed in the setup, we can simply add these to a deterministic GFS state. Further, lateral boundary uncertainty will come from adding a random perturbation to the forecast (target) lateral boundary state, such that after the integration the lateral boundaries have random errors.

First, we need to generate a set of GFS states and boundary conditions that will be used in the cycling. Use the script named gen_retro_icbc.csh (in the scripts directory) to create this set of files, which will be added to the date directories under the output directory. If you didn't already do so, edit this file to put in the appropriate path to your param.csh script. If the param.csh script also has the correct path edits and the executables are in place in rundir, then running this script should execute a series of operations that extracts the grib data, runs metgrid, and then twice executes the serial build of real.exe to generate a pair of WRF files and a boundary file for each analysis time. Check in your output/2017042700 directory, and you should see these files:

The odd extensions of these files are the Gregorian dates for these files, which are used by the DART software for time scheduling. Similar files (with different dates) should appear in all of the date directories.
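To relate a directory name such as 2017042700 to those extensions, the sketch below converts a YYYYMMDDHH string into a day count and a seconds count. It relies on DART's Gregorian calendar counting days from 1601-01-01; how the two numbers are arranged in the actual file names is an assumption you should confirm against the files in your output directory. The function names are hypothetical.

```python
from datetime import datetime

# DART's Gregorian calendar counts days from 1601-01-01 00:00:00.
DART_EPOCH = datetime(1601, 1, 1)


def gregorian_day_sec(when):
    """Return (days, seconds) elapsed since the DART Gregorian epoch."""
    delta = when - DART_EPOCH
    return delta.days, delta.seconds


def gregorian_from_string(yyyymmddhh):
    """Convert a YYYYMMDDHH string, as used for the date directories."""
    return gregorian_day_sec(datetime.strptime(yyyymmddhh, "%Y%m%d%H"))


print(gregorian_from_string("2017042700"))  # (152057, 0)
print(gregorian_from_string("2017042706"))  # (152057, 21600)
```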

Next, we will execute the script to generate an initial ensemble of states for the first analysis. For this we run the script init_ensemble_var.csh. You might want to modify this script to test running a single model advance first just in case you have some debugging to do. When complete for the full ensemble, you should find 50 new files in the directory output/2017042700/PRIORS with names like prior_d01.0001, prior_d01.0002, etc...
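A quick way to confirm that every member finished is to look for the expected file names. This is a minimal Python sketch, assuming the prior_d01.NNNN naming shown above with four-digit member numbers starting at 0001; the function names are hypothetical.

```python
from pathlib import Path


def expected_prior_names(num_ens=50, domain=1):
    """Build the prior file names for members 1..num_ens on one domain."""
    return [f"prior_d{domain:02d}.{member:04d}" for member in range(1, num_ens + 1)]


def missing_priors(priors_dir, num_ens=50, domain=1):
    """Return the expected prior file names not present in priors_dir."""
    root = Path(priors_dir)
    return [name for name in expected_prior_names(num_ens, domain)
            if not (root / name).is_file()]
```

For the tutorial case you would call missing_priors("output/2017042700/PRIORS"); any names returned identify members whose advance did not complete and whose job output is worth inspecting.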


Step 3: Prepare observations

For the tutorial exercise, observation sequence files are provided to enable you to quickly advance to running a test system. The observation processing is critical to the success of your results, so please plan to come back to this and invest some time in understanding the observation processing. In brief, there are many options provided by DART to convert a broad range of observations from a number of different sources. What makes the most sense for you will depend on your application and what data sources you have ready access to. An example using BUFR observations from the GDAS system is provided below.

DART has developed a number of tools for converting standard observation formats into observation sequence files, which is the format of observations used by the DART system. See the DART observations documentation page for a comprehensive overview of the available observation converters. For now, let's work through using NCEP BUFR observations, which contain a wide array of observation types from many platforms within a single file. Follow the guidance on the prepbufr page to build the BUFR conversion programs, get observation files for the dates you plan to build an analysis for, and run the codes to generate an observation sequence file.

Several namelist controls determine which observations are converted, and they are worth investigating a bit here. Within your input.nml, add the following namelist for the BUFR conversion:

&prep_bufr_nml
   obs_window    = 1.0
   obs_window_cw = 1.5
   otype_use     = 120.0, 130.0, 131.0, 132.0, 133.0, 180.0
                   181.0, 182.0, 220.0, 221.0, 230.0, 231.0
                   232.0, 233.0, 242.0, 243.0, 245.0, 246.0
                   252.0, 253.0, 255.0, 280.0, 281.0, 282.0
   qctype_use    = 0,1,2,3,15
   /

This defines an observation time window of +/- 1.0 hours, while cloud motion vectors will be used over a window of +/- 1.5 hours. The observation types used are sounding temps (120), aircraft temps (130, 131), dropsonde temps (132), mdcars aircraft temps (133), marine temp (180), land humidity (181), ship humidity (182), rawinsonde U,V (220), pibal U,V (221), aircraft U,V (230, 231, 232), cloudsat winds (242, 243, 245), GOES water vapor (246), sat winds (252, 253, 255), and ship obs (280, 281, 282). Only observations with the specified qc types are included. See the prepbufr page for more available namelist controls. Copy this input.nml into the DART/observations/obs_converters/NCEP/prep_bufr/work/ directory.
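The effect of the two window settings can be illustrated with a few lines of Python. This is only a sketch of the rule described above, not the converter's code: the function name is hypothetical, and treating the window edges as inclusive is an assumption.

```python
def in_time_window(offset_hours, is_cloud_motion_vector=False,
                   obs_window=1.0, obs_window_cw=1.5):
    """Decide whether an observation falls inside the time window.

    offset_hours is the observation time minus the analysis time, in hours.
    Cloud motion vectors use the wider obs_window_cw half-width.
    Edge inclusion is an assumption of this sketch.
    """
    half_width = obs_window_cw if is_cloud_motion_vector else obs_window
    return abs(offset_hours) <= half_width
```

With the values above, an observation taken 75 minutes before the analysis time (offset -1.25) would be dropped unless it is a cloud motion vector.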

Within DART/observations/obs_converters/NCEP/prep_bufr/work/prepbufr.csh:

set daily    = no
set zeroZ    = no # to create 06,12,18,24 convention files
set convert  = no
set block    = no
set year     = 2008
set month    = 5 # no leading zero
set beginday = 22
set endday   = 24
set BUFR_dir = ../data

Run the shell script to generate the intermediate format text files. Next, edit your input.nml to add the namelist below, copy it into the DART/observations/obs_converters/NCEP/ascii_to_obs/work/ directory, and run the quickbuild there.

&ncepobs_nml
   year       = 2008
   month      = 5
   day        = 22
   tot_days   = 31
   max_num    = 800000
   select_obs = 0
   ObsBase    = '../../path/to/temp_obs.'
   ADPUPA     = .true.
   AIRCFT     = .true.
   SATWND     = .true.
   obs_U      = .true.
   obs_V      = .true.
   obs_T      = .true.
   obs_PS     = .false.
   obs_QV     = .false.
   daily_file = .false.
   lon1       = 270.0
   lon2       = 330.0
   lat1       = 15.0
   lat2       = 60.0
   /

Look at the create_real_obs program help page to set/add the appropriate namelist options. Run create_real_obs and you'll get some observation sequence files, one for each six-hour window. For a cycled experiment, the typical approach is to put a single set of observations, associated with a single analysis step, into a separate directory. For example, within the output directory (we also will put inputs there to start), we would create directories like 2012061500, 2012061506, 2012061512, etc. for 6-hourly cycling. Place each observation file in the directory that matches its contents (e.g. obs_seq2012061500) and rename it to simply obs_seq.out (e.g. work/output/2012061500/obs_seq.out).
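The directory layout described above is easy to generate programmatically. The following Python sketch lists the cycle dates and the source and destination paths for each observation file; it only builds the names and leaves the copying to you. The function names and the obs_dir argument are hypothetical, and the obs_seqYYYYMMDDHH file naming is taken from the example above.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d%H"  # YYYYMMDDHH


def cycle_dates(first, last, step_hours=6):
    """Return the analysis dates from first to last, inclusive."""
    current = datetime.strptime(first, DATE_FORMAT)
    final = datetime.strptime(last, DATE_FORMAT)
    dates = []
    while current <= final:
        dates.append(current.strftime(DATE_FORMAT))
        current += timedelta(hours=step_hours)
    return dates


def obs_seq_moves(dates, obs_dir, output_dir):
    """Pair each converter output file with its per-cycle destination."""
    return [(f"{obs_dir}/obs_seq{date}", f"{output_dir}/{date}/obs_seq.out")
            for date in dates]


for src, dst in obs_seq_moves(cycle_dates("2012061500", "2012061512"),
                              "obs", "work/output"):
    print(src, "->", dst)
```

Printing the pairs first, and only then copying, makes it easy to spot a mismatch between file dates and directory names before any files are moved.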

It is helpful to also run the wrf_dart_obs_preprocess program, which strips away observations not in the model domain, can create superobservations from typically dense observations, increases observation errors near the lateral boundaries, checks for surface observations far from the model terrain height, etc. These collectively improve system performance and simplify interpreting the observation space diagnostics. There are a number of namelist options to consider, and you must provide a wrfinput file for the program to access the analysis domain information.


Step 4: Cycled analysis system

For larger analysis domains with many observations, run on modern supercomputers with batch queue job control systems, scripts provide a practical way to manage the work tasks. A set of scripts is provided with the tutorial tarball. You will need to edit these scripts, perhaps extensively, to run them within your particular computing environment. If you will run on NCAR's Cheyenne environment, fewer edits may be needed, but you should familiarize yourself with running jobs on Cheyenne if necessary.

Within the scripts directory you'll find a parameter shell script (param.csh), a driver script (driver.csh), and template scripts for advancing the ensemble members (assim_advance.csh) and running filter (assimilate.csh). Edit the parameter shell script as needed to set up all of the appropriate paths and other settings (see comments in the script: set this appropriately #%%%#). There are other options here to adjust cycling frequency, domains, ensemble size, etc., which are available when adapting this set of scripts for your own research. Edit the driver script to set the path to the parameter file and the final analysis time. During testing the final time is set the same as the first analysis time, but once you have things running you can set it to a later time to run through multiple cycles. While debugging, we advise commenting out all the places where the script submits jobs and placing an 'exit' in the script at each job submission step. The driver script expects a date as a command line argument (YYYYMMDDHH), so you would for instance run it as:

The script will check that the input files are present (wrfinput files, wrfbdy, observation sequence, and DART restart files), create a job script to run filter in rundir, monitor that the expected output from filter is created, and then generate job scripts for all of the model advances. After this completes, the script checks whether this is the last analysis to determine if a new cycle is needed. The driver also launches a script to compute some observation space diagnostics and to convert the final observation sequence file into netCDF format.
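The control flow described above can be summarized in a short Python skeleton. This is a sketch of the cycle logic only, not a translation of driver.csh: every step is passed in as a hypothetical callable, job submission and monitoring are omitted, and running the diagnostics right after filter is an assumed ordering.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y%m%d%H"  # YYYYMMDDHH, the form given on the command line


def run_cycles(first, last, step_hours, inputs_present, run_filter,
               run_diagnostics, advance_members):
    """Run analysis cycles from first through last; return the dates completed.

    Each callable takes the cycle date string. The first cycle always runs,
    and the last-analysis check happens after the model advances.
    """
    current = datetime.strptime(first, DATE_FORMAT)
    final = datetime.strptime(last, DATE_FORMAT)
    completed = []
    while True:
        date = current.strftime(DATE_FORMAT)
        if not inputs_present(date):
            raise RuntimeError(f"missing input files for cycle {date}")
        run_filter(date)        # assimilation step
        run_diagnostics(date)   # observation space diagnostics (assumed order)
        advance_members(date)   # ensemble model advances to the next time
        completed.append(date)
        if current >= final:    # last analysis reached, no new cycle needed
            break
        current += timedelta(hours=step_hours)
    return completed
```

Setting last equal to first reproduces the single-cycle test configuration; moving last forward runs through multiple cycles.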


Step 5: Check your results

Once you have run the analysis system, it is time to check if things ran well or if there are problems that need to be addressed. DART provides analysis system diagnostics in both state and observation space.

First, check to see if the analysis system actually changed the state. You should find a file in the output/$date/ directory called 'analysis_increment.nc', which is the change in the ensemble mean state from the background to the analysis after running filter. Use a tool such as ncview to look at this file. You should see spatial patterns that look something like the meteorology of the day. These are places where the background (short ensemble forecast) was adjusted based on the set of observations provided.

You can also use the provided obs_diag program to investigate the observation space analysis statistics. You'll find the results of this in output/$date/obs_diag_output.nc. Additional statistics can be evaluated using the final observation sequence file converted to netCDF format by the obs_seq_to_netcdf tool. This file has a name like 'obs_epoch_029.nc', where the number in the file name is largest for the most recent set of observations processed. The additional files enable plotting the time series of recently assimilated observations once multiple cycles have been run. Be sure to check that a high percentage (> 90%) of available observations were assimilated. Low assimilation rates typically point to a problem with the background analysis, observation quality, and/or observation error specification, which are important to address before using system results for science.
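The 90% check can be applied per observation type once you have extracted the counts of available and assimilated observations from the diagnostics. The Python sketch below works on plain numbers and does not read the netCDF files itself; the function names are hypothetical and the observation type labels in the example are illustrative values, not results.

```python
def assimilated_percent(n_used, n_possible):
    """Percentage of available observations that were assimilated."""
    if n_possible <= 0:
        raise ValueError("n_possible must be positive")
    return 100.0 * n_used / n_possible


def low_rate_types(counts, threshold=90.0):
    """Return observation types whose assimilation rate is not above threshold.

    counts maps a type label to (n_used, n_possible); types with no
    available observations are skipped.
    """
    return sorted(
        name for name, (n_used, n_possible) in counts.items()
        if n_possible > 0 and not assimilated_percent(n_used, n_possible) > threshold
    )


# Made-up counts purely for illustration.
example = {
    "RADIOSONDE_TEMPERATURE": (950, 1000),
    "ACARS_U_WIND_COMPONENT": (700, 1000),
}
print(low_rate_types(example))  # ['ACARS_U_WIND_COMPONENT']
```

Types flagged this way are a starting point for investigation, since a low rate can come from the background, the observations, or the assigned observation errors.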

If you encounter difficulties setting up, running, or evaluating the system performance, please contact us at dart(at)ucar(dot)edu.