

WRF/DART Tutorial materials presented at NCAR on January 22, 2014


Getting started running your own WRF/DART system

If, after working through all of the tutorial materials above, you are ready to try building your own WRF/DART analysis system, the following basic guide is designed to help you get going with a real-data retrospective cycled analysis. Note that this tutorial was assembled to be compatible with WRF V3.6 and the DART Lanai release. Contact us if you need additional assistance.

DISCLAIMER: We do not claim that this is a turnkey or blackbox system. Be mentally prepared to invest a reasonable amount of time on the learning curve. There are many outstanding research issues which have no easy answers here. This is not a one week/grad student/naive user system. Even after you get the code up and running, you have to be able to interpret the results, which requires developing specific skills. There are a lot of ways to alter how the system works: localization, inflation, which variables and observations are assimilated, the assimilation window time, the model resolution, etc. This is both good and bad. You have many ways of improving your results, but you have to take care in how you set each of these inputs.

If you are a visual learner, this diagram may help you understand the overall flow of a cycled data assimilation system.

Step 1: Setup

Compile all of the components needed. These include the WRF model branch of DART ($DART/models/wrf/work; run quickbuild.csh after setting up your mkmf.template), WRFDA (needed to generate a perturbed initial ensemble), and the WRF components (WPS and the real_em build of WRF). It is assumed here that you are already comfortable running WRF. If not, work through the WRF model tutorial first before trying WRF/DART.

Create a work directory, and within it make the following subdirectories: rundir, scripts, obsproc, output, template. The directory names (case sensitive) are important, as the scripts described below assume these names when generating paths. Copy the DART executables into the run directory (e.g. /scratch/yourname/work/rundir), and the WRF executables and support files (except for namelist.input) into work/rundir/WRF_RUN/. Look at the building blocks presentation above, slide 20/25, for what the run directory contents might look like. The executables you are most likely to need include advance_time, filter, dart_to_wrf, wrf_to_dart, pert_wrf_bc, fill_inflation_restart, obs_diag, obs_sequence_tool, replace_wrf_fields, restart_file_tool, update_wrf_bc, and wrf_dart_obs_preprocess. Note that 'advance_temp*' directories (one for each ensemble member) and a WRF directory will be created in rundir later.

Copy the advance_model.csh script from the $DART/models/wrf/shell_scripts directory to your work/rundir/. The basic input.nml DART namelist file also needs to be copied from the $DART/models/wrf/work/ directory to your work/rundir/ and work/template/ (named input.nml.template) directories. Necessary edits will be discussed later as needed.

In the rundir, create a file called bc_pert_scale and populate it with the following values (one value on each line): 0.15 1.0 0.75 0.0. These values control the scale and amplitude of the perturbations used in the lateral boundary conditions (see the advance_model.csh script to see how they are used to modify the namelist for WRFDA); users familiar with WRFDA may want to edit them. From your WRFDA build, also copy the da_wrfvar.exe executable and the be.dat file to the work/rundir/WRF_RUN/ directory. The latter can be copied from a file called be.dat.cv3 in the WRFDA/var/run directory.
You should also modify the default input.nml file you copy over: in the &restart_file_tool_nml namelist, set output_is_model_advance_file = .true. (the default is .false.).
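The directory layout and the bc_pert_scale file described in this step can be sketched in a few lines of shell. This is only an illustration: it is written in POSIX sh rather than csh, the WORK location is a placeholder you would replace with your own path, and copying the DART, WRF, and WRFDA executables is left to you because their locations depend on your builds.

```shell
#!/bin/sh
# Sketch of the Step 1 layout. WORK is a hypothetical location; override it
# in the environment (e.g. WORK=/scratch/yourname/work) before running.
WORK="${WORK:-$PWD/work}"

# Directory names are case sensitive; the cycling scripts build paths from them.
for d in rundir scripts obsproc output template; do
    mkdir -p "$WORK/$d" || exit 1
done

# WRF executables, support files, da_wrfvar.exe and be.dat go in here.
mkdir -p "$WORK/rundir/WRF_RUN" || exit 1

# bc_pert_scale: one value per line, exactly as listed in the text above.
printf '%s\n' 0.15 1.0 0.75 0.0 > "$WORK/rundir/bc_pert_scale"
```

Running it a second time is harmless, since mkdir -p does not complain about existing directories and bc_pert_scale is simply rewritten with the same values.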


Step 2: Initial conditions

Using WPS, define the WRF domain and initial analysis state (as met_em files) for the retrospective cycling window. For initial testing of the scripts, we recommend building a small WRF domain that can be integrated quickly. You'll need to generate all of the met_em files needed for the initialization and the period of cycling. Copy the namelist.input file to the work/template directory, renamed as namelist.input.template, along with a wrfinput file generated above. If you need suggestions on where to find an initial analysis state for a retrospective test, consider using GFS analysis from DSS. If you meet the requirements, you can request a data access account. On yellowstone you could, for instance, use GFS final analysis files for initial and future boundary conditions (ideal boundary condition assumption), with 6-hourly grib2 files that you could copy in path patterns like:

Let's assume you will use 6-hourly cycling for your initial test. Then real.exe needs to be run to define a WRF initial state 6 hours prior to your first planned assimilation time, plus boundary files for integration from this -6 h time to the first analysis time and on through the cycling window. This script might be helpful in generating an initial set of wrfinput and wrfbdy files from the set of met_em files generated above (combine it with the parameter file $DART/models/wrf/shell_scripts/init_param.csh), or you could make your own script(s). The provided script builds files for an option we call 'on-the-fly' perturbations, which only requires you to first generate a wrfinput file for the initial state, a wrfinput file for the target state (the time of the next analysis), and a lateral boundary condition file valid during this window. One set of wrfinput and wrfbdy files must be made for each cycle.

When you look through the details of the script, you will see that it starts from a given date and runs real.exe for both the start and end of each assimilation window. If your met_em files are meant to replicate a realtime system, you would need the linked met_em files to be replaced every cycle, since they are named by the valid date (which would overlap for each cycle forecast/analysis). Unique perturbations for each ensemble member model advance are created later, as needed, during the cycling. Note that to run WRFDA, you must provide an appropriate namelist. Here, this namelist is expected to be appended to your WRF namelist. You can start by adding this sample to your WRF namelist template file (not provided, since this is dependent on your WRF configuration and version).

To prepare for the first analysis, we would like a set of perturbed initial states with flow-dependent differences. Thus, we will start with random perturbations for an ensemble initial state one cycle before the first analysis, and make a short ensemble forecast so initial differences develop flow dependence. Copy the $DART/models/wrf/shell_scripts/init_ensemble_var.csh script (and support scripts) to your work/scripts directory to prepare the initial ensemble state (see notes inside shell script). The end result, if everything ran correctly, would be a set of initial ensemble states in DART format restart files in the work/output directory. I strongly encourage you to test this as a single member ensemble while debugging. Note that to get started, you will need a template input.nml namelist for the DART executables. One change to make from the default entries is in the restart_file_tool_nml for entry input_file_name, which should be changed from "filter_restart" to "restart_file_input". Similarly, change the output file name to "restart_file_output".
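The two restart_file_tool_nml edits mentioned above can be made by hand, or with a small helper along these lines. This is a sketch in POSIX sh and awk, not part of the DART distribution: the function name set_restart_tool_names is made up for illustration, and it assumes each namelist entry sits on its own line, which is how input.nml files are usually laid out. Entries with the same names in other namelists are left alone.

```shell
#!/bin/sh
# set_restart_tool_names FILE  (hypothetical helper, not a DART tool)
# Inside &restart_file_tool_nml only, set input_file_name and
# output_file_name to the values given in the text.
set_restart_tool_names() {
    awk '
        # A line starting with "&" opens a namelist; remember whether it is ours.
        /^[ \t]*&/        { in_nml = ($1 == "&restart_file_tool_nml") }
        # A line holding only "/" closes the current namelist.
        /^[ \t]*\/[ \t]*$/ { in_nml = 0 }
        in_nml && /^[ \t]*input_file_name[ \t]*=/  { sub(/=.*/, "= \"restart_file_input\",") }
        in_nml && /^[ \t]*output_file_name[ \t]*=/ { sub(/=.*/, "= \"restart_file_output\",") }
        { print }
    ' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example (path assumes the Step 1 layout):
#   set_restart_tool_names work/template/input.nml.template
```

Check the result with a diff against the original file before using it; if your input.nml packs several entries onto one line, edit it by hand instead.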


Step 3: Prepare observations

DART has developed a number of tools for converting standard observation formats into observation sequence files, which is the format of observations used by the DART system. See the DART Lanai observations page for a comprehensive overview of the available observation converters. For now, let's work through using NCEP BUFR observations, which contain a wide array of observation types from many platforms within a single file. Follow the guidance on the prepbufr page to build the BUFR conversion programs, get observation files for the dates you plan to build an analysis for, and run the codes to generate an observation sequence file.

To control which observations are converted, there are namelist controls worth investigating a bit here. Within your input.nml, add the following namelist for the BUFR conversion:

&prep_bufr_nml
   obs_window    = 1.0
   obs_window_cw = 1.5
   otype_use     = 120.0, 130.0, 131.0, 132.0, 133.0, 180.0
                   181.0, 182.0, 220.0, 221.0, 230.0, 231.0
                   232.0, 233.0, 242.0, 243.0, 245.0, 246.0
                   252.0, 253.0, 255.0, 280.0, 281.0, 282.0
   qctype_use    = 0,1,2,3,15
   /

This defines an observation time window of +/- 1.0 hours, while cloud motion vectors will be used over a window of +/- 1.5 hours. The otype_use list selects sounding temps (120), aircraft temps (130,131), dropsonde temps (132), MDCRS aircraft temps (133), marine temp (180), land humidity (181), ship humidity (182), rawinsonde U,V (220), pibal U,V (221), aircraft U,V (230,231,232), cloudsat winds (242,243,245), GOES water vapor (246), sat winds (252,253,255), and ship obs (280, 281, 282). Only observations with the specified qc types are included. See the prepbufr page for more available namelist controls. Copy this input.nml into the DART/observations/NCEP/prep_bufr/work/ directory.

Within DART/observations/NCEP/prep_bufr/work/prepbufr.csh:

set daily    = no
set zeroZ    = no # to create 06,12,18,24 convention files
set convert  = no
set block    = no
set year     = 2008
set month    = 5 # no leading zero
set beginday = 22
set endday   = 24
set BUFR_dir = ../data

Run the shell script to generate the intermediate format text files. Next, edit your input.nml to add the namelist below, copy it into the DART/observations/NCEP/ascii_to_obs/work/ directory, and run quickbuild there.

&ncepobs_nml
   year       = 2008
   month      = 5
   day        = 22
   tot_days   = 31
   max_num    = 800000
   select_obs = 0
   ObsBase    = '../../path/to/temp_obs.'
   ADPUPA     = .true.
   AIRCFT     = .true.
   SATWND     = .true.
   obs_U      = .true.
   obs_V      = .true.
   obs_T      = .true.
   obs_PS     = .false.
   obs_QV     = .false.
   daily_file = .false.
   lon1       = 270.0
   lon2       = 330.0
   lat1       = 15.0
   lat2       = 60.0
   /

Look at the create_real_obs program help page to set/add the appropriate namelist options. Run create_real_obs and you'll get a set of observation sequence files, one for each six-hour window. For a cycled experiment, the typical approach is to put the single set of observations associated with each analysis step into a separate directory. For example, within the output directory (we will also put inputs there to start), we would create directories like 2012061500, 2012061506, 2012061512, etc. for 6-hourly cycling. Place each observation file in the directory that matches its contents (e.g. obs_seq2012061500) and rename it to simply obs_seq.out (e.g. work/output/2012061500/obs_seq.out).
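Sorting the observation sequence files into per-cycle directories is easy to script. The sketch below is one way to do it in POSIX sh; the function name stage_obs and the source directory in the usage comment are made up for illustration, and it assumes the files are named obs_seqYYYYMMDDHH as in the example above. Files are copied rather than moved so the originals remain available.

```shell
#!/bin/sh
# stage_obs SRC_DIR OUTPUT_DIR  (hypothetical helper, not a DART tool)
# Copy each SRC_DIR/obs_seqYYYYMMDDHH to OUTPUT_DIR/YYYYMMDDHH/obs_seq.out.
stage_obs() {
    src=$1
    out=$2
    for f in "$src"/obs_seq[0-9]*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        stamp=${f##*obs_seq}             # keep what follows the final "obs_seq"
        case $stamp in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) echo "skipping $f: name is not obs_seqYYYYMMDDHH" >&2
               continue ;;
        esac
        mkdir -p "$out/$stamp" && cp "$f" "$out/$stamp/obs_seq.out"
    done
}

# Example (paths assume the Step 1 layout):
#   stage_obs work/obsproc work/output
```

The ten-digit check is deliberately strict, so stray files that happen to start with obs_seq are reported and skipped rather than filed under a wrong date.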

It is also helpful to run the wrf_dart_obs_preprocess program, which strips away observations not in the model domain, can create superobservations from typically dense observations, increases observation errors near the lateral boundaries, checks for surface observations far from the model terrain height, etc. Collectively, these steps improve system performance and simplify interpreting the observation space diagnostics. There are a number of namelist options to consider, and you must provide a wrfinput file for the program to access the analysis domain information.


Step 4: Cycled analysis system

For larger analysis domains with many observations, run on modern supercomputers with batch queue job control systems, scripts provide a practical way to manage the completion of work tasks. A skeleton script set is provided here to help you get started. You will need to edit these scripts, perhaps extensively, to run them within your particular computing environment. If you will run on NCAR's Yellowstone environment, fewer edits may be needed, but you should familiarize yourself with running jobs on Yellowstone if necessary.

Start by unzipping and untarring the files in your scripts directory. You'll find a parameter shell script like those used above (ultimately you'll want to consolidate all of these), a driver script (control.csh), and template scripts for advancing the ensemble members and running filter (the control script uses these templates to generate the actual job scripts). Edit the parameter shell script as needed to set up all of the appropriate paths, cycling frequency, domains, ensemble size, job system definitions, etc. Edit the control script to set the path to the parameter file and the end of the cycling period (make this the same as the first analysis time during testing). I would advise commenting out all the places where the script submits jobs while debugging. The other scripts should not need to change. The control script expects a date as a command line argument (YYYYMMDDHH), so you would for instance run it as:
csh control.csh 2013052200 >& run.out &
The script checks that the input files are present (wrfinput files, wrfbdy, observation sequence, and DART restart files), creates a job script to run filter, checks that the expected output from filter was created, and then generates job scripts for all of the model advances. After these complete, it checks whether this was the last analysis to determine if a new cycle is needed.
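When setting the end of the cycling period, it can help to list the analysis times the control script would step through, so you can confirm that a dated directory with inputs exists for each one. The sketch below does this in POSIX sh. It is an illustration only: the function name list_cycles is made up, and it uses GNU date for the date arithmetic (an assumption about your system) so that it runs without a DART build.

```shell
#!/bin/sh
# list_cycles START END STEP_HOURS  (hypothetical helper, not a DART tool)
# Print every analysis time from START to END inclusive, both YYYYMMDDHH.
# Assumes GNU date, which accepts -d with a date string or an @epoch value.
list_cycles() {
    cur=$1
    end=$2
    step_hours=$3
    while [ "$cur" -le "$end" ]; do
        echo "$cur"
        # Rewrite YYYYMMDDHH as "YYYY-MM-DD HH:00" so date can parse it.
        iso=$(echo "$cur" | sed 's/^\(....\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:00/')
        secs=$(date -u -d "$iso" +%s) || return 1
        cur=$(date -u -d "@$((secs + step_hours * 3600))" +%Y%m%d%H) || return 1
    done
}

# Prints 2013052200, 2013052206, 2013052212, 2013052218, 2013052300,
# one per line.
list_cycles 2013052200 2013052300 6
```

Going through epoch seconds keeps the hour offset from being misread as a time zone, and working in UTC throughout avoids daylight saving jumps.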


Step 5: Check your results

Once you have run the analysis system, it is time to check whether things ran well or whether there are problems that need to be addressed. DART provides analysis system diagnostics in both state and observation space. First, check to see if the analysis system actually changed the state. Use the command ncdiff Posterior_Diag.nc Prior_Diag.nc increment.nc. Use a tool such as ncview to look at the resulting file. You should see spatial patterns that look something like the meteorology of the day. These are places where the background (short ensemble forecast) was adjusted based on the set of observations provided. You can also use the provided obs_diag program to investigate the observation space analysis statistics. Additional statistics can be evaluated after converting the final observation sequence file into a netCDF format file using the obs_seq_to_netcdf tool. Be sure to check that a high percentage (> 90%) of available observations were assimilated. Low assimilation rates typically point to a problem with the background analysis, observation quality, and/or observation error specification, which are important to address before moving forward.