Ralph, this is very much a work in progress and tailored specifically to the CoRA environment ...
The task is logically divided into four parts.
There is some overlap, however. The choice of geographic domain has a direct effect on the dimension of priors for the number of wavelet decomposition levels, the starting point for the equatorial normal modes, etc. ...
This image illustrates several points. The top figure (A) is typical of the result of the process: a zonal (u) wind field at half-degree resolution. Panel (B) is the divergence of the total wind field. Panel (C) shows the spectra along each line of latitude (there are 96 lines of latitude) for the final product; the amplitude of each wavenumber is plotted as a single point, and the mean of all 96 estimates is plotted as a line. Panel (D) is the same as Panel (C) except the input data is the best available NWP product (which is used in the data assimilation). The slanted red lines in (C) and (D) are identical and represent a slope of k^(-5/3), which is consistent with theory at these latitudes. Note how the NWP product does not have realistic variance at the high wavenumbers.
The gross structure of the scripts comes from the way I had to do things on the IBMs. At one point, you were supposed to RUN in the General Parallel FileSystem (GPFS), but you WERE NOT supposed to COMPILE anything there. So there is a crummy "cd; copy code; compile" // "cd; run" structure. In order to support "restart"ing, you had to make a "cookie" file and query it at the shell level to see if the script needed to resubmit itself ... all very inelegant.
I create all of the input namelist files as "here" documents in the script and the output filenames are all unique.
Should you run on the IBMs, it is to your advantage to make sure the wall_clock_limit is appropriate. I burned through MANY GAUs when I had an incompatible library and tried to use MPI. Even my little test cases ran for the entire 6 hours ... on multiple processors.
Do "everything" in one go. Can only run for up to 6 hours on NCAR's IBM's. No restart capability.
Multiple-step philosophy. Restartable. GibbsInit.csh creates geometry and priors in first step, makes a "state" file. GibbsBlend[U,V].csh reads that state file and iterates until it is finished or time runs out. If time runs out, it creates another state file and resubmits itself. Repeat as needed.
Here's the tricky part.
The NWP data is the NCEP GDAS FNL data -- 1-degree resolution. The files are the result of [/fs/cgd/home0/thoar/Winds/]GetNCEPwinds.csh, which reads the NCEP grib files and uses "wgrib" to slice, dice, and reformat the data into IEEE (big-endian) files with no header. Each file has one day's worth of data, i.e., 00Z, 06Z, 12Z, and 18Z. The filenames are "gdas.fnlYYYY.MM.DD.UV.F00.ieee", where YYYY, MM, and DD are the year, month, and day, respectively. These files are read by SUB_GetNWP.F90, which has some CPP directives to try to locate these files on different filesystems.
The NWP data are archived at NCAR through an automatic widget which sometimes fails if the network is down, for example. Rigorously checking to make sure all timesteps are present is a worthwhile exercise.
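Checking for missing days can be automated. Here is a minimal Matlab sketch (untested); the directory and date range are hypothetical examples, but the filename pattern is the one described above.
ddir = '/some/path/to/NWP';                       % hypothetical location of the IEEE files
for dnum = datenum(2001,1,1):datenum(2001,1,31)   % example date range
   [yy, mm, dd] = datevec(dnum);
   fname = sprintf('%s/gdas.fnl%04d.%02d.%02d.UV.F00.ieee', ddir, yy, mm, dd);
   if ~exist(fname, 'file')
      fprintf('MISSING %s\n', fname);
   end
end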
For no particularly compelling reason, we are using the RSS Ku2000-series bmaps. The files are the result of [goldhill:/home/thoar/QuikSCAT/]GetQuikSCATwinds.csh. This fundamentally reads the RSS datafiles with the RSS-supplied read routine, extracts a single wind from each WVC for just the tropics, and writes out another (big-endian) IEEE file (in the format established long ago by Jan, Ralph, and myself). The filenames are "Ku2000equat_xxxxxx.ieee", where xxxxxx is the (zero-filled) orbit number. The IEEE files are then read by SUB_GetKU2000.F90. That routine also has CPP directives to facilitate finding the files.
In order to determine what is ocean and what is not (for an arbitrary domain), an elevation dataset is needed. This should work unless you are trying to predict winds in Death Valley, Houston, New Orleans, or the Netherlands :) The successor to the ETOPO5 dataset is available on the MSS -- DSS759.2. The function LandMask reads the local version of the dataset (called TerrainBaseData.ascii) and checks the immediate vicinity of the prediction gridpoint for elevations above zero. If the percentage of the vicinity is above some threshold, I call it "land". Your mileage will vary.
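This is not the actual LandMask code, just a minimal Matlab sketch (untested) of the idea; the argument names and the fixed threshold are illustrative.
function island = land_sketch(elev, elats, elons, glat, glon, radius, thresh)
% elev(nlats,nlons) : elevations (m); elats/elons are its coordinate vectors
% glat, glon        : the prediction gridpoint
% radius            : half-width (degrees) of the "immediate vicinity"
% thresh            : fraction of the vicinity above sea level that means "land"
jj     = abs(elats - glat) <= radius;
ii     = abs(elons - glon) <= radius;
patch  = elev(jj, ii);
island = mean(patch(:) > 0.0) > thresh;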
At some point soon, the directory will be made an input variable for both the NWP and QuikSCAT data types. This should simplify the entire matter.
Both SUB_GetNWP.F90 and SUB_GetKU2000.F90 have a couple of lines that tell them the range of data available. I just have not made them sophisticated enough to query the available files and build a set of dates available for each data type, something necessary when you try to open a set of files ... so I define a set of files to search. The Achilles' heel of this is that you need to make sure the subroutines are in step with the data available.
The Fortran source files are under CVS, in a project called Winds. There are many(?) fortran files in this project. Most of the helper routines live in modules, but some of the larger ones are standalone and have interface blocks in the modules to facilitate argument-checking.
All the files that end in .F90 or .F contain platform-specific code and are run through cpp. Everything with lower-case extensions is platform-independent.
MY cheat-sheet of CVS commands
There is a semi-intelligent makefile that will create an executable
for each of several different platforms, making use of whatever library
is available on that platform. Great(?) pains were taken to use
vendor-specific libraries as these are much more highly optimized and usually
exist in thread-safe form. Supported platforms include CRAY*, IBM, SGI, and linux.
(*I have not used it on a CRAY in forever.) The old Sun compiler had a problem
with one of my routines, but the new compiler does not. However, I have a problem
trying to use the LINPACK routines in the Sun high-performance library
... *sigh* ... Since the SGI library has both LAPACK and LINPACK routines,
I know I am calling them correctly; it must be an F90 compile option issue.
All of the above routines share multiple subroutines and functions but are still quite lengthy. To make it easier to propagate changes across the codes, I have tried to structure them so that "diff" is useful. Since "diff" can be quite picky, I would prefer that we not make any more cosmetic changes to the code, unless you want to make the same cosmetic change to all of them.
The library complib.sgimath contains an extensive collection of industry
standard libraries such as Basic Linear Algebra Subprograms (BLAS), the
Extended BLAS (Level 2 and Level 3), EISPACK, LINPACK, and LAPACK.
Internally developed libraries for calculating Fast Fourier Transforms
(FFT's) and Convolutions are also included, as well as select direct
sparse matrix solvers. Documentation is available per routine via
individual man pages. General man pages for the BLAS (man blas), FFT routines (man fft), convolution routines (man conv), and LAPACK (man lapack) are also available.
The Programs
setenv CVS_RSH ssh
cvs -d :ext:goldhill.cgd.ucar.edu:/home/thoar/CVS.REPOS co Winds
Blender.F90
Used by BlendU.csh and BlendV.csh.
The Granddaddy. Reads the input (namelists) and proceeds to go from soup to
nuts. No restart capability.
On blackforest and babyblue you can only run it for
6 hours, since this is the CPU limit for any one process.
On blackforest and babyblue there is a call to the system library
at the beginning of every Gibbs Iterate to make sure there is enough time
to eke out another iterate. If not, it wraps things up and terminates gracefully.
Wouldn't take too much to make it write out the necessary set of restart files...
BlendInit.F90
Used by GibbsInit.csh. Reads the input (namelists) and
creates a set of restart files.
Stops before doing any Gibbs Iterates. Runs in about 10 minutes usually.
Longest portion is determining the incidence matrices.
BlendWind.F90
Used by GibbsBlend[U,V].csh.
Has restart capability.
Reads the restart files and "continues" the Gibbs Sampling.
On blackforest and babyblue there is a call to the system library
at the beginning of every Gibbs Iterate to make sure there is enough time
to eke out another iterate. If not, it wraps things up and terminates gracefully.
Blackforest, Babyblue ... IBM
The most mature ... ESSL. The General Parallel Filesystem (GPFS) has a scrubber. RCP does not work "as you would expect" -- you must fire off YET ANOTHER script to do the RCP, which creates a problem for unique filenames ... Optimization higher than -O2 causes bad things to happen. Totalview ...
Dataproc ... SGI
The Silicon Graphics Scientific Mathematical Library, complib.sgimath,
is a comprehensive collection of high-performance math libraries providing
technical support for mathematical and numerical techniques used in
scientific and technical computing. This library is provided by SGI for
the convenience of the users. Support is limited to bug fixes at SGI's
discretion.
Longs ... the linux box
It has the Lahey F95 compiler (lf95) with LAPACK. It also has the Portland Group F90 compiler (pgf90) (with ?). It is running:
Linux longs 2.2.17-14 #2 SMP Tue May 22 10:53:51 MDT 2001 i686 unknown
I do not know what is entailed in making a relocatable executable. There is a run-time switch(?) to read/write big-endian files. How this affects relocatable code ...
Nightingale ... Solaris 2.6
I have a problem trying to use the LINPACK routines in the Sun high-performance library. Since the SGI library has both LAPACK and LINPACK routines, I know I am calling them correctly; it must be an F90 compile option issue. (-q64?)
TaskXX.namelist
&geometry
grdlats = -23.75, 0.5, 23.75
grdlons = 52.25, 0.5, 179.75
pt1 = 2001, 1, 5, 0
ptN = 2001, 1, 8, 18
pdt = 6
dstruleflag = 4
navg = 35
radius = 165.0
minamb = 0
ku2000mask = 01100
/
keyword | explanation |
---|---|
grdlats | latitudes of interest |
grdlons | longitudes of interest |
pt1 | initial prediction time |
ptN | final prediction time |
pdt | prediction timestep |
dstruleflag | |
navg | |
radius | |
minamb | |
ku2000mask | |
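If the grdlats/grdlons triplets are read as (start, increment, end) -- an assumption, but one consistent with the 96 lines of latitude mentioned in the figure caption -- the &geometry block above implies the following prediction grid (Matlab notation):
lats = -23.75 : 0.5 : 23.75;    %  96 latitudes
lons =  52.25 : 0.5 : 179.75;   % 256 longitudes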
&input cg_tol = 0.0005 cg_max = 300 cg_strt = 3 cut_min = 8 nmax = 2 kmax = 4 he = 25.0 kslope = 1.666667 wind = "u" Nburn = 5 Niter = 10 BatchSize = 5 Niters2save = 5 IterSpacing = "linear" Ngridlocs = 150 iseed = 10 / ! cg_tol = 0.0005 ! conj gradient tolerance ! cg_max = 100 ! .. max # of iterations ! cg_strt = 3 ! .. start value ! cut_min = 8 ! smallest multiresolution size ! nmax = 2 ! # of N-S modes ! kmax = 4 ! # of zonal modes ! he = 25.0 ! kslope = 5.0/3.0 ... the slash is a killer ! wind = "u" !---------------------------------------------------------------------------- ! Restart Parameters !---------------------------------------------------------------------------- ! integer Nburn = 500 ! # to gibbs iterations to compute and throw away ! integer Niter = 100 ! # of gibbs iterations to compute past burn-in ! integer BatchSize = 50 ! # in batch ! integer Ngridlocs = 100 ! # of locs that get saved EVERY gibbs iter ! integer Niters2save = 50 ! some fields gets saved every so often -- ! ! the spacing is determined by IterSpacing... ! IterSpacing = "exponential" ! exponentially favor later iterates to save ! IterSpacing = "offset" ! uniform, but only past burn-in ! IterSpacing = "other" ! uniform over ALL iterates, pre-and post-burn
&priors mu_muL_v = -0.36, 0.02 Sig_muL_v = 0.00001, 0.0, 0.0, 0.00005 mu_muL_u = -2.8, 0.4 Sig_muL_u = 0.00001, 0.0, 0.0, 0.00005 mua = 0.0 s2a0 = 100.0 mub = 0.0 me_mean_aB = 2.0 me_var_aB = 0.15 me_mean_aI = 1.5 me_var_aI = 0.15 me_mean_s = 2.0 me_var_s = 0.1 mu_hb = 0.4 s2_hb = 0.01 SHpr = 100.0 svarvec = 2.0, 0.01, 0.01, 0.01, 1.0 bvar = 3.0 w_list_u = -0.217, -0.180, -0.153, -0.130, -0.033, -0.053, -0.066, -0.075, -0.579, -0.590, -0.622, -0.663 w_kel = 0.095, 0.192, 0.281, 0.381 w_list_v = -0.217, -0.180, -0.153, -0.130, -0.033, -0.053, -0.066, -0.075, -0.579, -0.590, -0.622, -0.663, -0.750, -0.750, -0.750, -0.750 Seta_avgPrior = 70 avar_list = 2295, 2400, 2645, 2222, 14162, 8297, 4877, 2553, 347, 343, 344, 300, 153, 153, 153, 153 / # &priors # mu_muL_v = -0.36, 0.02 # Sig_muL_v = 0.00001, 0.0, 0.0, 0.00005 # mu_muL_u = -2.8, 0.4 # Sig_muL_u = 0.00001, 0.0, 0.0, 0.00005 # mua = 0.0 # s2a0 = 100.0 # mub = 0.0 # me_mean_aB = 3.4 # me_var_aB = 0.3 # me_mean_aI = 1.7 # me_var_aI = 0.3 # me_mean_s = 2.0 # me_var_s = 0.1 # mu_hb = 0.4 # s2_hb = 0.01 # SHpr = 100.0 # svarvec = 2.0, 0.01, 0.01, 0.01, 1.0 # bvar = 3.0 # w_list_u = -0.217,-0.180,-0.153,-0.130,-0.033,-0.053,-0.066,-0.075,-0.579,-0.590,-0.622,-0.663 # w_kel = 0.095, 0.192, 0.281, 0.381 # w_list_v =-.217,-.18,-.153,-.13,-.033,-.053,-.066,-.075,-.579,-.59,-.622,-.663,-.75,-.75,-.75,-.75 # Seta_avgPrior = 70 # avar_list = 2295,2400,2645,2222,14162,8297,4877,2553,347,343,344,300,153,153,153,153 # /
fname = '/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_alliters.ieee'; niters = 150; alliters = GetAllIters(fname, niters);
The blend routine writes out some information for a select set of gridpoints at every Gibbs iterate, including the burn-in period.   niters   is needed only if you do not want to return ALL the iterations. GetAllIters returns a structure with named components:
IDL/Matlab     | Fortran       | shape | Interpretation |
---|---|---|---|
niters | iterN | scalar | total # of gibbs iterates, including burn-in |
nA | nA | scalar | # of large-scale modes |
nT | nT | scalar | # of prediction epochs (timesteps) |
nP | nP | scalar | # of spatial means |
Pdomain | Pdomain | [3,3] | Prediction Domain extents |
Ngridlocs | Ngridlocs | scalar | # of selected gridpoints |
gridlocs | sgridlocs | [Ngridlocs] | state-space ID of selected gridpoints |
lats | grdxy(*,1) | [Ngridlocs] | latitudes of selected gridpoints |
lons | grdxy(*,2) | [Ngridlocs] | longitudes of selected gridpoints |
iters | iters | [niters] | the Gibbs Iterations in the file |
mu | mu | [Ngridlocs,niters] | spatial mean |
u1 | u1 | [Ngridlocs,nT,niters]   | wind = mu + u1 + u2 |
u2 | u2 | [Ngridlocs,nT,niters] | wind = mu + u1 + u2 |
CG_stat | CG_stat | [nT,niters] | # of conjugate gradient trips |
FileName |   |   | drag around filename for posterity |
Other possibilities | |||
a | a | [nA,nT+1, niters] | large-scale mode coefficients |
Hv_a | Hv_a | [nA, nA, niters] | propagator matrices for large-scale modes |
s2eta_a | s2eta_a | [nA, nA, niters] |   |
mu_L | mu_L | [nP, niters] |   |
s2epaB | s2epaB | [niters] |   |
s2epaI | s2epaI | [niters] |   |
s2eps | s2eps | [niters] |   |
Hv_b | Hv_b | [Ngridlocs,niters] | propagator matrices for wavelet coefficients |
s2eta_b | s2eta_b | [Ngridlocs,niters] |   |
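A minimal Matlab sketch (untested, and leaning on the shapes in the table above) of a trace plot at one of the saved gridpoints -- handy for eyeballing burn-in:
bob  = GetAllIters('/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_alliters.ieee', 150);
g    = 1;                                          % pick a saved gridpoint
t    = 1;                                          % pick a prediction epoch
wind = squeeze(bob.mu(g,:))' + squeeze(bob.u1(g,t,:)) + squeeze(bob.u2(g,t,:));
plot(bob.iters, wind, '-')                         % wind = mu + u1 + u2
xlabel('Gibbs iterate'); ylabel('wind component')
title(sprintf('%.2f N, %.2f E', bob.lats(g), bob.lons(g)))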
fname = '/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_batch_002.ieee'; datstr = GetBatch(fname);
The blend routine creates batch means and variances and writes them to a file every   BatchSize   Gibbs iterates, including the burn-in period. Consequently, some batches may have results from both the pre-and post- burn-in period. GetBatch returns a structure with named components:
IDL/Matlab     | Fortran       | shape | Interpretation |
---|---|---|---|
nlats | GNY | scalar | # of prediction grid latitudes |
nlons | GNX | scalar | # of prediction grid longitudes |
NG | NG | scalar | # of locations in prediction grid (=nlats*nlons) |
nA | nA | scalar | # of large-scale modes |
nT | nT | scalar | # of prediction epochs (timesteps) |
nP | nP | scalar | # of spatial means |
iter | iter | scalar | the Gibbs Iteration when the batch was written |
BatchSize | BatchSize | scalar | # of Gibbs Iterates in each batch |
lats | pred_y | nlats | the latitudes of the prediction grid |
lons | pred_x | nlons | the longitudes of the prediction grid |
t | pred_t | nT | the times of the prediction epochs see Time NOTE |
Pdomain | Pdomain | [3,3] | Prediction Domain extents |
uMean | uMean | [NG,nT]   | mean windfield over last   BatchSize   iterates |
ustd | ustd | [NG,nT]   | standard deviation of uMean |
aMean | aMean | [nA,nT+1] | mean large-scale mode amplitudes |
astd | astd | [nA,nT+1] | standard deviation of aMean |
wvMean | wvMean | [NG,nT+1] | mean wavelet component |
wvstd | wvstd | [NG,nT+1]       | standard deviation of wvMean |
FileName |   |   | drag around filename for posterity |
Time NOTE: The times have been converted to a "convenient" monotonic base compatible with the intrinsic functions for the respective packages. In IDL, you can convert to a calendar date with   date_string(bob.t(timestep)). In Matlab, the same thing can be achieved with   datestr(bob.t(timestep),0)
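A minimal Matlab sketch (untested) of mapping one epoch of the batch-mean wind. The reshape assumes the state-space ordering runs through longitudes fastest -- verify that before trusting the picture.
bob  = GetBatch('/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_batch_002.ieee');
t    = 1;                                               % pick a prediction epoch
umap = reshape(bob.uMean(:,t), bob.nlons, bob.nlats)';  % -> [nlats,nlons], assumed ordering
imagesc(bob.lons, bob.lats, umap); axis xy; colorbar
title(['batch-mean wind, ' datestr(bob.t(t),0)])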
GetPoster
fname = '/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_poster.ieee'; niters = 150; datstr = GetPoster(fname,niters);
The blend routine writes out the wind field for EVERY iterate after the burn-in in the "poster" file.   niters   specifies (at most) how many iterates will be returned. GetPoster returns a structure with named components:
IDL/Matlab     | Fortran       | shape | Interpretation |
---|---|---|---|
nlats | GNY | scalar | # of prediction grid latitudes |
nlons | GNX | scalar | # of prediction grid longitudes |
nT | nT | scalar | # of prediction epochs (timesteps) |
lats | pred_y | nlats | the latitudes of the prediction grid |
lons | pred_x | nlons | the longitudes of the prediction grid |
t | pred_t | nT | the times of the prediction epochs |
iters | iters | [niters] | the Gibbs Iterate ID's |
wind | u | [nlats*nlons,nT,niters] | the wind field in state-space form |
FileName |   |   | drag around filename for posterity |
GetSpatial
fname = '/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_spatial.ieee';
niters = 20;
datstr = GetSpatial(fname,niters);
The blend routine writes out entire spatial fields for many variables at select Gibbs iterates. At most,   niters   iterates will be returned. GetSpatial returns a structure with named components:
IDL/Matlab     | Fortran       | shape | Interpretation |
---|---|---|---|
nlats | GNY | scalar | # of prediction grid latitudes |
nlons | GNX | scalar | # of prediction grid longitudes |
nT | nT | scalar | # of prediction epochs (timesteps) |
niters | Niters2save | scalar | # of iterations saved |
lats | pred_y | [nlats] | the latitudes of the prediction grid |
lons | pred_x | [nlons] | the longitudes of the prediction grid |
t | pred_t | nT | the times of the prediction epochs see Time NOTE |
iters | iters | [niters] | the Gibbs iterate of these results |
s2eta_b | s2eta_b | [nlats*nlons,niters] |   |
wind | u | [nlats*nlons,nT,niters] | wind component in state-space form |
FileName |   |   | drag around filename for posterity |
Other possibilities | |||
mu | mu | [NG,niters] | spatial mean |
u1 | u1 | [NG,nT,niters] | either the large-scale wind component or |
u2 | u2 | [NG,nT,niters] | the small-scale wind component, I forget. |
a | a | [nA,nT+1,niters] | large-scale mode coefficients |
b | b | [NG,nT+1,niters] | wavelet representation of small-scale features |
Hv_a | Hv_a | [nA, nA, niters] | propagator matrices for large-scale modes |
Hv_b | Hv_b | [Ngridlocs,niters] | propagator matrices for wavelet coefficients |
GetState
fbase = '/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_state_00001';
datstr = GetState(fbase)
This function actually reads the entire state but only returns a very few variables and is really only meant to demonstrate what could be done. GetState returns a structure with named components:
IDL/Matlab     | Fortran       | shape | Interpretation |
---|---|---|---|
iter | iter | scalar | the Gibbs Iterate |
nlats | GNY | scalar | # of prediction grid latitudes |
nlons | GNX | scalar | # of prediction grid longitudes |
NG | NG | scalar | # of locations in prediction grid (=nlats*nlons) |
nT | nT | scalar | # of prediction epochs (timesteps) |
nA | nA | scalar | # of large-scale modes |
lats | pred_y | [nlats] | the latitudes of the prediction grid |
lons | pred_x | [nlons] | the longitudes of the prediction grid |
t | pred_t | nT | the times of the prediction epochs see Time NOTE |
a | a | [nA,nT+1] | large-scale mode amplitudes |
b | b | [NG,nT+1] | wavelet representation of small-scale features |
wind | u | [NG,nT,niters] | wind component in state-space form |
FileName |   |   | drag around filename for posterity |
netCDF generation
List of things "to do" ...
convert to return proper spatial fields instead of state-space form?
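A minimal Matlab sketch (untested) of the state-space -> spatial conversion mentioned in the item above; it assumes longitude varies fastest in the state vector.
bob    = GetSpatial('/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_spatial.ieee', 20);
wind4d = permute(reshape(bob.wind, bob.nlons, bob.nlats, bob.nT, bob.niters), [2 1 3 4]);
contourf(bob.lons, bob.lats, wind4d(:,:,1,end)); colorbar   % first epoch, last saved iterate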
Scientific Objectives ...
PROPOSED SCHEDULE: First Science Application of Tropical u,v BHM
0. Change time period/MJO event [TJH, RFM]
want more continuous convection/propagation
1. Burn-in Validation [TJH, DN]
(150 iterations = 6 hrs on IBM)
a) examine samples of posterior
(4 per day * 52 days * 50 iterates)
b) tune conjugate gradient convergence parameter
c) find AR(1) parameters for posterior distribution
(at representative points, as a function of iterate #; see the sketch after this list)
2. Surface wind Convergence Maps [TJH, RFM]
a) does it look right?
(simple animations)
b) compare with TAO-buoy (hourly) wind records
(http://www.pmel.noaa.gov/tao/index.shtml)
c) goto 1a) above until satisfied
3. ASIDE: drive 1D ocean model with realizations from posterior dist
[DN, RFM, TJH]
a) Large et al upper ocean model, equatorial Pacific initialization
b) Doug wants this by Nov 2001; diplomacy issue with Chris
4. Cloud Data comparison to infer MJO convection [RFM, TJH]
a) identify/obtain best dataset (OLR?, TRMM?)
5. SST comparison for MJO propagation [RFM, TJH]
a) SST from TMI on TRMM (see Chelton et al; contact Dudley)
6. Co- and Cross-Spectral analyses sampling from the posterior distribution
(revisit comparison with TAO for SST as well)
[TJH, RFM, RAM]
a) surface wind convergence vs. convection
b) surface wind convergence vs. SST
c) convection vs. SST
7. Technical Report preparation [all]
8. Journal paper preparation [all]
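For item 1c, a minimal Matlab sketch (untested) of the lag-1 (AR(1)) autocorrelation of the chain at each gridpoint saved by GetAllIters; treating everything past Nburn (5 in the example namelist above) as post-burn-in is an assumption.
bob  = GetAllIters('/data/ocean13/pub/thoar/U0/Gibbs_u_20010101_alliters.ieee', 150);
t    = 1;                                        % pick a prediction epoch
keep = bob.iters > 5;                            % assumed burn-in cutoff
rho1 = zeros(bob.Ngridlocs, 1);
for g = 1:bob.Ngridlocs
   chain   = squeeze(bob.mu(g,keep))' + squeeze(bob.u1(g,t,keep)) + squeeze(bob.u2(g,t,keep));
   chain   = chain - mean(chain);
   rho1(g) = sum(chain(1:end-1) .* chain(2:end)) / sum(chain.^2);   % lag-1 autocorrelation
end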
Document: /staff/thoar/WindInstructions.shtml
Last modified: Thursday, 15-Feb-2018 11:08:35 MST
Tim Hoar - thoar@ucar.edu