NCAR's Geophysical Statistics Project

Institute for Mathematics Applied to Geosciences Institute for Mathematics Applied to Geosciences

Table of Contents

  1. Using DI
  2. Basic Data Set

Using the DI Package

Introduction

The basic function of DI is to quantify the accuracy of spatial predictions to points where data is not collected, and we use ozone air quality measurements for the tutorial. Although this software addresses a general problem in spatial statistics it use is targeted at the evaluation of monitoring networks for air quality. For this reason through this manual we will refer to the spatial locations where data is collected as a "network".

What is DI?

DI is a library of functions (written in the S language) and datasets added to the S-PLUS statistical package. Some of these functions create the GUI boxes used to setup plots while others compute the necessary statistics. The functioning of DI is greatly simplifed because it is added on top of a spatial statistics library, Fields, which is also designed for S-PLUS. Although Fields does not provide a graphical interface, many of the statistical calculations done in DI use the basic Fields functions. Part of DI's usefulness is that an experienced S-PLUS user can do some spatial analysis using command-line functions and then present the final results using the DI GUI functions. At the S-PLUS command line one has all the functionality of the Fields package. This manual gives some examples of analyzing monitoring networks using DI alone but also gives some examples where the network information and spatial models are set up using Fields and S-PLUS functions in the commmand window and then DI is used to display the results.

How DI works

DI has a simple structure centered on two kinds of activities. It provides an easy way to 1) setup a spatial prediction problem and then to 2) analyze the prediction accuracy. The spatial problem requires knowledge of the spatial covariance function and the locations where measurements are made. Once these are specified, the predictions take the form of the usual best linear unbaised estimates (BLUE) or Kriging estimates.

Throughout this introduction we will refer to some S-PLUS datasets as objects. Objects are just lists in S-PLUS that have specific components and are given a name (class). Their benefit is that they bundle different kinds of data together into one unit. The two main objects used in DI are a covariance object and a network object.

Briefly, the covariance object is used to define a spatial covariance function for a given spatial prediction problem. At the minimum, the network object has the network locations and an associated covariance object. The functions in DI are used to create or modify a network object or are used to examine the accuracy when spatial predictions are made from observations on this network. The key idea is that all the information for Kriging for a network is contained in the associated network object.

Getting started

Before going through the DI buttons, here are useful commands for starting and exiting DI.

S-PLUS command       action
library(di) # loads the DI library
di.close() # exits DI
di.init() # restarts DI (requires that the DI library is still attached)

Note: When opening the DI library, a graph sheet will appear. This graph sheet will enable you to use other color schemes. If you close this graph sheet, you can obtain a new one with the graphsheet command or use the DI function refresh.graphsheet to obtain a new graphsheet with the default color scheme.

A Quick Example

DI enables us to plot and edit network configurations and allows us to overlay a U.S. map (with county boundaries if desired) along with an image and/or contour plot of the standard errors of prediction for the network. To plot such a network, simply click on the "Network Plot" button to bring up the "Network Plots" dialog box as seen below.

Here we use NC.ex.net, which is an example network object that is included with the DI package. For this example, we have selected the image plot for standard errors of prediction as well as the county boundaries. The result is shown below.

The above standard errors of prediction are obtained from kriging the network locations. Kriging is a technique for evaluating data that is spatially dependent. It is a regression fit where, in this case, we use the network locations to predict the value at the locations, but instead of assuming independence between observations (locations) we assume that they are spatially dependent. It is, therefore, necessary to estimate a covariance function for the spatial dependence from the data, and then fit a generalized linear regression model using this covariance function. We use the krig model to predict observations on a grid and then obtain the standard errors of these predictions. It is then these standard errors that are plotted (using the image.plot function included with DI) above.

To see which covariance function is being used with our NC.ex.net example, simply type NC.ex.net$cov.obj at the command prompt in the Commands window. It will display the name of the covariance function and the parameters used. To learn more about the DI covariance object click here. To see an example of how to choose a covariance function Click here.

The Toolbar Buttons

Create/Modify Covariance Object Create/Modify Network Object Plot Network Object Plot Probabilities of Exceedence Edit Network

Currently, there are 5 buttons in the toolbar. We summarize their functions below but the reader is referred to the following examples that give snapshots of the different popup windows produced by each button.

[1] Create/modify a covariance object
Gathers the information needed to describe the spatial covariance.

(Required) Name of new object (default is new.cov.obj).
(Optional) The name of the covariance function. Currently, DI has CM.cov, and the exponential covariance function. Others can be created in S by some simple programming.

Other options will depend on your choice of covariance function. Some such options are listed.
(Optional) The range parameter is the scaling parameter used in the covariance.
(Optional) Lon/Lat check box. If checked then the distance between locations will be computed assuming the lcoations are in longitude and latitude. The distances scale is in miles. So for example with this box checked the range parameter should be in units of miles.

[2] Create/modify a network object
This object has the basic components:
(Required) Name of new object (default is new.network)
(Required) The network locations ( design) a 2 column matrix
(Optional) The grid points used to evaluate spatial predictions
(Required) Name of the covariance object created by function [1]

[3] Network plot
This will produce a plot of the network along with an optional image and/or contour plot showing the standard error of prediction based on the network observations.

(Required) Network object name. Note pull down menu to choose among objects that have been created.
(Optional) Several check boxes that reflect options for the plot.

[4] Plot Probabilities
Creates an image and/or contour plot of probabilities of exceeding a particular value. This function fits a surface to the observed data and computes the probability of an exceedence based on standard formulas for the kriging error.

(Required) Network Object (network object chosen must include a measurements matrix).
(Required) Date (or row name of measurements matrix row to be kriged).

(Required) Exceedence (default is 85 ppb (ozone))
(Optional) Plot probabilities, means, standard deviations, krig diagnostics, or all of these.
(Optional) Several options for krig and whether to plot maps.

[5] Edit Network
        This function changes the station locations based on adding or deleting stations using positions of the mouse. In order for this function to work one must first plot the network using the network plotting function. Even though this function just changes the design all the other netowrk information ( covariance function, grid etc.) are copied to the new object.

(Required) New network name. After editing the name of the modified network.
(Required) Old network name
(Optional) Tolerance for deciding when a mouse click is conincident with a design point or a distinct new location.

Details:
Move the mouse cursor to the plotted network. Clicking on the left button where there is not a design point will add a new location. Clicking on the left button at any current location will delete it.

To stop the editing click on the right mouse button.



The Covariance Object

To create the covariance object, click on the first toolbar button in the DI toolbars or click on create covariance object under DI in the drop down menu. We will call our covariance object NC.cov.obj, which we select by typing this in under New Covariance Object Name. We will use the correlation model by choosing CM.cov under Type. Once we have chosen the type of covariance function we want to use we must click on Create Object to select the rest of the parameters.

After clicking on the Create Object button, a new dialog window should appear. This dialog window is specific to the type of covariance function selected--in our case, CM.cov. Regardless of the type of covariance function selected, we have the option of selecting mean and/or standard deviation objects. We will use NC.mean for Mean Object, NC.sd for Standard Deviation as created below. Since our data locations are given in longitude and latitude coordinates (using negative longitude) we must check Longitude and Latitude if it is not already checked. Finally, since we chose CM.cov we must give it some kind of regression fit. We will use the fit NC.cor.fit, which we found below. For fit we can use any S-PLUS function that gives the correlation between two locations as a function of their distance of separation.
Once all the necessary parameters have been selected, click OK. You will then be back to the first dialog box. If you wish to create another covariance object, then click on apply to save the object you just created, then type a new name into the New Covariance Object Name box and create your new covariance object as above. When you are finished creating covariance objects click OK on the "Create a new covariance object" dialog.



The Network Object

To create the network object click on the second toolbar button of the DI toolbar buttons or Make Network Object from the DI drop-down menu.

Return Value
We will call our new network object NC.net by typing this in the Network Object box.

Design
We will type NC.ex$loc in the locations list box and NC.ozmax8 in the Measurements box--these will not be part of the drop-down menus due to NC.ex being a list object instead of an n X 2 matrix and NC.ozmax8 having the ambiguous original property of being a data frame. Note also that we cannot type NC.ex$ozmax8 into the Measurements box even though it is there; it cannot be part of a list, only its own entity. This is why we created NC.ozmax8 below. It is also a good idea to make sure the row names of the Measurements matrix are as you want them to be; in this case as dates; otherwise it will use row numbers as a default for the row names. The Measurements should be a matrix whose rows and row names reflect, for example, a given day. So, each row will have the measurements for the network on a given day.

Covariance Object
We will select the covariance object that we just made above, NC.cov.obj--this should be listed as an option in the Covariance Object drop down menu. Note that we must create the covariance object before we create the network object.

Grid
The grid defines the range and resolution of the predicted surfaces. The Grid Points for Prediction box is used to select a previously defined grid, which will only be part of the grid drop down menu if it is assigned the class "grid" or we can type, in this case 50, in for the number of x grid points and another number, in this case 80, in for number of y grid points. Click OK and the new network object will be created. By leaving all of these boxes blank the ranges based on the network locations wil be used.


Plotting the Network

First click on the third toolbar button, or network plotter from the DI drop down menu. To plot the network, we have the option of plotting the standard errors of prediction as an image, with contours, or both. We can also choose to just plot the network locations without the standard errors of prediction. Our example is small enough that we can plot everything, but if you have many locations you might prefer a less busy plot; especially for using the edit network feature. We choose NC.net from the Network Object drop down menu and check all the options as on the right.

Our plot should look like the image shown at right. For this example, it should not take it very long to plot, but if you have a large dataset, you can speed things up by not plotting the county boundaries.


Plotting Probabilities of Exceedence

As an added feature of DI, we can create a plot of the probabilities that measurements on a grid exceed a certain value using Kriging. We can also plot the means, standard deviations and even the diagnostic plots created from Kriging.

Simply click on the fourth toolbar button to bring up the probability plot dialog.

Data
Here we will select our NC.net network object created above. For the date, we chose August 5, 1998 because it had some values in exceedence of 85 ppb for ozone content.

Plots
We can also select which plots we want to see; here select "all" to see all of them.

Advanced
The page under the Advanced tab provides some choices for the Fields function, Krig.

The option, "all", for the type of plot gives 5 different pages (you may have to re-size your plot as I have done in order to click on the different pages) to see different plots. For example, page 2 shows a larger plot of the means found from Kriging (as shown below).





Edit the Network

To edit the network, we must first plot the network using the third toolbar button.

Next, we click on the edit network toolbar button (fifth button). Take care to move the edit network dialog off of the plot or else it will cover parts of the graph. There are only a few choices with this dialog. The first box is the name of the new network object you wish to create. Note that edit network creates a new network object based entirely on the old one except with a new set of locations. We will call the new network object NC.net2 and we must select NC.net from the Network Object drop down menu. The final option does not need to be used. It may be changed if there is difficulty clicking on points to remove or add them.



After pressing OK, we can add points by left clicking on a part of the graph that does not have an existing point. To delete a point, simply left click on the existing point. To exit the edit network function simply right click anywhere on the graph. The appearance of the graph will change after pressing OK on the edit network dialog. It will look more like the one below. Note that the orange circle represents a point that I have added to the network and the white circle is a new point that has been added (at approx. (-78.9, 35.8)) and the white circle is a point that has been deleted from the network (at approx. (-78.7, 35.8)).



Once I have edited the network, I can plot the new network (along with the new standard errors of prediction if I choose) by clicking on the third toolbar button and selecting NC.net2 from the network object drop down menu.



Basic Data set



        To use DI, we must first create the covariance and network S-PLUS objects that we eluded to above. This will facilitate our use of the DI package. The covariance object includes: the name of an S-PLUS covariance function, optional mean and standard deviation objects and a list of arguments used by the particular covariance function selected. The network object includes information such as the locations for the network, the covariance object mentioned above, a grid and the name of a measurements matrix--where each row of the measurements matrix corresponds to a vector of data measured at each of the locations--used for plotting probabilities of exceedence.

The covariance object NC.cov.obj and the network object NC.net are included with DI as examples. Below we show how they were created. This particular example is substantive, based on daily ozone measurements and the details of the data analysis leading to the covariance model is detailed in the second section of this manual. However, in order to get started using DI, it is not necessary to appreciate the details of indentifying the covariance function.

8-hour average daily ozone for the Eastern U.S.

The  DI Data set ozmax8  consists of 5  "ozone seasons" ( May 1 - October 31 1995-1999) of  max  8 hour ozone readings at 513 mointoring stations in the Eastern US.  It is formatted as a  920X513 matrix  ( rows= time, columns=stations) missing values have the code NA.

ozmax8.info has components:
  stat.no  station id  numbers
   loc  513X2 matrix of longitude and latitude of the stations ( by convention longitudes are negative west of  the prime meridian)
    dates   920 numerical vector of the time in Julian  days where 1/1/1960 is  day 1, and gives the additional information for a spatial analysis.
 
 

Given below are some simple summaries of these data that use FIELDS functions

#map of all stations in data set ozmax8
US()
points(ozmax8.info$loc)
#

# image plot of station data for the first day
good<- !is.na( ozmax8[1,])
as.image( ozmax8[1,good], x= ozmax8.info$loc[good,], nrow=100, ncol=100)-> out.p
image.plot( out.p)
US( add=T)

# mean and sd's (this takes a while)
stats( ozmax8)-> stats.ozmax8
set.panel( 2,1)
as.image( stats.ozmax8[2,], x= ozmax8.info$loc, nrow=100, ncol=100)-> out.p
image.plot( out.p)
US( add=T)
title("Mean levels")

as.image( stats.ozmax8[3,], x= ozmax8.info$loc, nrow=100, ncol=100)-> out.p
image.plot( out.p)
US( add=T)
title( "Sd's for station data")

North Carolina Subset

To simplify working with these data we will consider a subset of stations centered around North  Carolina.  For completeness we include  the S commands  that create the example data set NC.ex  However, this data set has been included as part of  DI .  It is possible to work with the full netowork but beyond about  200 locations the response will be slow. 

#create NC subset and data set #

NCsubset<- ozmax8.info$lon > -82& ozmax8.info$lon< -77
& ozmax8.info$lat< 39 & ozmax8.info$lat> 33

NC.ex<- list( ozmax8= ozmax8[, subset], loc=ozmax8.info$loc[NCsubset,],
dates= ozmax8.info$dates)

# create a measurements matrix for later use with DI.
NC.ozmax8 <- NC.ex$ozmax8

row.names( NC.ozmax8) <- dates( as.numeric( NC.ex$dates))

# plot up the locations with county and state boundaries
par( pty="s")
plot(NC.ex$loc)
US( add=T, database="county")
US( add=T, col= 2, lwd=2)
 
 

Summary Statistics for  NC.ex

The next step is to identify a covariance model for this subset. To get started we consider a simple time stationary,  correlation model. It is assumed that the station measurements when standardized by their mean and standard deivation follow an isotropic covariance function.

# find sample correlations
COR( NC.ex$ozmax8)-> out

The components returned by COR consist of the means  (mean) and standard deviations (sd)  a matrix of the pairwise correlations ( cor) and a matrix of the number of nonmissing observations for station pairs (N).
A thin plate spline is used to fit a surface to the means and standard deviations.  These are needed to extend the covariance model to locations where ozone is not measured.  In this fitting the splines smooth the data based on a cross validation criterion but it may make sense to use other choices for the amount of smoothing.

# surface for means
Tps( NC.ex$loc, out$mean)-> NC.mean

# surface for standard deviations
Tps( NC.ex$loc, out$sd)-> NC.sd

# make plots of the fitted surfaces

set.panel(2,1)

surface( NC.mean, type="C")
US( add=T)
points( NC.mean$x)
title("means")

surface( NC.sd, type="C")
US( add=T)
points( NC.mean$x)
title( "sd's")
 

The next part is to indentify a model for the correlations and we  make a plot of  station correlations as a function of  the distance of separation.  The simplest model is fitting a function to the correlations as a function of separation distance.
and this is done below.

# pairwise distances
rdist.earth( NC.ex$loc)-> DD
# indicator to extract upper triangle of the matrix  and distances
ind<- col(DD)> row(DD)

# create a list with the information we need and plot the values

cgram.oz<- list( d=DD[ind], cor=out$cor[ind])
set.panel(2,1)
plot( cgram.oz$d, cgram.oz$cor, pch=".")
# conditional box plots are better!
bplot.xy(cgram.oz$d, cgram.oz$cor,N=15)

# fit an exponential correlation function using nonlinear least squares.

cgram.fit1<- nls( cor ~ alpha* exp( -d/theta), cgram.oz, start= list( alpha=.95, theta=400))

# try a mixture  of two exponentials

cgram.fit2<- nls( cor ~ alpha1* exp( -d/theta1) + alpha2* exp( -d/theta2), cgram.oz, start= list( alpha1=.5, alpha2=.5, theta1=100, theta2=400))

#take a look at the fits
set.panel()
bplot.xy(cgram.oz$d, cgram.oz$cor,N=15)
points( cgram.oz$d, cgram.oz$cor, pch=".", col=2)

seq( 0,400,,200)-> xt
yt<- predict( cgram.fit1, list( d=xt))
lines( xt,  yt, col=4)
yt<- predict( cgram.fit2, list( d=xt))
lines( xt,  yt, col=5)
 

Based on the fitted curves there is some underestimate of the correlation for close stations with a single exponential model.  It appears that a mixture of two fits better.  An alternative would be to use  a more flexible covariane such as the Matern class.
Note  that there may be some problems with underestimating  correlations beyond 200 miles.
However we use the mixture model and give it a more descriptive name:

NC.cor.fit<-cgram.fit2

Having identified a covariance model  we now need to identify what function to choose in DI and also what arguments to specify.  As part of DI is a covariance function that is suited for this model CM.cov ( standing for Correlation  Model).
CM.cov  will use the following  information when creating a covaraince object.  For defining the marginal standard deviation use NC.sd for the correlation function use NC.cor.fit and finally the lon.lat argument should be  checked (set to TRUE).



Use cover.design to thin a network

        As a companion to the DI/Fields software it is useful to have an algorithm that produces a set of points that are evenly distributed. A typical application is to choose a subset of observed locations to use as the knots for a reduced set of basis functions. This subset should be evenly distributed over the observed domain of data. The function cover.design is a function to accomplish this. The basic framework is to determine a subset of design points from a larger set of candidate points. The candidate points not only serve as possible design points but determine the coverage criterion. The reader is referred to the Fields manual for more information on this function and the swapping algorithm.

Here is an example of how cover.design works.

design200 <- cover.design.nw( R=ozmax8.info$loc)
plot( design200, xlab="Lon", ylab="Lat")
map( add=T)
summary( design200)

        The component design200$design is the "best" design and design200$best.id gives the row numbers of the candidate set matrix that comprise the "best" design. We will use this to add 50 points for a total of 250 points.

design250 <- cover.design( ozmax8.info$loc, fixed=design250$best.id)
plot( design250, xlab="Lon", ylab="Lat")
map( add=T)