![]() |
|
CISL | IMAGe | Statistics | Contact Us | Visit Us | People Search |
The basic function of DI is to quantify the accuracy of spatial predictions to points where data is not collected, and we use ozone air quality measurements for the tutorial. Although this software addresses a general problem in spatial statistics it use is targeted at the evaluation of monitoring networks for air quality. For this reason through this manual we will refer to the spatial locations where data is collected as a "network".
DI is a library of functions (written in the S language) and datasets added to the S-PLUS statistical package. Some of these functions create the GUI boxes used to setup plots while others compute the necessary statistics. The functioning of DI is greatly simplifed because it is added on top of a spatial statistics library, Fields, which is also designed for S-PLUS. Although Fields does not provide a graphical interface, many of the statistical calculations done in DI use the basic Fields functions. Part of DI's usefulness is that an experienced S-PLUS user can do some spatial analysis using command-line functions and then present the final results using the DI GUI functions. At the S-PLUS command line one has all the functionality of the Fields package. This manual gives some examples of analyzing monitoring networks using DI alone but also gives some examples where the network information and spatial models are set up using Fields and S-PLUS functions in the commmand window and then DI is used to display the results.
DI has a simple structure centered on two kinds of activities. It provides an easy way to 1) setup a spatial prediction problem and then to 2) analyze the prediction accuracy. The spatial problem requires knowledge of the spatial covariance function and the locations where measurements are made. Once these are specified, the predictions take the form of the usual best linear unbaised estimates (BLUE) or Kriging estimates.
Throughout this introduction we will refer to some S-PLUS datasets as objects. Objects are just lists in S-PLUS that have specific components and are given a name (class). Their benefit is that they bundle different kinds of data together into one unit. The two main objects used in DI are a covariance object and a network object.
Briefly, the covariance object is used to define a spatial covariance function for a given spatial prediction problem. At the minimum, the network object has the network locations and an associated covariance object. The functions in DI are used to create or modify a network object or are used to examine the accuracy when spatial predictions are made from observations on this network. The key idea is that all the information for Kriging for a network is contained in the associated network object.
Before going through the DI buttons, here are useful commands for starting and exiting DI.
S-PLUS command | action |
---|---|
library(di) |
# loads the DI library |
di.close() |
# exits DI |
di.init() |
# restarts DI (requires that the DI library is still attached) |
Note: When opening the DI library, a graph sheet will appear. This graph sheet will enable you to use other color schemes. If you close this graph sheet, you can obtain a new one with the graphsheet command or use the DI function refresh.graphsheet to obtain a new graphsheet with the default color scheme.
DI enables us to plot and edit network configurations and allows us to overlay a U.S. map (with county boundaries if desired) along with an image and/or contour plot of the standard errors of prediction for the network. To plot such a network, simply click on the "Network Plot" button to bring up the "Network Plots" dialog box as seen below.
Here we use NC.ex.net
, which is an example network object
that is included with the DI package. For this example, we have
selected the image plot for standard errors of prediction as well as
the county boundaries. The result is shown below.
The above standard errors of prediction are obtained from kriging the network locations. Kriging is a technique for evaluating data that is spatially dependent. It is a regression fit where, in this case, we use the network locations to predict the value at the locations, but instead of assuming independence between observations (locations) we assume that they are spatially dependent. It is, therefore, necessary to estimate a covariance function for the spatial dependence from the data, and then fit a generalized linear regression model using this covariance function. We use the krig model to predict observations on a grid and then obtain the standard errors of these predictions. It is then these standard errors that are plotted (using the image.plot function included with DI) above.
To see which covariance function is being used with our
NC.ex.net
example, simply type NC.ex.net$cov.obj
at the command prompt in the Commands window. It will display the
name of the covariance function and the parameters used.
To learn more about the DI covariance object
click here. To see an example of how to
choose a covariance function Click here.
Currently, there are 5 buttons in the toolbar. We summarize their functions below but the reader is referred to the following examples that give snapshots of the different popup windows produced by each button.
[1] Create/modify a covariance object
Gathers the information needed to describe the spatial covariance.
(Required) Name of new object (default is new.cov.obj).
(Optional) The name of the covariance function. Currently, DI has CM.cov,
and the exponential covariance function. Others can be created in S by
some simple programming.
Other options will depend on your choice of covariance function.
Some such options are listed.
(Optional) The range parameter is the scaling parameter used in the
covariance.
(Optional) Lon/Lat check box. If checked then the distance between
locations will be computed assuming the lcoations are in longitude and
latitude. The distances scale is in miles. So for example with this box
checked the range parameter should be in units of miles.
[2] Create/modify a network object
This object has the basic components:
(Required) Name of new object (default is new.network)
(Required) The network locations ( design) a 2 column matrix
(Optional) The grid points used to evaluate spatial predictions
(Required) Name of the covariance object created by function [1]
[3] Network plot
This will produce a plot of the network along with an optional image
and/or contour plot showing the standard error of prediction based on
the network observations.
(Required) Network object name. Note pull down menu to choose among
objects that have been created.
(Optional) Several check boxes that reflect options for the plot.
![]() To create the covariance object, click on the first toolbar button in the DI toolbars or click on create covariance object under DI in the drop down menu. We will call our covariance object NC.cov.obj, which we select by typing this in under New Covariance Object Name. We will use the correlation model by choosing CM.cov under Type. Once we have chosen the type of covariance function we want to use we must click on Create Object to select the rest of the parameters. |
![]() |
After clicking on the Create Object button, a new dialog window should appear. This dialog window is specific to the type of covariance function selected--in our case, CM.cov. Regardless of the type of covariance function selected, we have the option of selecting mean and/or standard deviation objects. We will use NC.mean for Mean Object, NC.sd for Standard Deviation as created below. Since our data locations are given in longitude and latitude coordinates (using negative longitude) we must check Longitude and Latitude if it is not already checked. Finally, since we chose CM.cov we must give it some kind of regression fit. We will use the fit NC.cor.fit, which we found below. For fit we can use any S-PLUS function that gives the correlation between two locations as a function of their distance of separation. | ![]() |
Once all the necessary parameters have been selected, click OK. You will then be back to the first dialog box. If you wish to create another covariance object, then click on apply to save the object you just created, then type a new name into the New Covariance Object Name box and create your new covariance object as above. When you are finished creating covariance objects click OK on the "Create a new covariance object" dialog. |
![]() To create the network object click on the second toolbar button of the DI toolbar buttons or Make Network Object from the DI drop-down menu.
Return Value
Design |
![]() |
Covariance Object We will select the covariance object that we just made above, NC.cov.obj--this should be listed as an option in the Covariance Object drop down menu. Note that we must create the covariance object before we create the network object.
Grid |
![]() First click on the third toolbar button, or network plotter from the DI drop down menu. To plot the network, we have the option of plotting the standard errors of prediction as an image, with contours, or both. We can also choose to just plot the network locations without the standard errors of prediction. Our example is small enough that we can plot everything, but if you have many locations you might prefer a less busy plot; especially for using the edit network feature. We choose NC.net from the Network Object drop down menu and check all the options as on the right. |
![]() |
Our plot should look like the image shown at right. For this example, it should not take it very long to plot, but if you have a large dataset, you can speed things up by not plotting the county boundaries. | ![]() |
![]() As an added feature of DI, we can create a plot of the probabilities that measurements on a grid exceed a certain value using Kriging. We can also plot the means, standard deviations and even the diagnostic plots created from Kriging. Simply click on the fourth toolbar button to bring up the probability plot dialog.
Data |
![]() |
Plots We can also select which plots we want to see; here select "all" to see all of them.
Advanced |
The option, "all", for the type of plot gives 5 different pages (you may have
to re-size your plot as I have done in order to click on the different pages)
to see different plots. For example, page 2 shows a larger plot of the means
found from Kriging (as shown below).
Once I have edited the network, I can plot the new network (along with the new standard errors of prediction if I choose) by clicking on the third toolbar button and selecting NC.net2 from the network object drop down menu. |
![]() ![]() |
The covariance object NC.cov.obj and the network object
NC.net are included with DI as examples. Below we show how they
were created. This particular example is substantive, based on daily ozone
measurements and the details of the data analysis leading to the
covariance model is detailed in the second section of this manual.
However, in order to get started using DI, it is not necessary to
appreciate the details of indentifying the covariance function.
8-hour average daily ozone for the Eastern U.S.
The DI Data set ozmax8 consists of 5
"ozone seasons" ( May 1 - October 31 1995-1999) of max 8 hour
ozone readings at 513 mointoring stations in the Eastern US. It is
formatted as a 920X513 matrix ( rows= time, columns=stations)
missing values have the code NA.
ozmax8.info has components:
stat.no station id numbers
loc 513X2 matrix of longitude and latitude
of the stations ( by convention longitudes are negative west of the
prime meridian)
dates 920 numerical vector
of the time in Julian days where 1/1/1960 is day 1, and gives the
additional information for a spatial analysis.
Given below are some simple summaries of these data that use FIELDS functions
#map of all stations in data set ozmax8
US()
points(ozmax8.info$loc)
#
# image plot of station data for the first day
good<- !is.na( ozmax8[1,])
as.image( ozmax8[1,good], x= ozmax8.info$loc[good,],
nrow=100, ncol=100)-> out.p
image.plot( out.p)
US( add=T)
# mean and sd's (this takes a while)
stats( ozmax8)-> stats.ozmax8
set.panel( 2,1)
as.image( stats.ozmax8[2,], x= ozmax8.info$loc,
nrow=100, ncol=100)-> out.p
image.plot( out.p)
US( add=T)
title("Mean levels")
as.image( stats.ozmax8[3,], x= ozmax8.info$loc,
nrow=100, ncol=100)-> out.p
North Carolina Subset
image.plot( out.p)
US( add=T)
title( "Sd's for station data")
To simplify working with these data we will consider a subset of stations centered around North Carolina. For completeness we include the S commands that create the example data set NC.ex However, this data set has been included as part of DI . It is possible to work with the full netowork but beyond about 200 locations the response will be slow.
#create NC subset and data set #
NCsubset<- ozmax8.info$lon > -82& ozmax8.info$lon<
-77
& ozmax8.info$lat< 39 & ozmax8.info$lat>
33
NC.ex<- list( ozmax8= ozmax8[, subset],
loc=ozmax8.info$loc[NCsubset,],
dates= ozmax8.info$dates)
# create a measurements matrix for later use with DI.
NC.ozmax8 <- NC.ex$ozmax8
row.names( NC.ozmax8) <- dates( as.numeric( NC.ex$dates))
# plot up the locations with county and state boundaries
par( pty="s")
plot(NC.ex$loc)
US( add=T, database="county")
US( add=T, col= 2, lwd=2)
Summary Statistics for NC.ex
The next step is to identify a covariance model for this subset. To get started we consider a simple time stationary, correlation model. It is assumed that the station measurements when standardized by their mean and standard deivation follow an isotropic covariance function.
# find sample correlations
COR( NC.ex$ozmax8)-> out
The components returned by COR consist of the means (mean) and
standard deviations (sd) a matrix of the pairwise correlations ( cor)
and a matrix of the number of nonmissing observations for station pairs
(N).
A thin plate spline is used to fit a surface to the means and standard
deviations. These are needed to extend the covariance model to locations
where ozone is not measured. In this fitting the splines smooth the
data based on a cross validation criterion but it may make sense to use
other choices for the amount of smoothing.
# surface for means
Tps( NC.ex$loc, out$mean)-> NC.mean
# surface for standard deviations
Tps( NC.ex$loc, out$sd)-> NC.sd
# make plots of the fitted surfaces
set.panel(2,1)
surface( NC.mean, type="C")
US( add=T)
points( NC.mean$x)
title("means")
surface( NC.sd, type="C")
US( add=T)
points( NC.mean$x)
title( "sd's")
The next part is to indentify a model for the correlations and we
make a plot of station correlations as a function of the distance
of separation. The simplest model is fitting a function to the
correlations as a function of separation distance.
and this is done below.
# pairwise distances
rdist.earth( NC.ex$loc)-> DD
# indicator to extract upper triangle of the matrix and distances
ind<- col(DD)> row(DD)
# create a list with the information we need and plot the values
cgram.oz<- list( d=DD[ind], cor=out$cor[ind])
set.panel(2,1)
plot( cgram.oz$d, cgram.oz$cor, pch=".")
# conditional box plots are better!
bplot.xy(cgram.oz$d, cgram.oz$cor,N=15)
# fit an exponential correlation function using nonlinear least squares.
cgram.fit1<- nls( cor ~ alpha* exp( -d/theta), cgram.oz, start= list( alpha=.95, theta=400))
# try a mixture of two exponentials
cgram.fit2<- nls( cor ~ alpha1* exp( -d/theta1) + alpha2* exp( -d/theta2), cgram.oz, start= list( alpha1=.5, alpha2=.5, theta1=100, theta2=400))
#take a look at the fits
set.panel()
bplot.xy(cgram.oz$d, cgram.oz$cor,N=15)
points( cgram.oz$d, cgram.oz$cor, pch=".",
col=2)
seq( 0,400,,200)-> xt
yt<- predict( cgram.fit1, list( d=xt))
lines( xt, yt, col=4)
yt<- predict( cgram.fit2, list( d=xt))
lines( xt, yt, col=5)
Based on the fitted curves there is some underestimate of the correlation
for close stations with a single exponential model. It appears that
a mixture of two fits better. An alternative would be to use
a more flexible covariane such as the Matern class.
Note that there may be some problems with underestimating
correlations beyond 200 miles.
However we use the mixture model and give it a more descriptive name:
NC.cor.fit<-cgram.fit2
Having identified a covariance model we now need to identify what
function to choose in DI and also what arguments to specify. As part
of DI is a covariance function that is suited for this model CM.cov ( standing
for Correlation Model).
CM.cov will use the following information when
creating a covaraince object. For defining the marginal standard
deviation use NC.sd for the correlation function use NC.cor.fit
and finally the lon.lat argument should be checked (set to TRUE).
Use cover.design to thin a network
As a companion to the DI/Fields software it is useful to have an algorithm
that produces a set of points that are evenly distributed. A typical
application is to choose a subset of observed locations to use as the knots for
a reduced set of basis functions. This subset should be evenly distributed
over the observed domain of data. The function cover.design is a
function to
accomplish this. The basic framework is to determine a subset of design points
from a larger set of candidate points. The candidate points not only serve as
possible design points but determine the coverage criterion. The reader is
referred to the Fields manual for more
information on this function and the swapping algorithm.
Here is an example of how cover.design works.
design200 <- cover.design.nw( R=ozmax8.info$loc)
plot( design200, xlab="Lon", ylab="Lat")
map( add=T)
summary( design200)
The component design200$design is the "best" design and
design200$best.id gives the row numbers of the candidate set matrix
that comprise the "best" design. We will use this to add 50 points for a
total of 250 points.
design250 <- cover.design( ozmax8.info$loc, fixed=design250$best.id)
plot( design250, xlab="Lon", ylab="Lat")
map( add=T)