IMAGe Datasets

Observed and infilled monthly values for surface stations

Below are descriptions of the observed and infilled data sets that formed the basis for deriving the PRISM100 gridded data product. If you are only interested in the observed data and not the infilled values, see the information on RData.USmonthlyMet.bin.

These are complete "data products": missing station values have been filled in using spatial statistics. When using the complete data for further statistical analysis, take care with the infilled values. Although they are good estimates of the mean, they may not reproduce the variability one would expect from actual point observations of the meteorology, so a record with many infilled values will typically look smoother than the true record. In statistical language, the infilled values are the mean of the conditional distribution of the measurement given the observed data; they are not samples from this conditional distribution.
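A toy simulation makes the distinction concrete. It is only a sketch under stated assumptions: a standard bivariate normal pair with correlation rho = 0.7 stands in for an observed value and a missing target value; it is not the spatial model used for the infilling. The conditional mean rho * obs has variance rho^2 (about 0.49), while the actual values and draws from the conditional distribution both have variance close to 1.

```r
# Toy illustration only (NOT the infill model): standard bivariate normal
# pair (obs, target) with an assumed correlation rho.
set.seed(1)
n   <- 10000
rho <- 0.7
obs    <- rnorm(n)
target <- rho * obs + sqrt(1 - rho^2) * rnorm(n)   # the "true" values

# Infill-style estimate: conditional mean E[target | obs] = rho * obs
cond_mean <- rho * obs
# Draws from the conditional distribution N(rho * obs, 1 - rho^2)
cond_draw <- rho * obs + sqrt(1 - rho^2) * rnorm(n)

# Variance of true values vs. conditional means vs. conditional draws
c(actual = var(target), cond_mean = var(cond_mean), cond_draw = var(cond_draw))
```

This is the law of total variance, Var(Y) = E[Var(Y | X)] + Var(E[Y | X]): replacing values by their conditional means drops the first term.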

Precipitation:

Data file format

Acquire the (100 MB) tar file: NCAR_pinfill_others.tar

On UNIX:

tar -xvf NCAR_pinfill_others.tar

This extracts the files to the subdirectory NCAR_pinfill. Precipitation values are monthly totals in millimeters, and the time span is 1895-1997. There are 11918 station locations, so each yearly file has 11918 lines.

Metadata on these stations is found in METAinfo. The first row gives the column headings, and subsequent rows have the information:
station code, longitude, latitude, elevation
Some station codes contain non-numeric characters, and codes such as 010008 have leading zeros, so read this column as text. The (244 KB) text file USmonthly.names.txt is a table (station code, place name) that can be used to find the geographic name of a station. However, not all precipitation stations are in this list.

The complete precipitation files, based on regular station data, are named ppt.complete.Ynnn, where nnn = 001, 002, ..., 103, with 001 = 1895 and 103 = 1997 (that is, nnn = year - 1894).
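The year-to-file mapping can be written as a one-line R function. This is a sketch: year_file() is a hypothetical helper, not part of the distributed code. It relies on nnn = year - 1894 and on the zero-padded three-digit suffix; the type argument also covers the temperature files (tmax, tmin).

```r
# Hypothetical helper (not in the distributed code): calendar year -> file name,
# using nnn = year - 1894, zero-padded to three digits.
year_file <- function(year, type = "ppt") {
  stopifnot(all(year >= 1895 & year <= 1997))   # only 1895-1997 exist
  sprintf("%s.complete.Y%03d", type, year - 1894)
}

year_file(c(1895, 1963, 1997))
# returns "ppt.complete.Y001" "ppt.complete.Y069" "ppt.complete.Y103"
```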

Each data file holds the precipitation for a single year. Each line of the file is the data for one station in the format: station id, 12 monthly totals (Jan-Dec), and 12 missing-value/infill flags (1 = missing and infilled, 0 = observed), written with the FORTRAN statement format(a8,12I5,2x,12I1). The stations appear in exactly the same order as in the metadata file.
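In character positions, format(a8,12I5,2x,12I1) means: the id occupies columns 1-8, month m occupies columns 8+5(m-1)+1 to 8+5m, columns 69-70 are blank, and the 12 flags occupy columns 71-82. The sketch below uses base R's substring() to split one line accordingly; parse_record() is a hypothetical name, not part of the distributed code.

```r
# Hypothetical parser for one record written with format(a8,12I5,2x,12I1).
parse_record <- function(line) {
  starts <- 8 + 5 * (0:11) + 1                        # first column of each I5 field
  values <- as.integer(substring(line, starts, starts + 4))
  flags  <- as.integer(strsplit(substring(line, 71, 82), "")[[1]])  # 12 I1 fields
  list(id       = trimws(substring(line, 1, 8)),      # a8, trailing blanks removed
       value    = values,                             # Jan-Dec
       infilled = flags == 1)                         # TRUE where value was infilled
}
```

Because the id is taken from the full a8 field, this does not assume that station ids are exactly six characters long.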

Statistical methodology for infilling monthly precipitation: when a data value is missing, a statistically infilled value appears in its place. The statistical details of this process are given in the technical report: Johns, C., D. Nychka, T. Kittel, and C. Daly, 2001: Infilling Sparse Records of Spatial Fields. Some details of the models and estimates are collected in the Supplement to the JASA article. Finally, the entire analysis and infill process can be reproduced using Matlab, R and F77 programs; interested researchers should contact Doug Nychka (nychka "at" ucar "dot" edu) for details. The archived volume for this project is 678 MB.
NOAA-related dataset and data product: The infilled precipitation and temperature records were subsequently used to create a fine-resolution (4 km) gridded, publicly available data product, "103-Year High-Resolution Climate Data Set for the Conterminous United States", maintained and distributed by NOAA/NCDC. The FTP distribution for this final product, along with supporting metadata, can be found at www1.ncdc.noaa.gov/pub/data/prism100.

Temperatures:

Acquire the (100 MB) tar file: NCAR_tinfill_others.tar

On UNIX:

tar -xvf NCAR_tinfill_others.tar

This extracts the files to the subdirectory NCAR_tinfill.

Metadata on these stations is found in METAinfo. Note that the column order differs from the precipitation metadata file; the columns are:
station code, elevation, longitude, latitude
(However, elevation is not used in any of the infill procedures.) The temperature stations are not necessarily the same as those reporting precipitation. Do not be fooled: some station ids contain non-numeric characters! The (244 KB) text file USmonthly.names.txt is a table (station code, place name) that can be used to find the geographic name of a station. Not all temperature stations may be listed.

There are 8125 station locations. The data files are named tmax.complete.Ynnn and tmin.complete.Ynnn, with nnn = 001, 002, ..., 103; each holds the values for one year, with 001 = 1895 and 103 = 1997. Temperature is stored as an integer in tenths of a degree C, so 73 should be interpreted as 7.3 degrees C, or (9/5) * 7.3 + 32 = 45.14 degrees F.
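The unit conversion above, written as two small R functions. The names tenths_to_c() and c_to_f() are hypothetical, not part of the distributed code; both are vectorized, so they also apply to a whole matrix of yearly values.

```r
# Hypothetical helpers: stored integers are tenths of a degree Celsius.
tenths_to_c <- function(x) x / 10            # tenths of deg C -> deg C
c_to_f      <- function(x) 9 / 5 * x + 32    # deg C -> deg F

tenths_to_c(73)            # 7.3
c_to_f(tenths_to_c(73))    # 45.14
```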

The format of each line is the same as for the precipitation data set described above, including the flags for infilled versus real data. The R code to read a single year is the same as in the sample precipitation file single.year.R; to read the temperature files, just change the "ppt" part of the file name to either "tmin" or "tmax".

Working with the data files in R

Getting a particular station

In UNIX, in the directory NCAR_pinfill:

grep 010008 ppt.complete* > first.station.data
wc first.station.data
     103    1442   10403

and in the directory NCAR_tinfill:

grep 010148 tmax.complete* > first.station.tmax.data
wc first.station.tmax.data
     103    1442   10506

grep 010148 tmin.complete* > first.station.tmin.data
wc first.station.tmin.data
     103    1442   10506

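Because the zero-padded file names sort in chronological order, the output has all 103 years for the station in the right order. Each output line is prefixed with the name of the file it came from (for example ppt.complete.Y001:); grep -h suppresses this prefix.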
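When grep searches several files, each line of its output starts with the file name and a colon, so the year can be recovered from the Ynnn suffix. A minimal sketch in R, with a hypothetical helper split_grep_line() (not part of the distributed code), assuming the record itself contains no colon:

```r
# Hypothetical helper: split grep output lines "ppt.complete.Y001:<record>"
# into the calendar year and the record text.
split_grep_line <- function(lines) {
  file   <- sub(":.*$", "", lines)             # text before the first colon
  record <- sub("^[^:]*:", "", lines)          # text after the first colon
  nnn    <- as.integer(sub("^.*\\.Y", "", file))
  data.frame(year = 1894 + nnn, record = record, stringsAsFactors = FALSE)
}

# usage: st <- split_grep_line(readLines("first.station.data"))
```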

Reading a year's data into R

To read in metadata:

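# precipitation METAinfo: skip the heading row; keep station codes as text
temp <- read.table("METAinfo", skip = 1,
                   col.names = c("station.id", "lon", "lat", "elev"),
                   colClasses = c("character", "numeric", "numeric", "numeric"))
# check out locations
plot(temp$lon, temp$lat, pch = ".")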

To read in a particular station, source the R code in the file get.station.R and call get.station():

source("get.station.R")
id <- "010008"
look <- get.station(id, with.infill = TRUE, type = "ppt")

To read in a particular year and deal with missing observations, use the read.fwf function in R for a fixed-format read. Here is the example used to create RMprecip, the August 1963 precipitation example data set in the fields package. It assumes that you are in the directory NCAR_pinfill. This code can easily be modified to give all the months or variables other than precipitation.

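# Year 1963 is file Y069, since nnn = 1963 - 1894 = 69
# widths follow format(a8,12I5,2x,12I1) for 6-character station ids:
# id (6), 2 blanks + Jan (7), Feb-Dec (11 x 5),
# 2 blanks + Jan flag (3), Feb-Dec flags (11 x 1)
dat <- read.fwf("ppt.complete.Y069",
                widths = c(6, 7, rep(5, 11), 3, rep(1, 11)))
miss <- as.matrix(dat[, (1:12) + 13])  # infill flags, Jan-Dec
dat  <- as.matrix(dat[, (1:12) + 1])   # monthly values, Jan-Dec
# extra points awarded if you convert these to logical and integer
# to save space!

look <- scan("METAinfo", skip = 1, what = list("a", 1, 1, 1))
names(look) <- c("station.id", "lon", "lat", "elev")
# stations in the box 112W-102W, 35N-55N
ind <- (look$lon < -102) & (look$lon > -112) &
       (look$lat < 55) & (look$lat > 35)

x <- cbind(look$lon[ind], look$lat[ind])
dimnames(x) <- list(look$station.id[ind], c("lon", "lat"))

elev <- look$elev[ind]
y <- dat[ind, 8]  # column 8 is Aug.

ind2 <- miss[ind, 8] == 0  # flag: 1 = infilled value, 0 = observed value

# keep only the observed (non-infilled) August values
y <- y[ind2]
x <- x[ind2, ]
elev <- elev[ind2]
RMprecip <- list(x = x, elev = elev, y = y)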

To create a complete time series, use a "for" loop over the yearly file names and accumulate what you need ... a convenient time to get some coffee while this is running.
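One way to write such a loop, as a sketch: station_series() is a hypothetical function (not the distributed get.station.R), it assumes the fixed format format(a8,12I5,2x,12I1), and it keeps infilled values without returning their flags. It reads every yearly file in turn, so expect it to be slow.

```r
# Hypothetical sketch: monthly series (years x 12) for one station,
# built by looping over the yearly files type.complete.Ynnn.
station_series <- function(id, type = "ppt", years = 1895:1997, dir = ".") {
  out <- matrix(NA_integer_, nrow = length(years), ncol = 12,
                dimnames = list(years, month.abb))
  starts <- 9 + 5 * (0:11)                 # first column of each I5 field
  for (i in seq_along(years)) {
    f <- file.path(dir, sprintf("%s.complete.Y%03d", type, years[i] - 1894))
    lines <- readLines(f)
    hit <- lines[trimws(substring(lines, 1, 8)) == id]   # match the a8 id field
    if (length(hit) == 1) {                # leave NA if the station is absent
      out[i, ] <- as.integer(substring(hit, starts, starts + 4))
    }
  }
  out
}

# usage (in NCAR_pinfill): s <- station_series("010008")
```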

DISCLAIMER:

The data sets, software and related content in and linked to these pages are intended for scientific and mathematical research. The authors do not guarantee the correctness of the data, software or companion text. Please see the UCAR Terms of Use.

