Below are descriptions of the observed and infilled data sets that formed the basis for deriving the PRISM100 gridded data product. If you are interested only in the observed data and not the infilled values, see the information on RData.USmonthlyMet.bin.
These are complete "data products" where
missing station values have been filled in using spatial
statistics. When using the complete data for further statistical
analysis, care should be taken with the infilled values. Although
they are good estimates of the mean, they may not reproduce the
variability that one would expect from actual point observations
of the meteorology. In statistical language, the infilled values
are the mean of the conditional distribution for the measurement
given the observed data. They are not samples from this
conditional distribution.
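The difference between a conditional mean and a conditional sample can be seen in a toy simulation. The sketch below is not the spatial model used for the infill; it assumes a single missing station that is bivariate normal with one observed neighbor, and the correlation rho = 0.8 is an arbitrary choice for illustration.

```r
# Toy illustration only (not the authors' infill model):
# Y is the "missing" station, X is one observed neighbor, both standard normal.
set.seed(1)
n   <- 10000
rho <- 0.8                                    # assumed correlation between X and Y
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)     # true values at the missing station

cond_mean   <- rho * x                                   # E[Y | X]: a conditional-mean infill
cond_sample <- rho * x + sqrt(1 - rho^2) * rnorm(n)      # a draw from the distribution of Y given X

# Compare the variability of the three series.
round(c(truth = var(y), cond_mean = var(cond_mean), cond_sample = var(cond_sample)), 2)
```

In this toy setting the conditional mean has roughly rho^2 times the variance of the true values, while the conditional samples have about the right variance. This is the sense in which infilled values can understate variability.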
Acquire the (100Mb) tar file:
NCAR_pinfill_others.tar
In UNIX
Metadata on these stations is found in METAinfo.
The first row gives the column headings and subsequent rows have
the information for each station.
The complete precipitation files based on regular station data have
the names ppt.complete.Ynnn where nnn =
001, 002, ..., 103 and 001=1895 and 103=1997.
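Because the file index is offset from the calendar year by 1894, a small helper can avoid off-by-one mistakes. The function name year_file is hypothetical and not part of the distributed code; the type argument is simply the file name prefix.

```r
# Map a calendar year to the yearly file name (001 = 1895, ..., 103 = 1997).
# year_file() is a hypothetical helper, not part of the distributed code.
year_file <- function(year, type = "ppt") {
  stopifnot(all(year >= 1895 & year <= 1997))
  sprintf("%s.complete.Y%03d", type, as.integer(year - 1894))
}

year_file(1895)   # "ppt.complete.Y001"
year_file(1997)   # "ppt.complete.Y103"
```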
Each separate data file consists of the precipitation for a
single year. Each line of the file is data for one station according
to the format: station id, 12 monthly precipitation values (Jan-Dec),
12 missing value/infill codes (1=missing and infilled, 0=observed), and is
written with the FORTRAN statement format(a8,12I5,2x,12I1).
The stations appear in exactly the same order as in the
metadata file.
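The record layout described above can be checked on a single line. This is a minimal sketch on a made-up record: the station id, values and flags are invented, and left-justifying the id inside the 8-character field is an assumption about how the files were written.

```r
# One synthetic record laid out as format(a8,12I5,2x,12I1); the values are made up.
id     <- "010008"
values <- c(12L, 30L, 45L, 60L, 80L, 20L, 5L, 0L, 15L, 40L, 55L, 33L)
flags  <- c(0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L)
line <- paste0(sprintf("%-8s", id),                           # a8 (left-justified: an assumption)
               paste(sprintf("%5d", values), collapse = ""),  # 12I5
               "  ",                                          # 2x
               paste(flags, collapse = ""))                   # 12I1

tmp <- tempfile()
writeLines(line, tmp)

# The negative width skips the two blank columns, leaving 1 + 12 + 12 = 25 fields.
rec <- read.fwf(tmp, widths = c(8, rep(5, 12), -2, rep(1, 12)),
                colClasses = c("character", rep("integer", 24)),
                strip.white = TRUE)

station  <- rec[[1]]
monthly  <- unlist(rec[1, 2:13],  use.names = FALSE)       # Jan-Dec values
infilled <- unlist(rec[1, 14:25], use.names = FALSE) == 1  # TRUE where the value was infilled
monthly[!infilled]                                         # only the observed months
```

Reading the id as character keeps leading zeros and any non-numeric characters in the station code.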
Statistical methodology for infilling monthly precipitation:
When a data value is missing, a statistically infilled value appears,
and the statistical details of this process are given in the technical
report: Johns, C., D. Nychka, T. Kittel, and C. Daly, 2001:
Infilling Sparse Records of Spatial Fields.
Some details of the models and estimates are collected in the
Supplement to JASA article.
Finally, the entire analysis and infill process can be reproduced using
Matlab, R and F77 programs and the interested researcher should contact
Doug Nychka (nychka "at" ucar "dot" edu) for details. The archived volume
for this project is 678MB.
Precipitation:
Data file format
tar -xvf NCAR_pinfill_others.tar
This will extract to the subdirectory NCAR_pinfill.
Precipitation units are in total millimeters per month and the time
span is 1895-1997. There are a total of 11918 station locations,
thus each yearly file has 11918 lines.
Each row of METAinfo lists station code, longitude, latitude, elevation,
where some of the station codes contain non-numeric characters.
The (244Kb) text file USmonthly.names.txt
is a table (station code, place name) that can be used to find
the geographic name of a station.
However, not all the precipitation stations are in this list.
NOAA related dataset and data product:
The infilled precipitation and temperature records were subsequently used to
create a fine (4km) gridded,
publicly available data product:
"103-Year High-Resolution
Climate Data Set for the Conterminous United States"
maintained and distributed by NOAA/NCDC. The FTP
distribution for this final product along with supporting meta-data
can be found at
www1.ncdc.noaa.gov/pub/data/prism100.
Temperatures:
Acquire the (100Mb) tar file:
NCAR_tinfill_others.tar
In UNIX
tar -xvf NCAR_tinfill_others.tar
This will extract to the subdirectory NCAR_tinfill.
Metadata on these stations is found in METAinfo . Columns
of this file are:
station code, elevation, longitude, latitude
(however, elevation is not used in any of the infill procedures). The
stations for temperature may not be the same as those
reporting precipitation. Do not be fooled: station ids contain some
characters!
The (244Kb) text file USmonthly.names.txt
is a table ( station code, place name) that can be used to find
the geographic name of a station. Not all the temperature
stations may be listed.
There are a total of 8125 station locations. The data file names are of the form tmax.complete.Ynnn and tmin.complete.Ynnn with nnn = 001, 002, ..., 103, where 001=1895 and 103=1997; each file holds the values for one year. Temperature appears as an integer in tenths of a degree C, so 73 should be interpreted as 7.3 degrees C, or (9/5)*7.3 + 32 = 45.14 degrees F.
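The unit conversion can be written as two small functions. The function names below are hypothetical and chosen only for this illustration.

```r
# Convert stored temperature integers (tenths of a degree C) to degrees C and F.
# Both function names are hypothetical helpers for illustration.
tenths_to_celsius     <- function(v) v / 10
celsius_to_fahrenheit <- function(deg_c) deg_c * 9 / 5 + 32

tenths_to_celsius(73)                          # 7.3
celsius_to_fahrenheit(tenths_to_celsius(73))   # 45.14
```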
The format for each line of the data is the same as the description
of the precipitation data set above, including flags for infilled
versus real data. The R code to read in a single year is the same
as the sample file single.year.R for precip given
on this page. To read the temperature files just change the "ppt" part of the
file name to either "tmin"
or "tmax".
Working with the data files in R
Getting a particular station
In UNIX in the directory NCAR_pinfill:
grep 010008 ppt.complete* > first.station.data
wc first.station.data
103 1442 10403
In UNIX in the directory NCAR_tinfill:
grep 010148 tmax.complete* > first.station.tmax.data
wc first.station.tmax.data
103 1442 10506
grep 010148 tmin.complete* > first.station.tmin.data
wc first.station.tmin.data
103 1442 10506
Each output file will have all the years for a station in the right order, because the shell expands the file names in sorted order (Y001 to Y103).
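Reading a year's data into R
To read in metadata:
# skip the header row and name the columns explicitly
temp<- read.table( "METAinfo", skip=1, col.names=c("station.id", "lon", "lat", "elev"), colClasses=c("character", rep("numeric", 3)))
# check out locations
plot( temp$lon, temp$lat, pch=".")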
To read in a particular station source the R code in
the file get.station.R
id<- '010008'
look<- get.station(id, with.infill=T, type="ppt")
To read in a particular year and deal with missing observations, use the read.fwf function in R for a fixed-format read. Here is an example that is used to create the fields example data set RMprecip. It assumes that you are in the directory NCAR_pinfill. This code can easily be modified to give all the months or different variables besides precip.
# file index = year - 1894, so Y063 holds 1957 (use Y069 for 1963)
read.fwf("ppt.complete.Y063", widths= c(6, 7,rep(5,11), 3,rep(1,11))) -> dat
miss<- as.matrix( dat[,(1:12) +13])
dat<- as.matrix( dat[,(1:12) +1 ])
# extra points awarded if you convert these to logical and integer
# to save space!
scan("METAinfo", skip=1, what=list( "a", 1,1,1))-> look
names( look)<-c("station.id", "lon", "lat","elev")
ind<- (look$lon < -102) & (look$lon > -112) & (look$lat < 55) & (look$lat>35)
x<- cbind(look$lon[ind],look$lat[ind] )
dimnames( x) <- list( look$station.id[ind], c("lon", "lat"))
elev<- look$elev[ind]
y<- dat[ind,8] # column 8 is Aug.
ind2<- miss[ind,8]==0 # infill value ==1, real data value ==0
y<- y[ ind2]
x<- x[ind2,]
elev<- elev[ind2]
RMprecip<- list( x=x, elev=elev, y=y)
To create a complete time series use a "for" loop with the year
file names and accumulate what you need ... a convenient time to
get some coffee while this is running.
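One way to write such a loop is sketched below. The helpers read_year and station_series are hypothetical and are not the distributed get.station.R; the sketch assumes the record layout format(a8,12I5,2x,12I1) described for the yearly files and that the files sit in the directory given by dir.

```r
# Hypothetical helpers for illustration; not the distributed get.station.R.

# Read one yearly file: station id, 12 monthly values, 12 infill flags.
read_year <- function(file) {
  read.fwf(file, widths = c(8, rep(5, 12), -2, rep(1, 12)),
           colClasses = c("character", rep("integer", 24)),
           strip.white = TRUE)
}

# Accumulate a years-by-months matrix for one station.
# With with.infill = FALSE, infilled months are set to NA.
station_series <- function(id, years = 1895:1997, type = "ppt",
                           dir = ".", with.infill = TRUE) {
  out <- matrix(NA_integer_, nrow = length(years), ncol = 12,
                dimnames = list(years, month.abb))
  for (k in seq_along(years)) {
    file <- file.path(dir, sprintf("%s.complete.Y%03d", type,
                                   as.integer(years[k] - 1894)))
    dat <- read_year(file)
    row <- match(id, dat[[1]])
    if (is.na(row)) next                      # station not in this file
    vals     <- unlist(dat[row, 2:13],  use.names = FALSE)
    infilled <- unlist(dat[row, 14:25], use.names = FALSE) == 1
    if (!with.infill) vals[infilled] <- NA    # keep observed values only
    out[k, ] <- vals
  }
  out
}
```

For example, station_series("010008", dir = "NCAR_pinfill") would return a 103 by 12 matrix of monthly precipitation. Each yearly file is read in full, so the loop is slow. If several stations are needed, it would be cheaper to read each file once and pull out all the required rows.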
The data sets, software and related content in and linked to these pages are intended for scientific and mathematical research. The authors do not guarantee the correctness of the data, software or companion text. Please see the UCAR Terms of Use listed below.