Distribution and Statistics in Climate Data

Temperature and precipitation are among the basic measurements recorded at most observation stations. In this section we analyze these two variables for stations around the Carolinas.

Load data
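The data-loading step is not shown in this copy of the notes; here is a minimal sketch of the setup the code below assumes. The `fields` package is what supplies `US`, `quilt.plot`, `xline`, and `yline`, but the file name `carolinaClimate.rda` and its exact contents are assumptions inferred from the objects used later (`tmean`, `prcp`, `time`, `lonLatTemp`, and the `raleigh` data frame).

```r
# a sketch only -- the data file name is hypothetical
library(fields)              # US(), quilt.plot(), xline(), yline()
load("carolinaClimate.rda")  # assumed to contain tmean, prcp, time,
                             # lonLatTemp, and the raleigh data frame
```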



# NOTE: these are overlay calls; the base maps (e.g. a quilt.plot of the
# station values, as done at the end of this section) are assumed to have
# been drawn just before each one.
US(add=T, col='black',lwd=5)
title( paste('Maximum temperature on', time[1] ) )

US(add=T, col='black',lwd=5)
title( paste('Precipitation on', time[1]) )

Let’s look at the time series of daily temperature and precipitation at a station near Raleigh during June, July, and August from 1970 to 2011.

plot(raleigh$time, raleigh$tmax,
     ylab='degrees C', xlab='time', type="h")

plot(raleigh$time, raleigh$prcp/25.4,
     ylab='inches', xlab='time', type="h")

Some useful statistics

To make this easier we omit the missing values from each data set; for precipitation we also drop the days with no rain and convert from mm to inches.

YM<- na.omit( raleigh$tmean)
YP<- na.omit( raleigh$prcp)
YP<- YP/25.4 # convert to inches 
ind<- YP >0
YP<- YP[ ind]

summary( YM)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.10   23.85   25.65   25.38   27.20   34.05
summary( YP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01181 0.09842 0.27953 0.47947 0.61024 5.37008


set.panel(1,2) # plot window will lay out plots in a 1 by 2 matrix
hist(YM, nclass = 30)
hist(YP, nclass = 30)

Choosing the right number of classes can make the shape of the distribution clearer. Notice also how the quantiles reflect the different shapes of the two distributions.
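As a quick illustration (a sketch using the positive precipitation values YP defined above), the same data can look quite different under coarse and fine binning, while the quantiles summarize the shape numerically:

```r
set.panel(1,2)    # side-by-side comparison (a fields helper)
hist(YP, nclass = 5,  main = "5 classes")
hist(YP, nclass = 50, main = "50 classes")
# the heavy right tail shows up as a large gap between the
# median and the 95th quantile
quantile(YP, c(.25, .5, .75, .95))
```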

Superimposing a theoretical distribution

It is useful to represent the distribution in the histogram with a continuous curve. These curves typically have one or more parameters that are adjusted to match the data. Here we experiment with adding a normal distribution to the temperatures. The normal is controlled by two parameters: the mean, which locates its peak, and the standard deviation, which sets the width of the peak.

### some examples of normals 
xgrid<- seq( 10,40,length.out=200)
y1<- dnorm( xgrid, mean= 25, sd = 2 )
y2<- dnorm( xgrid, mean= 25, sd = 4 )
y3<- dnorm( xgrid, mean= 18, sd = 2 )
plot( xgrid,  y1, col="orange2", type="l")
lines( xgrid, y2,col="blue2")
lines(xgrid, y3,col="black")

# the theoretical 95% quantiles for these distributions
look<- c( qnorm( .95, mean= 25, sd=2),
          qnorm( .95, mean= 25, sd=4),
          qnorm( .95, mean= 18, sd=2) )
print( look)
## [1] 28.28971 31.57941 21.28971
xline( look) # add to plot

xgrid<- seq( 10,40,length.out=200)
y<- dnorm( xgrid, mean= 25, sd = 2 )
hist(YM,nclass=30, prob = TRUE, ylim =c(0,.2))
lines( xgrid, y, col="green4")

An exponential for the precipitation

xgrid<- seq( 0,6,length.out=200)
y<- dexp( xgrid, rate = 1.5 )
hist(YP,nclass=30, prob = TRUE)
lines( xgrid, y, col="green4")

Comparing the fit with quantiles

The theoretical distributions imply quantiles, and it makes sense to compare these to the corresponding quantiles of the data.

q95Normal<- qnorm( .95, mean= 20, sd=2)
# compare to 
quantile( YM, .95)
##   95% 
## 29.45
# in this case not too good: the guessed mean of 20 is too low


The idea is to compare all the quantiles of the data to the theoretical quantiles of the distribution. The match is perfect if the points fall on a line with intercept 0 and slope 1.

N<- length( YM)
pSequence<- (0:(N-1) +.5)/N
qNormal<- qnorm( pSequence, mean= 25, sd=2)
plot( qNormal, sort( YM), xlab="Theoretical", ylab="Data")
abline( 0,1, col="red")
# use the estimates from the data
qNormal2<- qnorm( pSequence, mean= mean(YM), sd=sd( YM))
points(qNormal2, sort( YM), col="orange2" )
abline( 0,1, col="red")
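For the normal case R also has a built-in shortcut that does essentially the same comparison (against standard normal quantiles, so the reference line comes from qqline rather than abline(0,1)):

```r
qqnorm(YM)              # data quantiles vs standard normal quantiles
qqline(YM, col="red")   # reference line through the first and third quartiles
```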

Q-Q plot for the exponential

N<- length( YP)
pSequence<- (0:(N-1) +.5)/N
qExp<- qexp( pSequence, rate=1.5)
plot( qExp, sort( YP), xlab="Theoretical", ylab="Data")
abline( 0,1, col="red")
qExp2<- qexp( pSequence, rate=1/mean( YP))
points( qExp2, sort( YP), col="orange2")

Next, we draw maps for quantiles of these variables; for example, here is the 95% quantile of mean temperature at each station.

tmean.95<- apply( tmean, 1, quantile, 0.95, na.rm=TRUE)
quilt.plot( lonLatTemp, tmean.95,
            main='95% quantile of mean temperature',
            cex.main=2, zlim=c(20,32))
US(add=T, col='black',lwd=5)

Evaluating the likelihood function for the exponential distribution

# finding the MLE for the exponential:
# the log likelihood  n*log(rate) - rate*sum(YP)  is maximized at
# rate = n/sum(YP) = 1/mean(YP)
1/mean( YP)
## [1] 2.085629
rate<- seq( 1.5, 2.5, length.out=100)
logLike<- rep( NA, 100)
n<- length( YP)
for( k in 1:100){
  logLike[k]<- -sum(YP)*rate[k] + n*log( rate[k])
}
plot( rate, logLike, type="l")
# an approximate 95% confidence interval: the rates whose log likelihood
# is within qchisq(.95, df=1)/2 of the maximum
cut95<- qchisq(.95, df=1)/2
ind<- max(logLike) - logLike < cut95
yline( max(logLike) - cut95)
points( rate[ind], logLike[ind], col="red")
xline( min( rate[ind]))
xline( max( rate[ind]))