Use of reduced-rank covariance estimates for objective analyses of historical climate data sets
Alexey Kaplan
Lamont-Doherty Earth Observatory (LDEO) of Columbia University

Outline
Motivation: most tools of climate research (statistical analyses or model runs) need complete data fields, so least-squares estimates of those fields are usually used.
Introduction to such estimates and a review of some of their crucial properties.
Fundamental interpretational and other outstanding problems that ensue, and what to do about them.

Given a choice, climatologists would generally rather use the right-hand panel below than the left-hand one.
Example of Optimal Interpolation
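$T = T_B + e_B$
$H T = T_o + e_o$
$\langle e_B \rangle = \langle e_o \rangle = \langle e_B e_o^T \rangle = 0$
$\langle e_B e_B^T \rangle = C$       Hard to know in detail!
$\langle e_o e_o^T \rangle = R$
Solution minimizes the cost function
$S[T] = (HT - T_o)^T R^{-1} (HT - T_o) + (T - T_B)^T C^{-1} (T - T_B)$
$T = (H^T R^{-1} H + C^{-1})^{-1} (H^T R^{-1} T_o + C^{-1} T_B)$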
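A minimal NumPy sketch of the solution formula above, on a toy problem. The grid size, the exponential form chosen for C, the observation locations and all numerical values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def oi_analysis(T_o, T_B, H, R, C):
    """OI estimate T = (H^T R^-1 H + C^-1)^-1 (H^T R^-1 T_o + C^-1 T_B)."""
    A = H.T @ np.linalg.solve(R, H) + np.linalg.inv(C)
    b = H.T @ np.linalg.solve(R, T_o) + np.linalg.solve(C, T_B)
    return np.linalg.solve(A, b)

def cost(T, T_o, T_B, H, R, C):
    """Cost function S[T]: observation misfit plus background misfit."""
    d_o = H @ T - T_o
    d_b = T - T_B
    return d_o @ np.linalg.solve(R, d_o) + d_b @ np.linalg.solve(C, d_b)

# Toy setup (all values hypothetical): 5 grid points, 2 of them observed.
n = 5
x = np.arange(n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)  # assumed background covariance
H = np.zeros((2, n))
H[0, 1] = 1.0   # first observation samples grid point 1
H[1, 3] = 1.0   # second observation samples grid point 3
R = 0.1 * np.eye(2)             # assumed observational error covariance
T_B = np.zeros(n)               # background (first guess)
T_o = np.array([1.0, -0.5])     # observed values

T_a = oi_analysis(T_o, T_B, H, R, C)
print("analysis:", np.round(T_a, 3))
print("cost at analysis:", round(float(cost(T_a, T_o, T_B, H, R, C)), 4))
```

The analysis fills the unobserved grid points through the off-diagonal structure of C, which is why errors in the assumed C matter so much.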

Projection of OI solution on eigenvectors of C (EOFs)
$C = E D E^T$
$T = E a$
For simplicity: $H = I$, $R = rI$, $T := T - T_B$
Then $a = D (D + R)^{-1} E^T T_o$
$D (D + R)^{-1} = \mathrm{diag}[\, d_i / (d_i + r) \,]$, $a^o = E^T T_o$
Therefore $a_i / a_i^o = d_i / (d_i + r)$
In many applications (for spectrally red signals) the diagonal elements of this matrix decrease from ~1 to ~0. In effect, the solution is constrained to the subspace spanned by the patterns with $d_i \gg r$.
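The projection above can be sketched in NumPy as follows, for the simplified case H = I, R = rI with the background already subtracted. The covariance model, the value of r and the synthetic observations are assumptions made for illustration only.

```python
import numpy as np

def eof_shrinkage(C, r):
    """Return eigenvalues d (descending), EOFs E, and factors d_i / (d_i + r)."""
    d, E = np.linalg.eigh(C)          # eigh returns ascending eigenvalues
    order = np.argsort(d)[::-1]       # reorder so the leading EOF comes first
    d, E = d[order], E[:, order]
    return d, E, d / (d + r)

def oi_in_eof_space(T_o, C, r):
    """OI with H = I, R = r*I: shrink each observed EOF amplitude by d_i/(d_i+r)."""
    d, E, w = eof_shrinkage(C, r)
    a_o = E.T @ T_o                   # amplitudes of the observed field
    a = w * a_o                       # analysed amplitudes
    return E @ a

# Toy setup (hypothetical values): a spectrally red covariance on 6 grid points.
n = 6
x = np.arange(n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 3.0)
r = 0.5
rng = np.random.default_rng(0)
T_o = rng.standard_normal(n)          # synthetic observed anomalies

d, E, w = eof_shrinkage(C, r)
T_a = oi_in_eof_space(T_o, C, r)
print("eigenvalues d_i:      ", np.round(d, 3))
print("factors d_i/(d_i + r):", np.round(w, 3))
```

Modes whose factor is close to zero contribute almost nothing to the analysis, so in this setting they could be truncated with little change to the result.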

"Spaghetti-western properties of least-squares estimates..."
  Spaghetti-western properties of least-squares estimates of spectrally red signals: they (good) can be approximated by a few modes, (bad) have less variance than the true signal, and (ugly) are redder than the true signal.
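A short worked derivation of the "bad" and "ugly" properties, under the same simplifying assumptions as the EOF projection ($H = I$, $R = rI$, background subtracted), so that the true signal has variance $d_i$ in mode $i$ and the observed amplitude has variance $d_i + r$:

```latex
\begin{align*}
\langle (a_i^o)^2 \rangle &= d_i + r, \\
\langle a_i^2 \rangle
  &= \left(\frac{d_i}{d_i + r}\right)^{2} (d_i + r)
   = \frac{d_i^{2}}{d_i + r}
   = d_i \,\frac{d_i}{d_i + r} \;<\; d_i, \\
\frac{\langle a_i^2 \rangle}{d_i} &= \frac{d_i}{d_i + r}.
\end{align*}
```

Every mode of the estimate therefore has less variance than the true signal, and the ratio shrinks as $d_i$ decreases, so the spectrum of the estimate falls off faster (is redder) than the true one.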

EOFs of SST (#1,2,3,15,80,120)
EOFs of zonal wind anomaly
Independent ENSO indices
OUTSTANDING PROBLEMS
Take home points
Spaghetti-western properties of least-squares estimates of spectrally red signals: they (good) can be approximated by a few modes, (bad) have less variance than the true signal, and (ugly) are redder than the true signal.
These properties can be used for making analyses of sparse climate data cheaper and less ambiguous in their setup.
Because these effects are stronger for poor data, and data quality generally improves with time, taking least-squares analyses at face value, as if they were the truth, risks misinterpretation (for example, reading an apparent increase in variability over time as a real change).
A possible way out (however expensive): use of ensembles drawn from the posterior distributions rather than a single ensemble mean.
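A minimal sketch of the ensemble idea in the last point, assuming Gaussian errors so that the posterior is normal with mean equal to the OI solution and covariance $P = (H^T R^{-1} H + C^{-1})^{-1}$. The gain form used in the code is algebraically equivalent to that expression. Function names, the toy covariance and all numerical values are hypothetical.

```python
import numpy as np

def posterior(T_o, T_B, H, R, C):
    """Posterior mean (OI analysis) and covariance P, written in gain form."""
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
    T_a = T_B + K @ (T_o - H @ T_B)
    P = C - K @ H @ C                 # equals (H^T R^-1 H + C^-1)^-1
    return T_a, 0.5 * (P + P.T)       # symmetrise against round-off

def draw_ensemble(T_a, P, n_members, rng):
    """Draw fields T_a + L z, with L the Cholesky factor of P and z ~ N(0, I)."""
    L = np.linalg.cholesky(P)
    z = rng.standard_normal((n_members, T_a.size))
    return T_a + z @ L.T

# Toy setup (hypothetical values): 5 grid points, 2 of them observed.
n = 5
x = np.arange(n)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
H = np.zeros((2, n))
H[0, 1] = 1.0
H[1, 3] = 1.0
R = 0.1 * np.eye(2)
T_B = np.zeros(n)
T_o = np.array([1.0, -0.5])

T_a, P = posterior(T_o, T_B, H, R, C)
ens = draw_ensemble(T_a, P, 20000, np.random.default_rng(0))
print("posterior std per grid point:", np.round(np.sqrt(np.diag(P)), 3))
print("ensemble std per grid point: ", np.round(ens.std(axis=0), 3))
```

Statistics computed on each member and then averaged retain the posterior spread that a single mean field lacks. The price is that every downstream calculation has to be repeated once per member.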

EXTRAS FOR DISCUSSION