The use of covariance matrices in dimension reduction for space-time data
Ian Jolliffe
Universities of Exeter, Kent, Aberdeen
ian@sandloch.fsnet.co.uk
Outline of talk
How is dimensionality reduced? Concentrate on principal component analysis (EOF analysis), which uses covariance or correlation matrices
Definitions
Implementation
Interpretation
Choices
Simplification
Extensions (if time permits)
to two (or more) groups of variables
to three or more modes
Some concluding remarks
PCA and EOFs
Principal component analysis (Hotelling, 1933)
Empirical orthogonal functions (Lorenz, 1956)
Other names too
Reduces dimensionality by finding linear combinations of a large set of variables that successively maximise variance
Limitations
Can be more difficult to interpret than using a subset of the original variables, but typically not for space-time data
Linearity. Non-linear versions exist – not discussed here.
Uses only covariances, not higher-order moments – see independent component analysis (ICA)
PCA – some definitions, terminology
If x is a vector of p variables, then the principal components (PCs) are linear combinations a_1^T x, a_2^T x, …, a_p^T x
Although we can find p PCs, and sometimes the last few are useful (e.g. in finding outliers), for dimension reduction purposes we usually only keep the first few
In the kth PC, a_k, the vector of coefficients or loadings, is chosen so that the variance of a_k^T x is maximised, subject to a normalisation constraint a_k^T a_k = 1, and subject to successive PCs being uncorrelated
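These defining properties can be checked directly. A minimal numpy sketch (my illustration on synthetic data, not material from the talk): the eigenvectors are unit-norm, and the resulting PC scores are mutually uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))          # 100 observations, p = 5 variables

S = np.cov(X, rowvar=False)                # p x p covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # reorder to descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# a_k^T a_k = 1 holds automatically: eigh returns orthonormal eigenvectors
assert np.allclose(eigvecs.T @ eigvecs, np.eye(5))

# PC scores: project the column-centred data onto the loadings
scores = (X - X.mean(axis=0)) @ eigvecs

# successive PCs are uncorrelated: their covariance matrix is diagonal,
# with the eigenvalues on the diagonal
C = np.cov(scores, rowvar=False)
assert np.allclose(C - np.diag(np.diag(C)), 0, atol=1e-10)
```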
Finding PCs/EOFs
The optimisation problem which defines PCs turns out, like many in multivariate analysis, to be an eigenvalue problem
The variances of the PCs are the eigenvalues of the covariance (or correlation) matrix of x, in descending order, and the vectors of coefficients a_k are the corresponding eigenvectors
This is the usual way of finding PCs, though other algorithms exist, e.g. using the singular value decomposition of the column-centred data matrix
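The equivalence of the two routes is easy to demonstrate. In this sketch (synthetic data, my own illustration), if the column-centred data matrix is Xc = U D V^T, then the columns of V are the eigenvectors of the covariance matrix and D^2/(n-1) gives the PC variances.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
Xc = X - X.mean(axis=0)                    # column-centred data matrix

# Route 1: eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order

# Route 2: SVD of the centred data matrix
U, D, Vt = np.linalg.svd(Xc, full_matrices=False)

# singular values squared, scaled by n - 1, are the PC variances
assert np.allclose(D**2 / (len(X) - 1), eigvals)

# right singular vectors agree with the eigenvectors up to sign
for k in range(6):
    assert np.allclose(np.abs(Vt[k]), np.abs(eigvecs[:, k]))
```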
PCA in atmospheric science (geoscience)
Most common format is 'variables = stations or gridpoints; observations = different times'
Eigenvectors are known as empirical orthogonal functions (EOFs) and the technique as EOF analysis
Elements of the EOFs are often plotted as contours on a map
Note that the EOFs are vectors of loadings in PCA, not the PCs themselves, which are time series in this context
Example – northern hemisphere sea level pressure (NH SLP)
The data are monthly mean SLP for winter, from 1948 to 2000, on a 2.5° x 2.5° grid for the NH north of 20°N from the NCEP/NCAR reanalysis
Some preprocessing has taken place – removing the annual cycle – and area weighting based on the square root of the cosine of latitude has been used, but these details need not concern us here
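The area weighting mentioned above can be sketched as follows (a hypothetical grid of my own construction, not the NCEP/NCAR data): each gridpoint's anomaly is multiplied by sqrt(cos(latitude)), so that its contribution to the covariance matrix scales with cos(latitude), roughly the grid-cell area.

```python
import numpy as np

# Hypothetical grid: latitudes 20N..87.5N at 2.5 degree spacing, 144 longitudes
lats = np.arange(20.0, 90.0, 2.5)
n_lon = 144                                # 360 / 2.5
w = np.sqrt(np.cos(np.deg2rad(lats)))      # weight per latitude band

# X: (times, gridpoints) anomaly field, latitude varying slowest
rng = np.random.default_rng(2)
X = rng.standard_normal((50, len(lats) * n_lon))

w_grid = np.repeat(w, n_lon)               # one weight per gridpoint
Xw = X * w_grid                            # weighted anomalies fed to the PCA
```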
Example 2 – NH 850hPa streamfunction
The data are monthly mean streamfunction for extended winter (includes March), from 1979 to 1997, for the whole NH from the ECMWF reanalysis (ERA)
Some preprocessing has taken place – removing the annual cycle
EOFs are often interpreted as 'physical modes' and there is considerable argument over which EOFs correspond to such modes, and whether EOFs can find them, or even whether they exist
Choices in PCA
There are a number of decisions to be made in PCA
Covariances or correlations?
How many PCs/EOFs?
Which normalisation constraint?
Covariance or correlation
PCs and their variances may be found by calculating eigenvalues and eigenvectors of either (a) a covariance matrix or (b) a correlation matrix
(a) corresponds to successively maximising variances of linear combinations of the raw variables
(b) corresponds to standardising each variable to have unit variance before the successive maximisation
NOTE: there is no simple relationship between the PCs found from the two types of matrix
Covariance or correlation II
It is important to use correlations, not covariances, if variables are measured in different units (pressure, temperature) to avoid the effects of arbitrary scaling
Most geoscience applications only use one type of variable – the choice between covariance and correlation depends on whether it is desirable for all variables (spatial locations) to have the same weight, or to allow those with greater variances an increased chance of dominating the first few PCs
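The difference is easy to see numerically. In this synthetic sketch (my illustration, not from the talk), two correlated unit-variance variables sit alongside one independent high-variance variable: the covariance EOF 1 is dominated by the high-variance variable, while the correlation EOF 1 picks out the correlated pair.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.standard_normal(n)
x1 = z + 0.3 * rng.standard_normal(n)      # x1, x2 strongly correlated
x2 = z + 0.3 * rng.standard_normal(n)
x3 = 20.0 * rng.standard_normal(n)         # independent, much larger variance
X = np.column_stack([x1, x2, x3])

# Covariance-based PCA: the high-variance variable dominates EOF 1
_, V_cov = np.linalg.eigh(np.cov(X, rowvar=False))
eof1_cov = V_cov[:, -1]                    # eigh: last column = largest eigenvalue

# Correlation-based PCA: equivalent to standardising each variable first
_, V_cor = np.linalg.eigh(np.corrcoef(X, rowvar=False))
eof1_cor = V_cor[:, -1]

print(np.round(np.abs(eof1_cov), 2))       # nearly all weight on x3
print(np.round(np.abs(eof1_cor), 2))       # weight shared by x1, x2
```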
How many PCs/EOFs?
There are numerous rules (see Jolliffe, 2002a, Chapter 6) based on:
1. Size of individual variances (eigenvalues, λ_k)
2. Cumulative sum of variances
3. Changes in successive variances
4. Physical interpretability
5. More complicated techniques
Types 2 & 4 are probably most often used in geoscience, but rules of type 5 have been suggested, and type 3 (gaps between eigenvalues) is also sometimes important (e.g. if EOFs are to be rotated)
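Rules of types 2 and 3 are simple enough to sketch. Here is my illustration on an invented eigenvalue spectrum (the 90% threshold is a common but arbitrary choice, not one prescribed in the talk):

```python
import numpy as np

# illustrative eigenvalue spectrum, in descending order
eigvals = np.array([4.0, 3.0, 2.5, 0.5, 0.3, 0.2, 0.1, 0.05])

# Rule 2: keep enough PCs to explain, say, 90% of the total variance
cum = np.cumsum(eigvals) / eigvals.sum()
k_cumulative = int(np.searchsorted(cum, 0.90) + 1)

# Rule 3: truncate after the largest gap between successive eigenvalues
gaps = eigvals[:-1] - eigvals[1:]
k_gap = int(np.argmax(gaps) + 1)

print(k_cumulative, k_gap)
```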
Choice of normalisation constraint
In the kth PC, a_k, the vector of coefficients or loadings, is chosen so that the variance of a_k^T x is maximised, subject to a normalisation constraint a_k^T a_k = 1, and subject to successive PCs being uncorrelated
The results presented may have a_k^T a_k = 1, but alternatives are a_k^T a_k = λ_k or a_k^T a_k = 1/λ_k, where λ_k is the eigenvalue (variance) associated with the kth PC
In interpreting what a PC represents in terms of the original variables, the normalisation is unimportant – the maps look exactly the same. It is the relative values of the a_kj within a_k that are important.
However, there are differences in interpretation of individual loadings which need not concern us here
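The three normalisations differ only by a per-EOF scale factor, which is why the maps look the same. A quick numerical check (synthetic data, my illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 4))
lam, A = np.linalg.eigh(np.cov(X, rowvar=False))
lam, A = lam[::-1], A[:, ::-1]             # columns satisfy a_k^T a_k = 1

A_lam = A * np.sqrt(lam)                   # rescaled so a_k^T a_k = lambda_k
A_inv = A / np.sqrt(lam)                   # rescaled so a_k^T a_k = 1 / lambda_k

# within each EOF, the relative loadings are unchanged: each alternative
# is the unit-norm EOF times a single constant
k = 0
assert np.allclose(A_lam[:, k] / A[:, k], np.sqrt(lam[k]))
assert np.allclose(A_inv[:, k] / A[:, k], 1 / np.sqrt(lam[k]))
```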
Simplification
PCs can be difficult to interpret, though often less so for space-time data than other types of data. To aid interpretation, various simplification techniques have been proposed. |
Simplification II
Rotation (orthogonal or oblique)
Restriction of loadings to discrete set of values
LASSO-based approach
Others
Combining variance maximisation and simplification criteria
Truncation of loadings
Empirical orthogonal teleconnections
etc.
Rotation
Well-known and widely used but controversial (Richman, 1986, 1987; Jolliffe, 1987, 1995; Mestas-Nuñez, 2000). Among the questions to be addressed are:
Orthogonal or oblique?
Choice of simplicity criterion, e.g. varimax
How many EOFs to rotate
Choice of normalisation constraint
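Varimax, the most common simplicity criterion, can be sketched with the standard SVD-based algorithm. This is my illustration on synthetic data (not the talk's analysis); note the normalisation choice before rotation, echoing the last bullet above.

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Orthogonally rotate a loadings matrix L (p x m) to maximise the
    varimax simplicity criterion (standard SVD-based iteration)."""
    p, m = L.shape
    T = np.eye(m)
    var_old = 0.0
    for _ in range(n_iter):
        Lr = L @ T
        U, s, Vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag(np.mean(Lr**2, axis=0))))
        T = U @ Vt
        var_new = s.sum()
        if var_new - var_old < tol:
            break
        var_old = var_new
    return L @ T, T

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 6))
lam, A = np.linalg.eigh(np.cov(X, rowvar=False))
lam, A = lam[::-1], A[:, ::-1]
L = A[:, :3] * np.sqrt(lam[:3])            # normalisation choice matters here
Lrot, T = varimax(L)

assert np.allclose(T.T @ T, np.eye(3))     # rotation is orthogonal
assert np.allclose(Lrot @ Lrot.T, L @ L.T) # total explained variance preserved
```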
Example of rotation – USA summer precipitation
402 stations (variables)
1312 times (observations) = 41 (3-day periods in May-Aug) x 32 (years)
Other Simplification Methods
We give no details here, but show the results of applying one of them (LASSO-based) to the earlier NH SLP example
Relationships between variables in two (or more) groups
We may wish to relate two sets of variables, e.g. sea surface temperatures and mean sea level pressure. A variety of techniques is available
Canonical correlation analysis
Maximum covariance analysis (SVD)
Many others
Canonical correlation analysis (CCA)
To find relationships between two groups of variables, find pairs of linear functions of the variables, one from each group, that have maximum correlation, subject to being uncorrelated with previously found pairs
Turns out to be another eigenvalue problem, involving covariance matrices between (S_xy) and within (S_xx, S_yy) the groups of variables
Solve S_xy S_yy^{-1} S_yx a_k^x = λ_k S_xx a_k^x (a_k^x = vector of loadings for the x variables; a similar equation holds for the y variables)
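The eigenvalue equation can be solved directly; the eigenvalues λ_k are the squared canonical correlations. A numpy sketch on synthetic data (my illustration; the two groups share one common signal):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
common = rng.standard_normal(n)            # signal shared by the two groups
X = np.column_stack([common + 0.5 * rng.standard_normal(n),
                     rng.standard_normal(n)])
Y = np.column_stack([common + 0.5 * rng.standard_normal(n),
                     rng.standard_normal(n),
                     rng.standard_normal(n)])

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

# Solve Sxy Syy^{-1} Syx a = lambda Sxx a, i.e. eigendecompose
# Sxx^{-1} Sxy Syy^{-1} Syx
M = np.linalg.solve(Sxx, Sxy @ np.linalg.solve(Syy, Sxy.T))
lam, A = np.linalg.eig(M)
k = int(np.argmax(lam.real))
a = A[:, k].real

# corresponding y-loadings (up to scale), then the first canonical pair
b = np.linalg.solve(Syy, Sxy.T @ a)
u, v = Xc @ a, Yc @ b
r = np.corrcoef(u, v)[0, 1]

# the canonical correlation equals sqrt(lambda_1)
assert np.isclose(abs(r), np.sqrt(lam.real[k]), atol=1e-6)
```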
Maximum covariance analysis |
Also | ||
inter-battery factor analysis (Tucker, 1958) | ||
SVD (Bretherton et al., 1992) | ||
Similar to CCA except | ||
It successively maximises covariance rather than correlation | ||
Vectors of loadings are orthogonal, rather than derived variables uncorrelated | ||
Solves SxySyxakx = lkakx |
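In practice the eigenproblem is solved via an SVD of the cross-covariance matrix, which is where the "SVD" name comes from. A sketch on synthetic data (my illustration, not the talk's example):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
common = rng.standard_normal(n)
X = np.column_stack([common, rng.standard_normal(n)])
Y = np.column_stack([2 * common, rng.standard_normal(n), rng.standard_normal(n)])

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxy = Xc.T @ Yc / (n - 1)

# MCA: SVD of the cross-covariance matrix. Left/right singular vectors are
# the (orthonormal) x- and y-patterns; singular values are the covariances
# accounted for by each pair of patterns
U, d, Vt = np.linalg.svd(Sxy)

# U[:, 0] solves Sxy Syx a = lambda a with lambda = d[0]^2
assert np.allclose(Sxy @ Sxy.T @ U[:, 0], d[0]**2 * U[:, 0])

# covariance of the first pair of derived variables = leading singular value
u1, v1 = Xc @ U[:, 0], Yc @ Vt[0]
assert np.isclose(np.cov(u1, v1)[0, 1], d[0], atol=1e-8)
```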
Maximum covariance analysis: Pacific SST vs. Hemispheric 500mb height (Wallace et al., 1992)
Extensions to 3 (or more) modes
By 'modes' here I mean 'time', 'space'; extras might be different climate variables, or different levels in the atmosphere. Some extensions:
O-mode, P-mode, …, T-mode analyses
Extended EOF analysis
Three-mode PCA
O-mode, P-mode, …, T-mode
Not really an extension – given 3 modes, most often space, time and climate variable, choose one as 'variable', one as 'observation', ignore the third, and do PCA. 6 possibilities.
S-mode most usual: space = variables, time = observations.
T-mode, not uncommon: time = variables, space = observations.
The other 4 are used occasionally
Extended EOF analysis
n times, s spatial locations, p climate variables. Combine locations and variables to give an (n x sp) data matrix and carry out EOF analysis on it.
Can also incorporate different time lags to give multivariate EEOF (MEEOF) analysis (Mote et al., 2000).
The latter also extends MSSA (Plaut & Vautard, 1994).
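Building the data matrices is mostly bookkeeping. A sketch with invented dimensions (my illustration): stack locations and variables for EEOF analysis, then append lagged copies for the MEEOF variant; ordinary EOF analysis is then applied to the resulting matrix.

```python
import numpy as np

n, s, p = 120, 10, 3                       # times, locations, climate variables
rng = np.random.default_rng(8)
fields = rng.standard_normal((n, s, p))

# EEOF: stack locations and variables into one row per time
X = fields.reshape(n, s * p)               # (n x sp) data matrix

# MEEOF: additionally append lagged copies of the field (here lags 0..2)
lags = 3
X_lagged = np.hstack([X[l:n - lags + 1 + l] for l in range(lags)])

print(X.shape, X_lagged.shape)
```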
MEEOF example – 5 variables averaged over 0-10°S for various longitudes
200mb velocity potential
Outgoing radiation
215mb water vapour
100mb temperature
100mb water vapour
Concluding remarks
All the techniques discussed have an underlying objective of dimension reduction and all use covariance or correlation matrices. There is often a desire to physically interpret the new dimensions – this can be controversial
For example, oblique rotation is advocated by some because 'physical modes' (NB different meaning of 'mode') are often correlated. Other techniques are advocated (independent component analysis – ICA; Aires et al., 2000) on the premise that 'modes' are not just uncorrelated, but independent
More concluding remarks
We have mentioned some EOF-related techniques very briefly and others not at all
A large missing class consists of techniques explicitly designed for time series data, e.g. SSA (Golyandina et al., 2001), MSSA (Plaut & Vautard, 1994), POP analysis (von Storch et al., 1988), MTM-SVD (Mann & Park, 1999), …
Why not consult Jolliffe (2002a) for further details and references?
Discrete set of values for loadings
Hausmann (1982): -1, 0, +1
Vines (2000), Jolliffe, Uddin & Vines (2002): more integers – gives so-called simple components
Chipman & Gu (2004): find ordinary EOFs, then truncate to -1, 0, +1 or -c1, 0, c2
Rousson & Gasser (2004): a technique that produces blocks of zeros and blocks of equal non-zero loadings
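The truncation idea is the simplest of these to sketch: small loadings go to 0, the rest to ±1. My illustration on synthetic data; the threshold rule here is an arbitrary choice for demonstration, not the specific rule of Chipman & Gu (2004).

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.standard_normal((100, 8))
lam, A = np.linalg.eigh(np.cov(X, rowvar=False))
a1 = A[:, -1]                              # leading ordinary EOF

# Truncate: loadings small in absolute value -> 0, the rest -> sign (+1 / -1)
c = 0.5 * np.abs(a1).max()                 # illustrative threshold only
simple = np.where(np.abs(a1) >= c, np.sign(a1), 0.0)

print(simple)
```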
LASSO-based approach
LASSO (Least Absolute Shrinkage and Selection Operator) was developed in multiple regression to deal with multicollinearity.
A compromise between variable selection and biased regression. Shrinks some regression coefficients exactly to zero.
Adaptation to PCA: to the usual optimisation problem add an extra constraint (Jolliffe, Trendafilov & Uddin, 2003).
LASSO II
Constraint is Σ_j |a_jk| ≤ t, where a_jk is the jth element of the kth EOF, and t is a 'tuning parameter'. As t decreases, an increasing number of loadings are driven to 0.
The technique is named SCoTLASS (Simplified Component Technique – LASSO).
Zou et al. (2006) have an 'improved' version of SCoTLASS, with an implementation in R
Mediterranean sea surface temperature example
16 variables corresponding to average seasonal sea surface temperature in 16 areas of the Mediterranean, 1946-1988
Original source: Bartzokas et al. (1994)
SST example
Explain the dark/light red/blue shading
Compared to the PCs:
Rotation (using varimax) gives separate regions in the first 2 PCs, rather than overall temperature and a contrast between regions
SCoTLASS gives a simpler version of the rotated PCs
SCoT (a technique that maximises a criterion combining variance and simplicity) gives a simpler version of the unrotated PCs; so do simple components, but simplicity in a different sense
Other techniques for two groups of variables
Redundancy analysis (van den Wollenberg, 1977): R_xy R_yx a_k^x = λ_k R_xx a_k^x (the R matrices contain correlations)
Related to PCA of instrumental variables and reduced-rank regression
Unlike CCA and MCA, one set of variables is treated as predictors, one as responses
Principal predictors (Thacker, 1999): S_xy [diag(S_yy)]^{-1} S_yx a_k^x = λ_k S_xx a_k^x
Conditional MCA (An, 2003)
Multivariate regression, combined PCA of x and y variables, separate PCAs of x and y followed by CCA, partial least squares, and a number of others
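The redundancy analysis eigenproblem above has the same shape as the CCA one and can be solved the same way. A sketch on synthetic data (my illustration; here the y variables are driven by two of the x variables):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
X = rng.standard_normal((n, 4))
# y variables driven by the first two x variables plus noise
Y = X[:, :2] @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((n, 3))

R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
Rxx, Rxy = R[:4, :4], R[:4, 4:]

# Redundancy analysis: solve Rxy Ryx a = lambda Rxx a, i.e. eigendecompose
# Rxx^{-1} Rxy Ryx
lam, A = np.linalg.eig(np.linalg.solve(Rxx, Rxy @ Rxy.T))
order = np.argsort(lam.real)[::-1]
lam, A = lam.real[order], A.real[:, order]

print(np.round(lam, 3))                    # eigenvalues in descending order
```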
Three-mode PCA
x_ijk, i=1,2,…,n; j=1,2,…,p; k=1,2,…,t is approximated by
x̂_ijk = Σ_{l=1}^{m} Σ_{r=1}^{q} Σ_{u=1}^{s} g_lru a_il b_jr c_ku,
with m < n, q < p, s < t (Kroonenberg, 1983). Other varieties exist in the psychometric literature.