The use of covariance matrices in dimension reduction for space-time data
Ian Jolliffe
Universities of Exeter, Kent, Aberdeen
ian@sandloch.fsnet.co.uk

Outline of talk
How is dimensionality reduced? Concentrate on principal component analysis (EOF analysis) which uses covariance or correlation matrices
Definitions
Implementation
Interpretation
Choices
Simplification
Extensions (if time permits)
 to two (or more) groups of variables
 to three or more modes
Some concluding remarks

PCA and EOFs
Principal component analysis (Hotelling, 1933)
Empirical orthogonal functions (Lorenz, 1956)
Other names too
Reduces dimensionality by finding linear combinations of a large set of variables that successively maximise variance
Limitations
Can be more difficult to interpret than using a subset of the original variables, but typically not for space-time data
 Linearity. Non-linear versions exist – not discussed here.
Uses only covariances, not higher-order moments – see independent component analysis (ICA)

PCA – some definitions, terminology
If x is a vector of p variables, then the principal components (PCs) are the linear combinations a_1^T x, a_2^T x, …, a_p^T x
Although we can find p PCs, and sometimes the last few are useful (e.g. in finding outliers), for dimension reduction purposes we usually keep only the first few
In the kth PC, a_k, the vector of coefficients or loadings, is chosen so that the variance of a_k^T x is maximised, subject to the normalisation constraint a_k^T a_k = 1 and to successive PCs being uncorrelated

Finding PCs/EOFs
The optimisation problem which defines PCs turns out, like many in multivariate analysis, to be an eigenvalue problem
The variances of the PCs are eigenvalues of the covariance (or correlation) matrix of x, in descending order, and the vectors of coefficients ak are the corresponding eigenvectors
This is the usual way of finding PCs, though other algorithms exist, e.g. using the singular value decomposition of the column-centred data matrix
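As a numerical sketch (synthetic data, not from the talk), both routes can be checked with NumPy: the eigendecomposition of the covariance matrix and the SVD of the column-centred data matrix give the same EOFs, with λ_k = σ_k² / (n − 1).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 observations of 5 variables
Xc = X - X.mean(axis=0)                  # column-centre the data matrix
n = Xc.shape[0]

# Route 1: eigendecomposition of the sample covariance matrix
S = Xc.T @ Xc / (n - 1)
evals, evecs = np.linalg.eigh(S)         # eigh returns ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]

# Route 2: SVD of the centred data matrix
U, sing, Vt = np.linalg.svd(Xc, full_matrices=False)

# Singular values relate to eigenvalues via lambda_k = sigma_k^2 / (n - 1)
assert np.allclose(sing**2 / (n - 1), evals)
for k in range(5):                       # eigenvectors agree up to sign
    assert np.isclose(abs(Vt[k] @ evecs[:, k]), 1.0)
```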

PCA in atmospheric science (geoscience)
Most common format is 'variables = stations or gridpoints; observations = different times'
Eigenvectors are known as empirical orthogonal functions (EOFs) and the technique as EOF analysis
Elements of the EOFs are often plotted as contours on a map
Note that the EOFs are vectors of loadings in PCA, not the PCs themselves, which are time series in this context

Example – northern hemisphere sea level pressure (NH SLP)
The data are monthly mean SLP for winter, from 1948 to 2000, on a 2.5° x 2.5° grid for the NH north of 20°N from the NCEP/NCAR reanalysis
Some preprocessing has taken place (removal of the annual cycle), and area weighting based on the square root of the cosine of latitude has been used, but these details need not concern us here

Example 2 – NH 850hPa streamfunction
The data are monthly mean streamfunction for extended winter (includes March), from 1979 to 1997, for the whole NH from the ECMWF reanalysis (ERA)
Some preprocessing has taken place – removing annual cycle
EOFs are often interpreted as 'physical modes' and there is considerable argument over which EOFs correspond to such modes, and whether EOFs can find them, or even whether they exist

Choices in PCA
There are a number of decisions to be made in PCA
Covariances or correlations?
How many PCs/EOFs?
Which normalisation constraint?

Covariance or correlation
PCs and their variances may be found by calculating eigenvalues and eigenvectors of either (a) a covariance matrix or (b) a correlation matrix
(a) corresponds to successively maximising variances of linear combinations of the raw variables
(b) corresponds to standardising each variable to have unit variance before the successive maximisation
NOTE: there is no simple relationship between the PCs found from the two types of matrix

Covariance or correlation II
It is important to use correlations, not covariances, if variables are measured in different units (pressure, temperature) to avoid effects of arbitrary scaling
Most geoscience applications use only one type of variable – the choice between covariance and correlation then depends on whether it is desirable for all variables (spatial locations) to have the same weight, or to allow those with greater variances an increased chance of dominating the first few PCs
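An illustrative sketch (synthetic data): with three independent variables of very different variances, the covariance-based leading EOF is dominated by the high-variance variable, whereas the correlation-based analysis standardises first so all variables get equal weight.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three independent variables with standard deviations 1, 5 and 0.1
X = rng.normal(size=(200, 3)) * np.array([1.0, 5.0, 0.1])
Xc = X - X.mean(axis=0)

# Covariance-based PCA: the high-variance variable dominates the leading EOF
_, vec_cov = np.linalg.eigh(np.cov(Xc, rowvar=False))
leading_cov = vec_cov[:, -1]             # eigenvector of largest eigenvalue

# Correlation-based PCA: equivalent to standardising to unit variance first
_, vec_corr = np.linalg.eigh(np.corrcoef(Xc, rowvar=False))
leading_corr = vec_corr[:, -1]

# Column 1 (variance 25) dominates the covariance-based leading EOF
assert np.argmax(np.abs(leading_cov)) == 1
```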

How many PCs/EOFs?
There are numerous rules (see Jolliffe, 2002a, Chapter 6) based on:
Size of individual variances (eigenvalues, λ_k)
Cumulative sum of variances
Changes in successive variances
Physical interpretability
More complicated techniques
Types 2 & 4 are probably the most often used in geoscience, but rules of type 5 have been suggested, and type 3 (gaps between eigenvalues) is also sometimes important (e.g. if EOFs are to be rotated)
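A rule of type 2 can be sketched in a few lines (function name and 90% threshold are illustrative, not from the talk): keep the smallest number of EOFs whose cumulative variance fraction reaches a chosen threshold.

```python
import numpy as np

def n_components_for(eigenvalues, threshold=0.9):
    """Smallest k such that the first k eigenvalues account for at
    least `threshold` of the total variance (cumulative-sum rule)."""
    frac = np.cumsum(np.sort(eigenvalues)[::-1]) / np.sum(eigenvalues)
    return int(np.searchsorted(frac, threshold) + 1)

# Variances 4, 3, 2, 1: the first three explain 90% of the total
assert n_components_for([4.0, 3.0, 2.0, 1.0], 0.9) == 3
assert n_components_for([5.0, 3.0, 1.0, 1.0], 0.8) == 2
```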

Choice of normalisation constraint
In the kth PC, a_k, the vector of coefficients or loadings, is chosen so that the variance of a_k^T x is maximised, subject to the normalisation constraint a_k^T a_k = 1 and to successive PCs being uncorrelated
The results presented may have a_k^T a_k = 1, but alternatives are a_k^T a_k = λ_k or a_k^T a_k = 1/λ_k, where λ_k is the eigenvalue (variance) associated with the kth PC
In interpreting what a PC represents in terms of the original variables, the normalisation is unimportant – the maps look exactly the same. It is the relative values of the akj within ak that are important.
However, there are differences in interpretation of  individual loadings which need not concern us here
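A small sketch of the three normalisations (variable names are illustrative): all three loading vectors are scalar multiples of one another, so contour maps of their elements look exactly the same, only the scale changes.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
Xc = X - X.mean(axis=0)
lams, A = np.linalg.eigh(np.cov(Xc, rowvar=False))
lam, a1 = lams[-1], A[:, -1]          # leading eigenvalue, unit-norm eigenvector

a_unit = a1                           # a'a = 1
a_lam = a1 * np.sqrt(lam)             # a'a = lambda
a_inv = a1 / np.sqrt(lam)             # a'a = 1/lambda

assert np.isclose(a_lam @ a_lam, lam)
assert np.isclose(a_inv @ a_inv, 1.0 / lam)
# Relative values within each vector are identical
assert np.allclose(a_lam, np.sqrt(lam) * a_unit)
```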

Simplification
PCs can be difficult to interpret, though often less so for space-time data than other types of data. To aid interpretation, various simplification techniques have been proposed.

Simplification II
Rotation (orthogonal or oblique)
Restriction of loadings to discrete set of values
LASSO-based approach
Others
Combining variance maximisation and simplification criteria
Truncation of loadings
Empirical orthogonal teleconnections
etc.

Rotation
Well-known and widely-used but controversial (Richman, 1986, 1987; Jolliffe, 1987, 1995; Mestas-Nuñez, 2000). Among the questions to be addressed are
Orthogonal or oblique
Choice of simplicity criterion e.g. varimax
How many EOFs to rotate
Choice of normalisation constraint

Example of rotation – USA summer precipitation
402 stations (variables)
1312 times (observations) = 41 (3-day periods in May-Aug) x 32 (years)

Other Simplification Methods
We give no details here, but show the results of applying one of them (LASSO-based) to the earlier NH SLP example

Relationships between variables in two (or more) groups
We may wish to relate two sets of variables e.g. sea surface temperatures and mean sea level pressure. A variety of techniques is available
Canonical correlation analysis
Maximum covariance analysis (SVD)
Many others

Canonical correlation analysis (CCA)
To find relationships between two groups of variables, find pairs of linear functions of variables, one from each group, that have maximum correlation, subject to being uncorrelated with previously found pairs
Turns out to be another eigenvalue problem, involving covariance matrices between (S_xy) and within (S_xx, S_yy) the groups of variables
Solve S_xy S_yy^-1 S_yx a_k^x = λ_k S_xx a_k^x (a_k^x = vector of loadings for the x variables; a similar equation holds for the y variables)
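A numerical sketch (synthetic data, names illustrative): form the sample covariance matrices, solve the generalised eigenproblem above, and recover the leading canonical correlation as the square root of the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=(n, 1))                     # signal shared by both groups
X = z + 0.5 * rng.normal(size=(n, 3))
Y = z + 0.5 * rng.normal(size=(n, 2))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)

Sxx = Xc.T @ Xc / (n - 1)
Syy = Yc.T @ Yc / (n - 1)
Sxy = Xc.T @ Yc / (n - 1)

# Eigenvalues of Sxx^-1 Sxy Syy^-1 Syx are squared canonical correlations
M = Sxy @ np.linalg.solve(Syy, Sxy.T)           # Sxy Syy^-1 Syx
lams = np.linalg.eigvals(np.linalg.solve(Sxx, M)).real
rho = np.sqrt(lams.max())                       # leading canonical correlation

assert 0.5 < rho < 1.0                          # strong shared signal recovered
```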

Maximum covariance analysis
Also
inter-battery factor analysis (Tucker, 1958)
SVD (Bretherton et al., 1992)
Similar to CCA except
It successively maximises covariance rather than correlation
Vectors of loadings are orthogonal, rather than derived variables uncorrelated
Solves S_xy S_yx a_k^x = λ_k a_k^x
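Equivalently, the loading vectors are the singular vectors of S_xy (hence the name "SVD analysis"), as this sketch on synthetic data checks:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400
z = rng.normal(size=(n, 1))
X = z + rng.normal(size=(n, 3))
Y = z + rng.normal(size=(n, 2))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxy = Xc.T @ Yc / (n - 1)

# Left/right singular vectors of Sxy are the x and y loading vectors;
# the eigenvalues in Sxy Syx a = lambda a are the squared singular values
U, sing, Vt = np.linalg.svd(Sxy, full_matrices=False)
ax, ay = U[:, 0], Vt[0]                  # leading pair of loading vectors

# The covariance of the leading pair of derived variables equals sigma_1
u, v = Xc @ ax, Yc @ ay
assert np.isclose(u @ v / (n - 1), sing[0])
```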

Maximum covariance analysis: Pacific SST vs. hemispheric 500mb height (Wallace et al., 1992)

Extensions to 3 (or more) modes
By 'modes' here I mean 'time', 'space'; extras might be different climate variables, different levels in the atmosphere. Some extensions:
O-mode, P-mode, …, T-mode analyses
Extended EOF analysis
Three-mode PCA

O-mode, P-mode, …, T-mode
Not really an extension – given 3 modes, most often space, time, climate variable, choose one as 'variable', one as 'observation', ignore the third, and do PCA. 6 possibilities.
S-mode most usual: space = variables, time = observations.
T-mode, not uncommon: time = variables, space = observations.
Other 4 used occasionally

Extended EOF analysis
n times, s spatial locations, p climate variables. Combine locations and variables to give (n x sp) data matrix and carry out EOF analysis on it.
Can also incorporate different time lags to give multivariate EEOF (MEEOF) analysis (Mote et al., 2000).
The latter also extends MSSA (Plaut & Vautard, 1994).

MEEOF example – 5 variables averaged over 0-10°S for various longitudes
200mb velocity potential
Outgoing radiation
215mb water vapour
100mb temperature
100mb water vapour

Concluding remarks
All the techniques discussed have an underlying objective of dimension reduction and all use covariance or correlation matrices. There is often a desire to physically interpret the new dimensions – this can be controversial
For example, oblique rotation is advocated by some because 'physical modes' (NB different meaning of 'mode') are often correlated. Other techniques are advocated (independent component analysis – ICA; Aires et al., 2000) on the premise that 'modes' are not just uncorrelated, but independent

More concluding remarks
We have mentioned some EOF-related techniques very briefly and others not at all
A large missing class consists of techniques explicitly designed for time series data, e.g. SSA (Golyandina et al., 2001), MSSA (Plaut & Vautard, 1994), POP analysis (von Storch et al., 1988), MTM-SVD (Mann & Park, 1999), …
Why not consult Jolliffe (2002a) for further details and references?

Discrete set of values for loadings
Hausmann (1982): -1, 0, +1
Vines (2000), Jolliffe, Uddin & Vines (2002): more integers – gives so-called simple components
Chipman & Gu (2004): find ordinary EOFs, then truncate to -1, 0, +1 or -c_1, 0, c_2
Rousson & Gasser (2004): a technique that produces blocks of zeros and blocks of equal non-zero loadings

LASSO-based approach
LASSO (Least Absolute Shrinkage and Selection Operator) developed in multiple regression to deal with multicollinearity.
A compromise between variable selection and biased regression. Shrinks some regression coefficients exactly to zero.
Adaptation to PCA: to the usual optimisation problem add an extra constraint (Jolliffe, Trendafilov & Uddin, 2003).
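The shrinkage-to-zero behaviour can be illustrated in the simplest setting (values are made up): for an orthonormal design matrix, the LASSO solution is the soft-thresholded least-squares estimate, so small coefficients are driven exactly to zero.

```python
import numpy as np

# Least-squares coefficients (illustrative values)
beta_ols = np.array([3.0, -0.4, 0.1, -2.0])
penalty = 0.5

# Soft-thresholding: shrink towards zero, clipping at zero
beta_lasso = np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - penalty, 0.0)

assert np.count_nonzero(beta_lasso) == 2   # two coefficients driven exactly to 0
```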

LASSO II
Constraint is Σ_j |a_jk| ≤ t,
    where a_jk is the jth element in the kth EOF, and t is a 'tuning parameter'. As t → 0, an increasing number of loadings are driven to 0.
The technique is named SCoTLASS (Simplified Component Technique – LASSO).
Zou et al. (2006) have an 'improved' version of SCoTLASS, with an implementation in R

Mediterranean sea surface temperature example
16 variables corresponding to average seasonal sea surface temperature in 16 areas of the Mediterranean, 1946-1988
Original source Bartzokas et al. (1994)

SST example
Explain the dark/light red/blue shading
Compared to the PCs
Rotation (using varimax) gives separate regions in the first 2 PCs, rather than overall temperature and a contrast between regions
SCoTLASS gives a simpler version of the rotated PCs
SCoT (a technique that maximises a criterion combining variance and simplicity) gives a simpler version of the unrotated PCs; so do simple components, but simplicity in a different sense

Other techniques for two groups of variables
Redundancy analysis (van den Wollenberg, 1977): R_xy R_yx a_k^x = λ_k R_xx a_k^x (R matrices contain correlations)
Related to PCA of instrumental variables and reduced rank regression
Unlike CCA, MCA, one set of variables is treated as predictors, one as responses
Principal predictors (Thacker 1999)
S_xy [diag(S_yy)]^-1 S_yx a_k^x = λ_k S_xx a_k^x
Conditional MCA (An, 2003)
Multivariate regression, combined PCA of x and y variables, separate PCAs of x and y followed by CCA, partial least squares and a number of others

Three mode PCA
x_ijk, i=1,2,…,n; j=1,2,…,p; k=1,2,…,t is approximated by
x_ijk ≈ Σ_{u=1}^{m} Σ_{v=1}^{q} Σ_{w=1}^{s} a_iu b_jv c_kw g_uvw
with m < n, q < p, s < t (Kroonenberg, 1983). Other varieties exist in the psychometric literature.