Francesca Dominici, Thomas A. Louis, Aidan McDermott, and Jonathan M. Samet
The Johns Hopkins University

Controlling for confounding in time series studies of air pollution and mortality: How smooth (or rough) should we be?

Time series studies of air pollution and health aim to estimate associations between day-to-day variations in air pollution and day-to-day variations in mortality counts in the presence of (1) observed time-varying confounders (e.g., weather) and (2) unobserved time-varying confounders, such as influenza epidemics and trends in survival. To remove long-term trends and seasonal variations from the mortality time series, a smooth function of calendar time, f(t), is included in the regression formulation. The statistical/epidemiological target is to determine the degree of smoothness of f(t) that sufficiently reduces confounding bias when estimating the pollution coefficient.
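For concreteness, the regression formulation described above can be written as follows. This is only a sketch under stated assumptions: the abstract does not give the model, so the Poisson log-linear form and the symbols Y_t (daily mortality count), x_t (pollution level), z_t (observed confounders such as weather), g (their adjustment term), and \beta (the pollution coefficient) are illustrative notation rather than the authors' own.

```latex
\begin{align*}
Y_t &\sim \mathrm{Poisson}(\mu_t), \\
\log \mu_t &= \beta_0 + \beta\, x_t + f(t;\, \mathrm{df}) + g(z_t).
\end{align*}
```

In this notation, the question posed in the title is how many degrees of freedom to give f(t; df) so that the estimate of \beta is protected from confounding by unobserved slowly varying factors.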

The choice of the number of degrees of freedom (df) used to represent f(t) is one of the most discussed statistical issues in time series analyses of air pollution and health. This choice is critical because it determines the time scales of variation in the health outcome and the exposure that are used to estimate the air pollution coefficient. Choosing too small a df, that is, over-smoothing f(t), might leave residual confounding bias. At the other extreme, choosing too large a df, that is, under-smoothing f(t), might wash out the pollution effect through over-adjustment or inflate the statistical variance of the pollution coefficient estimate.

Current approaches to df-selection in environmental epidemiology are of two kinds: (1) data-driven methods, in which the number of degrees of freedom in the smooth function of time is estimated using an optimality criterion such as the Akaike Information Criterion (AIC); and (2) prior choices of df supported by sensitivity analyses.
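As a minimal sketch of the data-driven approach, the snippet below picks the df with the smallest AIC, assuming each candidate model has already been fitted. The function names and the numbers in the usage example are hypothetical and do not come from any study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def select_df_by_aic(fits):
    """Return the df with the smallest AIC, plus all AIC scores.

    fits maps each candidate df to a pair
    (maximized log-likelihood, total number of fitted parameters).
    """
    scores = {df: aic(ll, k) for df, (ll, k) in fits.items()}
    best_df = min(scores, key=scores.get)
    return best_df, scores


if __name__ == "__main__":
    # Hypothetical fitted models (illustrative numbers only).
    candidate_fits = {
        2: (-5210.4, 5),
        4: (-5190.1, 7),
        8: (-5185.0, 11),
    }
    best_df, scores = select_df_by_aic(candidate_fits)
    for df in sorted(scores):
        print(f"df={df}: AIC={scores[df]:.1f}")
    print("selected df:", best_df)
```

Note that the criterion scores how well each model describes the mortality series as a whole; it does not look at the pollution coefficient directly.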

In this talk we show that data-driven methods for df-selection are generally not suitable strategies for removing confounding bias. We then introduce a Bayesian Model Averaging approach for estimating the pollution coefficient that takes into account prior information and uncertainty about df, that is, about the time scales of variation in the time series at which confounding might occur. The methods are applied to time series data from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS).
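The general idea of model averaging over df can be sketched as below. This is a generic illustration, not the method presented in the talk: it assumes that posterior model probabilities are approximated by prior weight times exp(-BIC/2), and that each candidate model supplies an estimate and variance for the pollution coefficient. The function name, dictionary keys, and numbers are all hypothetical.

```python
import math


def bma_pollution_coefficient(models):
    """Average the pollution coefficient over candidate df values.

    Each entry of models is a dict with keys:
      'beta'  : pollution coefficient estimate under that df
      'var'   : its estimated variance
      'bic'   : BIC of the fitted model (assumed approximation to the
                log marginal likelihood; not taken from the abstract)
      'prior' : prior weight placed on that df
    Returns (posterior weights, averaged estimate, total variance).
    """
    # Subtract the smallest BIC before exponentiating to avoid underflow.
    min_bic = min(m["bic"] for m in models)
    unnorm = [m["prior"] * math.exp(-0.5 * (m["bic"] - min_bic)) for m in models]
    total = sum(unnorm)
    weights = [u / total for u in unnorm]

    beta_bar = sum(w * m["beta"] for w, m in zip(weights, models))
    # Total variance = within-model variance + between-model spread.
    var_bar = sum(
        w * (m["var"] + (m["beta"] - beta_bar) ** 2)
        for w, m in zip(weights, models)
    )
    return weights, beta_bar, var_bar


if __name__ == "__main__":
    # Hypothetical per-df results (illustrative numbers only).
    candidates = [
        {"df": 2, "beta": 0.50, "var": 0.010, "bic": 1004.0, "prior": 0.2},
        {"df": 4, "beta": 0.30, "var": 0.015, "bic": 1000.0, "prior": 0.4},
        {"df": 8, "beta": 0.25, "var": 0.030, "bic": 1001.0, "prior": 0.4},
    ]
    weights, beta_bar, var_bar = bma_pollution_coefficient(candidates)
    print("weights:", [round(w, 3) for w in weights])
    print("averaged beta:", round(beta_bar, 4), "variance:", round(var_bar, 4))
```

The between-model term in the variance is what carries uncertainty about df into the reported uncertainty of the pollution coefficient, and the prior weights are where knowledge about plausible time scales of confounding would enter.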