the natural persistence of disease. It may also arise from the slowness of change in variables that determine effective exposure or act as confounders. As a result, daily observations of most outcomes are not independent. Analyses of such data need to test for lack of independence and appropriately control for it when present.
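As an illustration of what testing for lack of independence can look like, the following minimal sketch computes two common diagnostics on a daily series of residuals: the sample autocorrelation at a chosen lag and the Durbin-Watson statistic. The function names and the simulated series are assumptions made for this example and are not drawn from any study cited here.

```python
import random


def lag_autocorrelation(x, k=1):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def durbin_watson(residuals):
    """Durbin-Watson statistic for a residual series.

    Values near 2 are consistent with no lag-1 correlation; values well
    below 2 point toward positive serial correlation.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    denom = sum(e ** 2 for e in residuals)
    return num / denom


def simulate_ar1(n, r, seed=0):
    """Hypothetical daily series whose lag-1 correlation is roughly r."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(r * x[-1] + rng.gauss(0, 1))
    return x


# Simulated stand-ins for model residuals (illustrative only).
independent = simulate_ar1(2000, 0.0)
persistent = simulate_ar1(2000, 0.7)

for name, series in [("independent", independent), ("persistent", persistent)]:
    print(name, lag_autocorrelation(series), durbin_watson(series))
```

In practice these diagnostics would be applied to the residuals of the fitted exposure-response model rather than to simulated data.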

Methods for Analyzing Serially Correlated Data

For normally (Gaussian) distributed outcomes, well-established methods of analysis can be used to take account of serial correlation. Work on autoregressive models dates from the 1940s; see, for instance, Cochrane and Orcutt (1949).
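One such method is the iterative procedure usually associated with Cochrane and Orcutt. The sketch below is a simplified version for a single predictor. It assumes that the errors follow a first-order autoregressive process, in which each day's error equals a fraction rho of the previous day's error plus independent noise. The helper names, the simulated data, and the fixed number of iterations are assumptions made for this example.

```python
import random


def ols_simple(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cochrane_orcutt(x, y, iterations=10):
    """Iteratively estimate intercept, slope, and the AR(1) parameter rho."""
    a, b = ols_simple(x, y)
    rho = 0.0
    for _ in range(iterations):
        # Residuals from the current fit, on the original scale.
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Regress e[t] on e[t-1] (no intercept) to estimate rho.
        rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
               / sum(e[t - 1] ** 2 for t in range(1, len(e))))
        # Quasi-difference both series to remove the AR(1) correlation.
        xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = ols_simple(xs, ys)
        # The transformed intercept equals a * (1 - rho).
        a = a_star / (1 - rho)
    return a, b, rho


# Simulated data (illustrative only): y = 1 + 2x + AR(1) error with rho = 0.6.
rng = random.Random(1)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
err = [rng.gauss(0, 1)]
for _ in range(n - 1):
    err.append(0.6 * err[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, err)]

a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(a_hat, b_hat, rho_hat)
```

A fixed iteration count keeps the sketch short; a fuller implementation would stop when successive estimates of rho agree to within a tolerance and would guard against estimates of rho close to 1.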

The structure of the covariance is often parameterized in terms of autoregressive parameters, moving-average parameters, and combinations of the two. An autoregressive structure describes a model where the correlation between the residuals at time i and time i - k declines monotonically as k increases. In a first-order autoregressive structure, for instance, the correlation between today and yesterday is assumed to be r, between today and 2 days earlier to be r², and so on. A moving-average structure has nonzero correlation up to a fixed lag and zero correlation at any longer lag. Combinations can be chosen to fit the pattern of serial correlation observed in the data. In most cases, health and disease variables are likely to show autoregressive patterns because an abrupt termination of the correlation is not likely.
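The contrast between the two structures can be made concrete with a short sketch that tabulates their theoretical autocorrelations. It assumes a first-order autoregressive process with parameter r and a first-order moving-average process of the form x[t] = e[t] + theta * e[t-1]. The function names and the symbol theta are introduced here for illustration only.

```python
def ar1_autocorrelation(r, max_lag):
    """Theoretical AR(1) autocorrelations: r**k at lag k."""
    return [r ** k for k in range(max_lag + 1)]


def ma1_autocorrelation(theta, max_lag):
    """Theoretical MA(1) autocorrelations for x[t] = e[t] + theta * e[t-1].

    Only lag 1 is nonzero, with value theta / (1 + theta**2); every
    longer lag has correlation exactly zero.
    """
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(theta / (1 + theta ** 2) if k == 1 else 0.0)
    return rho


# Geometric decay versus an abrupt cutoff after lag 1.
print(ar1_autocorrelation(0.5, 3))   # [1.0, 0.5, 0.25, 0.125]
print(ma1_autocorrelation(1.0, 3))   # [1.0, 0.5, 0.0, 0.0]
```

Comparing the sample autocorrelations of a residual series with these two shapes is one informal way to judge which structure, or which combination, fits the data better.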

An alternative model, often called state dependence, has a Markov-type structure in which the outcome on day i depends on the outcome on day (i - 1) but, given the outcome on day (i - 1), not on any earlier outcome. For example, the prevalence of an illness with an average duration of a week (e.g., the common cold) will clearly depend on whether the subject had that illness the day before. Such models are described by Muenz and Rubenstein (1985) and were used in the analysis of environmental data by Korn and Whittemore (1979). In contrast, incidence data are generally less subject to day-to-day correlation, though they can still be serially correlated (Schwartz et al., 1991), which suggests using the covariance models described above. If there are covariates for which statistical modeling is imperfect (e.g., weather), the residuals of the model may also exhibit serial correlation. The presence of a lagged dependent variable in a model with serial correlation in the errors is unattractive because the lagged predictor is then correlated with the error term, so the usual least-squares regression estimates are biased and inconsistent. In these circumstances, the lagged dependent variable can be "instrumented." Instrumentation is the process of fitting a predictive model to a variable, using all possible predictors (except the hypothesis variable). Then the lagged


