*Risk* is defined as the probability that an individual develops a specified disease over a specified interval of time, given that the individual is alive and disease free at the start of the time period. As with the incidence rate, risk is time dependent and depends on both the starting point and the length of the interval. In a longitudinal follow-up study as described above, the proportion of new occurrences $d_j$ among the individuals who are alive and disease free at the start of the *j*th time interval is an estimate of the risk, or probability, of disease occurrence in that interval.

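Incidence rates and risks are related via the general formula risk = rate × time. For the longitudinal follow-up study estimates defined above, the relationship is manifest by the equation

$$\text{estimated risk in interval } j = \text{estimated incidence rate in interval } j \times \text{length of interval } j.$$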
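As a rough numerical illustration of risk = rate × time, the sketch below uses made-up follow-up counts for a single interval. The names `n_at_risk`, `new_cases`, and `interval_years` are hypothetical, and the rate estimate assumes that every individual contributes the full interval length as person-time, which may differ from the estimator defined earlier in the chapter.

```python
# Hypothetical follow-up data for one time interval (illustrative values only).
n_at_risk = 1000       # individuals alive and disease free at the start of the interval
new_cases = 12         # new disease occurrences d_j observed during the interval
interval_years = 5.0   # length of the interval in years


def risk_estimate(cases, at_risk):
    """Proportion of at-risk individuals who develop the disease in the interval."""
    return cases / at_risk


def rate_estimate(cases, at_risk, length):
    """Cases per unit of person-time.

    Assumption: each individual contributes the full interval length
    as person-time (no censoring, no adjustment for early cases).
    """
    return cases / (at_risk * length)


risk = risk_estimate(new_cases, n_at_risk)
rate = rate_estimate(new_cases, n_at_risk, interval_years)

print(f"risk over the interval: {risk:.4f}")
print(f"rate per person-year:   {rate:.5f}")
print(f"rate x time:            {rate * interval_years:.4f}")
```

Under this simplifying person-time assumption the product of the rate and the interval length reproduces the risk estimate; with censoring or varying follow-up the two would agree only approximately.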

The description of rates and risks in terms of estimates from a longitudinal follow-up study is informative and clearly indicates the relevance of these numerical quantities to the study of disease. However, the development of a general theory of risk and risk estimation requires definitions of rates and risks that are not tied to particular types of studies or methods of estimation. Probability models provide a mathematical framework for studying incidence rates and risks and also are used in defining statistical methods of estimation depending on the type of study and the data available.

Models for studying the relationship between disease and exposure are usually formulated in terms of the *instantaneous incidence rate*, which is the theoretical counterpart of the incidence rate estimate defined above. The instantaneous incidence rate is defined in terms of the probability distribution function *F*(*t*) of the time to disease occurrence. That is, *F*(*t*) represents the probability that an individual develops the disease of interest in the interval of time (0, *t*). Two functions derived from *F*(*t*) are used to define the instantaneous incidence rate. One is the survivor function, which is the probability of being disease free throughout the interval (0, *t*) and is equal to 1 − *F*(*t*). The second is the probability density function, which is the derivative of *F*(*t*) with respect to *t*, that is, *f*(*t*) = (*d* / *dt*)*F*(*t*), and measures the rate of increase in *F*(*t*). The instantaneous incidence rate, also known as the hazard function, is the ratio

$$\lambda(t) = \frac{f(t)}{1 - F(t)}.$$

Integrating the instantaneous incidence rate yields the *cumulative incidence rate*

$$\Lambda(t) = \int_0^t \lambda(u)\,du.$$

The cumulative incidence rate and the distribution function satisfy the relationship

$$F(t) = 1 - \exp\{-\Lambda(t)\} = 1 - \exp\left\{-\int_0^t \lambda(u)\,du\right\}, \tag{11-1}$$

from which it follows that the instantaneous incidence rate completely determines the first-occurrence distribution *F*(*t*).
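The way the incidence rate determines *F*(*t*) can be checked numerically. The following is a minimal sketch that assumes a Weibull hazard purely for illustration (the source does not specify any particular model); the function names and the parameter values are hypothetical. It integrates the hazard with the trapezoidal rule and compares the resulting distribution function with the known closed form.

```python
import math


def hazard(t, shape=2.0, scale=10.0):
    """Weibull hazard, used here only as an illustrative choice of lambda(t)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def cumulative_hazard(t, steps=10_000):
    """Trapezoidal approximation to the integral of hazard(u) for u in (0, t)."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * width)
    return total * width


def distribution_from_hazard(t):
    """F(t) = 1 - exp(-cumulative hazard), as in Equation (11-1)."""
    return 1.0 - math.exp(-cumulative_hazard(t))


def weibull_cdf(t, shape=2.0, scale=10.0):
    """Closed-form Weibull distribution function, for comparison."""
    return 1.0 - math.exp(-((t / scale) ** shape))


t = 5.0
print(f"F(t) from integrated hazard: {distribution_from_hazard(t):.6f}")
print(f"F(t) from closed form:       {weibull_cdf(t):.6f}")
```

The two values should agree up to numerical integration error, which illustrates that knowing the hazard for all times up to *t* is enough to recover *F*(*t*).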

The risk of first disease occurrence in the interval (*t*, *t* + *h*), given no previous occurrence, is the conditional probability

$$\frac{F(t + h) - F(t)}{1 - F(t)}.$$

When *h* is not too large, so that the difference quotient {*F*(*t* + *h*) − *F*(*t*)} / *h* approximates *f*(*t*) = *dF*(*t*)/*dt*,

$$\frac{F(t + h) - F(t)}{1 - F(t)} \approx \frac{f(t)\,h}{1 - F(t)} = \lambda(t)\,h.$$

Thus, among individuals who are disease free at time *t*, the risk of disease in the interval (*t*, *t* + *h*) is approximately λ(*t*)*h*. This approximation is the theoretical counterpart of the relationship between risks and rates described in the discussion of risk. In the remainder of this chapter, incidence rate means instantaneous incidence rate unless explicitly noted otherwise.
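To get a feel for how good the approximation is, the sketch below compares the exact conditional risk with λ(*t*)*h* for several interval lengths. It again assumes a Weibull time-to-disease distribution with hypothetical parameters `SHAPE` and `SCALE`, chosen only for illustration; nothing in the source ties the result to this model.

```python
import math

# Hypothetical Weibull parameters, for illustration only.
SHAPE, SCALE = 2.0, 10.0


def F(t):
    """Distribution function of the time to disease occurrence."""
    return 1.0 - math.exp(-((t / SCALE) ** SHAPE))


def hazard(t):
    """Instantaneous incidence rate f(t) / (1 - F(t)) for the Weibull model."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def conditional_risk(t, h):
    """Exact risk of disease in (t, t + h) given disease free at time t."""
    return (F(t + h) - F(t)) / (1.0 - F(t))


def relative_error(t, h):
    """Relative difference between the exact risk and the approximation hazard(t) * h."""
    exact = conditional_risk(t, h)
    return abs(exact - hazard(t) * h) / exact


t = 5.0
for h in (0.01, 0.1, 1.0, 5.0):
    print(
        f"h={h:5.2f}  exact={conditional_risk(t, h):.6f}  "
        f"approx={hazard(t) * h:.6f}  rel.err={relative_error(t, h):.4f}"
    )
```

In this example the relative error grows with *h*, which is consistent with the requirement that *h* not be too large.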

It is clear that the incidence rate plays an important role in the stochastic modeling of disease occurrence. Consequently, models and methods for studying the dependence of disease occurrence on exposure are generally formulated in terms of incidence rates. In the following it is assumed that individuals have been stratified on the basis of age, sex, calendar time, and possibly other factors related to disease occurrence, and that incidence rates are stratum specific. In the simple case of two exposure categories, exposed and unexposed, let $\lambda_E(t)$ and $\lambda_U(t)$ denote the incidence rates in the exposed and unexposed categories, respectively.

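A common measure of discrepancy between incidence rates is the difference between the rates in the exposed and unexposed categories,

$$\mathrm{EAR}(t) = \lambda_E(t) - \lambda_U(t),$$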

which by convention is called the *excess absolute risk* (EAR) even though it is, technically, a difference in rates. Rearranging terms results in

$$\lambda_E(t) = \lambda_U(t) + \mathrm{EAR}(t)$$