**Suggested Citation:** "11 Risk Assessment Models and Methods." National Research Council. 2006.

*Health Risks from Exposure to Low Levels of Ionizing Radiation: BEIR VII Phase 2*. Washington, DC: The National Academies Press. doi: 10.17226/11340.

**11**

**Risk Assessment Models and Methods**

**RISK ASSESSMENT METHODOLOGY**

The occurrence of cancers is known to be related to a number of factors, including age, sex, time, and ethnicity, as well as exposure to environmental agents such as ionizing radiation. Understanding the role of exposure in the occurrence of cancer in the presence of modifying effects is a difficult problem. Contributing to the difficulty are the stochastic nature of cancer occurrence, both background and exposure related, and the fact that radiogenic cancers are indistinguishable from nonradiogenic cancers.

This section summarizes the theory, principles, and methods of risk assessment epidemiology for studying exposure-disease relationships. The two essential components of risk assessment are a measure of exposure and a measure of disease occurrence. Measuring exposure to radiation is a challenging problem, and dosimetry issues are discussed in detail elsewhere in this report; the common epidemiologic measures of disease occurrence are reviewed in this section. Evaluation of the association between exposure and disease occurrence is aided by the use of statistical models, and the types of models commonly used in radiation epidemiology are described below, as are the methods for fitting the models to data. This section ends with a description of the use of fitted models for estimating probabilities of causation and certain measures of lifetime detriment associated with exposure to ionizing radiation.

**Rates, Risks, and Probability Models**

Some individuals exposed to environmental carcinogens (*e.g.*, ionizing radiation) develop cancer and some do not; the same is true of unexposed individuals. Thus, cancer is not a necessary consequence of exposure, and exposure is not necessary for cancer. However, the greater incidence of cancer in individuals exposed to known carcinogens indicates that the probability or risk of developing cancer is increased by exposure. The elevated risks of exposed individuals, compared to unexposed individuals, are manifested as increased cancer rates in the exposed group. Risks and rates are the basic measures used to compare disease occurrence in exposed and unexposed individuals. This section describes rates and risks and their relationship to one another as a prelude to the sections on modeling and model fitting.

*Incidence Rate*

A common measure of disease occurrence used in cancer epidemiology is the *incidence rate*. Incidence refers to new cases of disease occurring among previously unaffected individuals. The population incidence rate is the number of new cases of the disease occurring in the population in a specified time interval divided by the sum of observation times, in that interval, on all individuals who were disease free at the beginning of the time interval. In general an incidence rate is time dependent and depends on both the starting point and the length of the interval.

With data from studies in which subjects are followed over time, incidence rates can be estimated by partitioning the follow-up period into intervals of lengths $L_j$ having midpoints $t_j$ for $j = 1, \ldots, J$, and estimating a rate for each interval. Let $n_j$ denote the number of individuals who are disease free and still under observation at time $t_j$, and $d_j$ the number of new diagnoses during the $j$th interval. An estimate of the incidence rate at time $t_j$ is obtained by dividing $d_j$ by the product of $n_j$ and $L_j$:

$$\hat{\lambda}(t_j) = \frac{d_j}{n_j L_j}.$$

The denominator $n_j L_j$ is an approximation to the sum of observation times on the $n_j$ population members in the $j$th interval and in practice is usually replaced by the actual observation time, which accounts for the fact that the $d_j$ diagnoses of disease did not occur exactly at time $t_j$.

*Risk*

*Risk* is defined as the probability that an individual develops a specified disease over a specified interval of time, given that the individual is alive and disease free at the start of the time period. As with the incidence rate, risk is time dependent and depends on both the starting point and the length of the interval. In a longitudinal follow-up study as described above, the proportion of new occurrences $d_j$ among $n_j$ disease-free individuals still under observation at time $t_j$,

$$\hat{p}(t_j) = \frac{d_j}{n_j},$$

is an estimate of the risk or probability of disease occurrence in the $j$th time interval.

Incidence rates and risks are related via the general formula, risk = rate × time. For the longitudinal follow-up study estimates defined above, the relationship is manifest in the equation

$$\frac{d_j}{n_j} = \frac{d_j}{n_j L_j} \times L_j.$$
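As a minimal sketch of the interval estimates described above, the following computes the rate $d_j/(n_j L_j)$ and the risk $d_j/n_j$ for each follow-up interval and checks that risk = rate × time. The function name and the cohort numbers are hypothetical and serve only as an illustration.

```python
def interval_estimates(n_at_risk, new_cases, lengths):
    """Return (rates, risks) for each follow-up interval.

    rate_j = d_j / (n_j * L_j)   cases per unit of person-time
    risk_j = d_j / n_j           probability of disease in the interval
    """
    rates, risks = [], []
    for n_j, d_j, L_j in zip(n_at_risk, new_cases, lengths):
        rates.append(d_j / (n_j * L_j))
        risks.append(d_j / n_j)
    return rates, risks


# Hypothetical cohort followed over three 5-year intervals.
n_at_risk = [10_000, 9_800, 9_500]
new_cases = [50, 98, 190]
lengths = [5.0, 5.0, 5.0]

rates, risks = interval_estimates(n_at_risk, new_cases, lengths)
for rate, risk, L_j in zip(rates, risks, lengths):
    # For these estimates, risk = rate x time (up to rounding error).
    assert abs(risk - rate * L_j) < 1e-12
    print(f"rate = {rate:.5f} per person-year, risk = {risk:.4f}")
```

When actual person-time is available, it would replace the product `n_j * L_j` in the rate denominator, as noted in the discussion of the incidence rate.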

*Probability Models*

The description of rates and risks in terms of estimates from a longitudinal follow-up study is informative and clearly indicates the relevance of these numerical quantities to the study of disease. However, the development of a general theory of risk and risk estimation requires definitions of rates and risks that are not tied to particular types of studies or methods of estimation. Probability models provide a mathematical framework for studying incidence rates and risks and also are used in defining statistical methods of estimation depending on the type of study and the data available.

Models for studying the relationship between disease and exposure are usually formulated in terms of the *instantaneous incidence rate*, which is the theoretical counterpart of the incidence rate estimate defined above. The instantaneous incidence rate is defined in terms of the probability distribution function *F*(*t*) of the time to disease occurrence. That is, *F*(*t*) represents the probability that an individual develops the disease of interest in the interval of time (0, *t*). Two functions derived from *F*(*t*) are used to define the instantaneous incidence rate. One is the survivor function, which is the probability of being disease free throughout the interval (0, *t*) and is equal to 1 − *F*(*t*). The second is the probability density function, which is the derivative of *F*(*t*) with respect to *t*, that is, *f*(*t*) = (*d* / *dt*)*F*(*t*), and measures the rate of increase in *F*(*t*). The instantaneous incidence rate, also known as the hazard function, is the ratio

$$\lambda(t) = \frac{f(t)}{1 - F(t)}.$$

Integrating the instantaneous incidence rate yields the *cumulative incidence rate*

$$\Lambda(t) = \int_0^t \lambda(u)\,du.$$

The cumulative incidence rate and the distribution function satisfy the relationship

$$F(t) = 1 - \exp\{-\Lambda(t)\}, \tag{11-1}$$

from which it follows that the instantaneous incidence rate completely determines the first-occurrence distribution *F*(*t*).

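The risk of first disease occurrence in the interval (*t*, *t* + *h*), given no previous occurrence, is the conditional probability

$$P(t \le T < t + h \mid T \ge t) = \frac{F(t + h) - F(t)}{1 - F(t)},$$

where *T* denotes the time of first disease occurrence. When *h* is not too large, so that the difference quotient {*F*(*t* + *h*) − *F*(*t*)} / *h* approximates *f*(*t*) = *dF*(*t*)/*dt*,

$$\frac{F(t + h) - F(t)}{1 - F(t)} \approx \frac{f(t)\,h}{1 - F(t)} = \lambda(t)\,h.$$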

Thus, among individuals who are disease free at time *t*, the risk of disease in the interval (*t*, *t* + *h*) is approximately λ(*t*)*h*. This approximation is the theoretical counterpart of the relationship between risks and rates described in the discussion of risk. In the remainder of this chapter, incidence rate means instantaneous incidence rate unless explicitly noted otherwise.
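The approximation risk ≈ λ(*t*)*h* can be checked numerically. The sketch below assumes, purely for illustration, a power-law hazard λ(*t*) = *a t*^(*k*−1) with hypothetical values of *a* and *k*; it builds *F*(*t*) from the cumulative incidence rate via relationship (11-1) and compares the exact conditional risk over a one-year interval with λ(*t*)*h*.

```python
import math


def hazard(t, a=1e-9, k=5):
    """Instantaneous incidence rate lambda(t) = a * t**(k - 1) (assumed form)."""
    return a * t ** (k - 1)


def cumulative_hazard(t, a=1e-9, k=5):
    """Cumulative incidence rate: integral of the hazard from 0 to t."""
    return a * t ** k / k


def distribution(t, a=1e-9, k=5):
    """F(t) = 1 - exp(-Lambda(t)), relationship (11-1)."""
    return 1.0 - math.exp(-cumulative_hazard(t, a, k))


def conditional_risk(t, h, a=1e-9, k=5):
    """Risk of disease in (t, t + h) given disease free at time t."""
    F_t = distribution(t, a, k)
    return (distribution(t + h, a, k) - F_t) / (1.0 - F_t)


t, h = 60.0, 1.0
exact = conditional_risk(t, h)
approx = hazard(t) * h
print(f"exact conditional risk = {exact:.5f}")
print(f"lambda(t) * h          = {approx:.5f}")
```

With these values the two quantities agree to within a few percent; the agreement improves as *h* shrinks, which is the sense in which the approximation holds "when *h* is not too large."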

**Incidence Rates and Excess Risks**

It is clear that the incidence rate plays an important role in the stochastic modeling of disease occurrence. Consequently, models and methods for studying the dependence of disease occurrence on exposure are generally formulated in terms of incidence rates. In the following it is assumed that individuals have been stratified on the basis of age, sex, calendar time, and possibly other factors related to disease occurrence, and that incidence rates are stratum specific. In the simple case of two exposure categories, exposed and unexposed, let $\lambda_E(t)$ and $\lambda_U(t)$ denote the incidence rates of the exposed and unexposed groups, respectively. If disease occurrence is unrelated to exposure, one expects that $\lambda_E(t) = \lambda_U(t)$, whereas lack of equality between these two incidence rates indicates an association between disease occurrence and exposure.

A common measure of discrepancy between incidence rates is the difference

$$\mathrm{EAR}(t) = \lambda_E(t) - \lambda_U(t),$$

which by convention is called the *excess absolute risk* (EAR) even though it is, technically, a difference in rates. Rearranging terms results in

$$\lambda_E(t) = \lambda_U(t) + \mathrm{EAR}(t),$$

showing that EAR(*t*) describes the additive increase in incidence rate associated with exposure. For example, if the EAR is constant, EAR(*t*) = *b*, then the effect of exposure is to increase the incidence rate by the constant amount *b* for all time periods. Note that *b* = 0 corresponds to the case of no association.

A second common measure of discrepancy is the *relative risk* (RR), defined as

$$\mathrm{RR}(t) = \frac{\lambda_E(t)}{\lambda_U(t)}.$$

Rearranging terms shows that

$$\lambda_E(t) = \mathrm{RR}(t)\,\lambda_U(t),$$

so that RR(*t*) describes the multiplicative increase in incidence rate associated with exposure. When the RR is constant, RR(*t*) = *r,* the effect of exposure is to alter incidence rate by the factor *r*. If exposure increases risk, then *r* > 1; if exposure decreases risk, then *r* < 1, and *r* = 1 corresponds to the case of no association. The *excess relative risk* ERR(*t*) is

$$\mathrm{ERR}(t) = \mathrm{RR}(t) - 1.$$

In terms of the ERR, the exposed and unexposed incidence rates are related via the equation

$$\lambda_E(t) = \lambda_U(t)\,\{1 + \mathrm{ERR}(t)\}.$$
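The three measures are simple functions of the two incidence rates. The following sketch, with hypothetical rates and an illustrative function name, computes EAR, RR, and ERR for one stratum and confirms that the additive and multiplicative decompositions both recover the exposed rate.

```python
def excess_measures(rate_exposed, rate_unexposed):
    """Return (EAR, RR, ERR) for one stratum at one time point."""
    ear = rate_exposed - rate_unexposed   # additive excess
    rr = rate_exposed / rate_unexposed    # multiplicative factor
    err = rr - 1.0                        # excess relative risk
    return ear, rr, err


# Hypothetical incidence rates per person-year.
lam_u = 0.0020
lam_e = 0.0026
ear, rr, err = excess_measures(lam_e, lam_u)

# Both decompositions recover the exposed rate.
assert abs(lam_u + ear - lam_e) < 1e-12
assert abs(lam_u * (1.0 + err) - lam_e) < 1e-12
print(f"EAR = {ear:.4f}, RR = {rr:.2f}, ERR = {err:.2f}")
```

Note that EAR(*t*) = ERR(*t*) × λ*_U*(*t*), so the two excess measures carry the same information once the unexposed rate is known; they differ in how the excess is assumed to behave across strata.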

**RISK MODELS**

**Direct Estimates of Risk**

The previous section defined the fundamental quantities used in risk estimation: risks, rates, EAR, RR, and ERR, and established their relevance to the study of environmental carcinogens. These measures enable the study of differences in disease occurrence in relationship to time, by studying either EAR(*t*) or ERR(*t*) between unexposed and exposed groups. For most carcinogens, exposure is not a simple dichotomy (unexposed, exposed) but occurs on a continuum. That is, the exposure or dose *d* can vary from no exposure (*d* = 0) upward. In such cases the relationship between risk—or EAR(*t*) or ERR(*t*)—and dose is of fundamental importance. For all carcinogens it is generally agreed that sufficiently large doses increase the risk of cancer. By definition there is no increase in risk in the absence of exposure (*d* = 0). That is, when *d* = 0, both EAR(*t*) = 0 and ERR(*t*) = 0. Thus, for many carcinogens the only open or unresolved issue is the dependence of risk on small or low doses. Low-dose ranges are often the most relevant in terms of numbers of exposed individuals. They are also the most difficult ranges for which to obtain unequivocal evidence of increased risk. These difficulties result from the fact that small increases in risk associated with low levels of exposure are difficult to detect (using statistical methods) in the presence of background risks.

The difficulties can be seen by considering the estimates of risk from the longitudinal follow-up study described in “Rates, Risks, and Probability Models.” For a time period $L_j$, let $n_{j,E}$, $d_{j,E}$ and $n_{j,U}$, $d_{j,U}$ denote the number of individuals at risk at the start of the interval and the number of occurrences of disease during the interval for the exposed and unexposed subgroups, respectively. A direct estimate of the excess risk for the $j$th time period is the difference between two proportions $(d_{j,E}/n_{j,E}) - (d_{j,U}/n_{j,U})$. Even in the favorable situation in which the baseline risk is relatively well estimated compared to the risk of the exposed group (when $n_{j,U}$ is large relative to $n_{j,E}$), the ability to reliably detect small increases in risk associated with exposure requires a large number of exposed individuals at risk. For example, using the usual criterion for statistical testing in order to detect with probability 0.80 a 5% increase in risk when the baseline risk is 0.10, the number of individuals at risk in the exposed group would have to be approximately $n_{j,E} = 30{,}000$.

A key objective of this report is the calculation of quantitative estimates of human health risks (*e.g.*, cancer) associated with exposure to ionizing radiation for specific subpopulations defined by stratification on variables such as sex, age, exposure profile, and smoking history. In theory, such estimates could be derived by identifying a large group of individuals having common exposure profiles within each stratum and following the groups over a long period of time. As described above, the proportion of individuals in each group who develop cancer in specific time periods provides the desired estimates of risk. However, this approach is not feasible because sufficient data are not available. At low levels of exposure, cancer risks associated with exposure are small relative to baseline or background risks. The increases in observed cancer rates associated with exposure are small relative to the natural random fluctuations in baseline cancer rates. Thus, very large groups of individuals would have to be followed for very long periods of time to provide sufficiently precise estimates of risk associated with exposure. Consequently, direct estimates of risk are not possible for stratified subpopulations. The alternative is to use mathematical models for risk as functions of dose and stratifying variables such as sex and age.
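The order of magnitude of the sample size quoted above can be reproduced with a standard normal-approximation power calculation. The sketch below rests on assumptions not stated in the text: a two-sided test at the 5% level, a baseline risk treated as known (the favorable case of a very large unexposed group), and a "5% increase" read as a relative increase from 0.10 to 0.105. The function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def exposed_sample_size(p0, relative_increase, alpha=0.05, power=0.80):
    """Approximate number of exposed individuals needed to detect an
    increase in risk from p0 to p0 * (1 + relative_increase), using a
    two-sided one-sample test of a proportion with p0 treated as known."""
    p1 = p0 * (1.0 + relative_increase)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


n_exposed = exposed_sample_size(p0=0.10, relative_increase=0.05)
print(n_exposed)
```

Under these assumptions the result is somewhat below 30,000, consistent with the approximate figure given in the text. Because the required number scales with the inverse square of the risk difference, halving the detectable increase roughly quadruples the required group size, which is why direct estimation becomes impractical at low doses.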

**Estimation via Mathematical Models for Risk**

Model-based estimation provides a feasible alternative to direct estimation. Model-based estimates efficiently exploit the information in the available data and provide a means of deriving estimates for strata and dose profile combinations for which data are sparse. This is accomplished by exploiting assumptions about the functional form of a risk model. Of course, the validity of estimates derived from models depends on the appropriateness of the model; thus model choice is important. The accepted approach in radiation epidemiology is to base models on radiobiological principles and theories of carcinogenesis to the fullest extent possible, keeping in mind statistical limitations imposed by the quantity and quality of data available for model fitting. Biologically based and empirically derived mathematical models for risk are discussed in the next two sections.

**Biologically Based Risk Models**

Biologically based risk models are designed to describe the fundamental biological processes involved in the transformation of somatic cells into malignant cancer cells. The use of biologically based risk models in epidemiologic analyses can result in a greater understanding of the mechanisms of carcinogenesis. These models can also help to expose the complex interrelationships between different time- and age-dependent exposure patterns and cancer risk. Biologically based risk models provide an analytical method that is complementary to the traditional, well-established, empirical approaches.

Armitage and Doll (1954) observed that for many human cancers the log-log plot of age-specific incidence rates versus age is nearly linear, up to moderately old ages. This observation has led to the development of models for carcinogenesis. In brief, Armitage and Doll’s theory postulates that malignant transformation occurs following the *k*th stage of a series of spontaneous and irreversible changes (Armitage 1985). The corresponding hazard function is of the form $\lambda(t) = a t^{k-1}$, where $t$ denotes time and $a$ is a constant reflecting the dependence of the hazard on the number of stages, $k$. These models have been fit to various data sets, leading to the observation that most cancers arise after the occurrence of five to seven stages. Comprehensive reviews of the mathematical theory of carcinogenesis have been given by Armitage and Doll (1961), Whittemore (1978), and Armitage (1985).
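The connection between the power-law hazard and the near-linear log-log plot can be made explicit by taking logarithms. This is a one-line consequence of the stated hazard form, not an additional result:

```latex
\log \lambda(t) = \log a + (k - 1)\,\log t
```

A plot of log incidence rate against log age is therefore a straight line with slope *k* − 1, so fitted slopes of roughly 4 to 6 would correspond to the five to seven stages mentioned above.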

In response to the multiplicity of parameters produced by their earlier models, Armitage and Doll proposed a simpler two-stage model designed to avoid parameters not readily estimable from available data. A major limitation of these early two-stage models is their failure to address the multiplication and death of normal cells, which was known to occur in tissues undergoing malignant change (Moolgavkar and Knudson 1981). A revised two-stage model was later proposed by Moolgavkar and colleagues, which allowed for the growth of normal tissue and the clonal expansion of intermediate cells (Moolgavkar and Knudson 1981). Numerous two-stage models have since been described in the literature (Fisher 1985; Moolgavkar 1991; Sielken and others 1994; Luebeck and others 1996; Heidenreich and others 1999, 2002a, 2002b; Moolgavkar and others 1999; Heidenreich and Paretzke 2001; Moolgavkar and Luebeck 2003).

The two-stage clonal expansion (TSCE) model assumes a normal stem cell population of fixed size *X* and a rate of first mutation of *v*(*d*), depending on the dose *d* of the carcinogen. The number of initiated cells arising from the normal cell pool is described by a Poisson process with a rate of *vX*. The initiated cells then divide either symmetrically or nonsymmetrically. Symmetrical division results in two initiated cells, while nonsymmetrical division results in an initiated cell and a differentiated cell. The rate of symmetrical division is designated by α(*t*), and the death or differentiation rate by β(*t*). The difference α − β is the net proliferation rate for initiated cells. The rate of division into one initiated cell and one malignant cell is designated by μ(*t*) (Hazleton and others 2001).
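The dynamics described above can be illustrated with a small stochastic simulation. The sketch below is a simplified reading of the TSCE model, not the formulation used in the cited analyses: all rates are held constant in time and dose, the parameter values are hypothetical, and the function name is invented for this example. It uses the standard Gillespie algorithm to simulate initiation, clonal expansion, and the appearance of the first malignant cell.

```python
import random


def time_to_first_malignant_cell(nu_X, alpha, beta, mu, t_max, rng):
    """Gillespie simulation of a two-stage clonal expansion process.

    nu_X  : rate at which normal cells become initiated (v * X)
    alpha : symmetric division rate per initiated cell
    beta  : death or differentiation rate per initiated cell
    mu    : rate per initiated cell of division into one initiated
            cell and one malignant cell
    Returns the time of the first malignant cell, or None if none
    appears before t_max.
    """
    t, initiated = 0.0, 0
    while True:
        total_rate = nu_X + initiated * (alpha + beta + mu)
        if total_rate <= 0.0:
            return None
        t += rng.expovariate(total_rate)   # waiting time to next event
        if t > t_max:
            return None
        u = rng.random() * total_rate      # choose which event occurred
        if u < nu_X:
            initiated += 1                 # new initiated cell
        elif u < nu_X + initiated * alpha:
            initiated += 1                 # symmetric division
        elif u < nu_X + initiated * (alpha + beta):
            initiated -= 1                 # death or differentiation
        else:
            return t                       # first malignant cell


rng = random.Random(0)
# Hypothetical per-year rates, chosen only so the example runs quickly.
times = [time_to_first_malignant_cell(0.1, 0.15, 0.10, 1e-3, 80.0, rng)
         for _ in range(200)]
observed = [x for x in times if x is not None]
print(f"{len(observed)} of {len(times)} simulated tissues developed a malignant cell")
```

In such a sketch, a dose dependence could be introduced by making `nu_X` (or the other rates) a function of dose; which rates radiation acts on is exactly the kind of mechanistic question these models are used to explore.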

TSCE models for radiation carcinogenesis have now been applied successfully to a number of important data sets, including atomic bomb survivors (Kai and others 1997) and occupational groups such as nuclear power plant workers and miners (Moolgavkar and others 1993; Luebeck and others 1999; Sont and others 2001). A study of atomic bomb survivors illustrates the usefulness of the two-stage model in radiation epidemiology (Kai and others 1997). Findings from this analysis include the observation of a high excess risk among children that may not be explained by enhanced tissue sensitivity to radiation exposure. The temporal patterns in cancer risk can be explained in part by a radiation-induced increase in the pool of initiated cells, resulting in a direct dose-rate effect (Kai and others 1997). Exact solutions of the two-stage model (Heidenreich and others 1997) and multistage models (Heidenreich and others 2002b) have been applied to atomic bomb survivors’ data.

Another data set to which application of the TSCE has been useful is the National Dose Registry (NDR) of Canada. This database contains personal dosimetry records for workers exposed to ionizing radiation since 1951, with current records for more than 500,000 Canadians (Ashmore and others 1998). Application of the TSCE model to the NDR suggests an explanation of the apparently high excess relative risk observed, relative to the A-bomb data (Sont and others 2001). The TSCE model reveals that the dose-response for the NDR cohort is consistent with the lung cancer incidence in the A-bomb survivors’ cohort, provided that proper adjustments are made for the duration of exposure and differences in the background rate parameters.

In addition to the TSCE model, the Armitage-Doll model of carcinogenesis has evolved into several other analytic methods, including the general mutagen model (Pierce 2002). The basic assumption of this model is that a malignant cell results from the accumulation of mutations, with *k* mutations required for malignancy. The effect of exposure is that an increment of dose at age *a*, at rate *d*(*a*), results in a multiplicative increase $\lambda_r[1 + \beta d(a)]$ in the rate of all *k* mutations. Although this model applies to both recessive and dominant mutations, it does not explicitly allow for selective proliferation of cells having only some of the required mutations. The general mutagen model has been applied successfully to A-bomb survivor data (Pierce and Mendelsohn 1999; Pierce and Preston 2000) and to underground miners exposed to radon (Lubin and others 1995).

Whereas empirical approaches to risk modeling rely on statistical models to describe data, biologically based models depend on fundamental assumptions regarding the mechanisms of radiation carcinogenesis. The parameters created by modern biologically based risk models have direct biological interpretation, provide insight into cancer mechanisms, and generate substantive questions about the pathways by which exposure to ionizing radiation can increase cancer risk. These models also provide a way of describing temporal patterns of exposure and risk.

Although biologically based risk models have many strengths, some general limitations are associated with their use. Such models can only approximate biological reality and require an understanding of the complex mechanisms of radiation carcinogenesis for interpretation. In addition, it is difficult to distinguish among alternative models that yield similar dose-response curves without direct information on the fundamental biological processes represented by the model, which are often unknown. Biologically based risk models are generally more complex than empirical models and may require richer databases to develop properly. Despite these limitations, biologically based models have found many applications for important epidemiologic data sets, and the successes achieved to date afford support for the continual development of such models for future analyses that will directly inform the association between radiation exposure and human cancer risk.

Biologically based models have not been employed as the primary method of analysis in this report for several reasons. The mechanisms of radiation carcinogenesis are not fully understood, which makes the development of a fully biologically based model difficult. The data required for a biologically based model, such as rates of cell proliferation and mutation, are also generally not available. The availability of empirical risk models that provide a good description of the available data on radiation and cancer permits the preparation of useful risk projection.

**Empirically Based Risk Models**

The following symbols are used to describe the variables that enter into risk models based on the Japanese A-bomb survivor data:

*a:* attained age of an individual

*e:* age at exposure to radiation

*d:* dose of radiation received

*s:* code for sex (1 if the individual is a female and 0 if male)

*p:* study population-specific factors

Models also sometimes include time since exposure (*t*). Since *t* = *a* − *e*, models that include *a* and *e* implicitly include *t*.

Models for the incidence rate for individuals of age *a*, exposed to dose *d*, at age *e*, generally depend on sex *s* (1 for females, 0 for males) and other study population-specific factors generically represented by *p*. For example, the study population-specific parameters for A-bomb survivor data models are city *c* and calendar year *y*, that is, *p* = (*c*, *y*). The incidence rate is, in general, a function λ(*a*, *e*, *d*, *s*, *p*) of all of these factors. By definition, the background incidence rate does not depend on either *d* or *e*, so the EAR formulation of the exposed incidence rate has the form

$$\lambda(a, e, d, s, p) = \lambda(a, s, p) + \mathrm{EAR}(a, e, d, s, p),$$

and the ERR formulation is

$$\lambda(a, e, d, s, p) = \lambda(a, s, p)\,\{1 + \mathrm{ERR}(a, e, d, s, p)\},$$

where EAR (*a*, *e*, *d*, *s*, *p*) and ERR (*a*, *e*, *d*, *s*, *p*) are the EAR and ERR, respectively. When the excess risk functions are dependent on the study population (that is, when they depend on the factor *p*), estimates of risk derived from the models are specific to the study population and therefore of limited utility for estimating risks in other populations. Thus, it is desirable to find suitable models in which either the excess risk or the excess relative risk does not depend on population-specific parameters. Consequently, models used in radiation risk estimation are often of the form

$$\lambda(a, e, d, s, p) = \lambda(a, s, p) + \mathrm{EAR}(a, e, d, s)$$

or

$$\lambda(a, e, d, s, p) = \lambda(a, s, p)\,\{1 + \mathrm{ERR}(a, e, d, s)\}.$$

That is, the excess risk functions depend only on *a*, *e*, *d*, and *s*, but not *p*. Note that if *t* represents time after exposure, then because *t* = *a* − *e*, any two of the variables *t*, *a*, and *e* determine the third, so at the current level of generality, the excess risk functions could also be written as functions of *t*, *e*, *d*, and *s*. Also, because there is no excess risk at ages prior to exposure, EAR(*a*, *e*, *d*, *s*) = 0 and ERR(*a*, *e*, *d*, *s*) = 0 for *a* < *e*, and thus λ(*a*, *e*, *d*, *s*, *p*) = λ(*a*, *s*, *p*) for *a* < *e*. The formulas and equations in the remainder of this chapter are described only for the relevant case *a* ≥ *e*.

Radiobiological considerations suggest that for low-dose, low-LET (linear energy transfer) radiation, the risk of disease for an individual exposed to dose *d* depends on a linear or quadratic function of *d*. That is, risk depends on dose *d* through a function of the form

$$f(d) = \alpha_1 d + \alpha_2 d^2,$$

where α_{1} and α_{2} are parameters to be estimated from the data. At higher doses of radiation, cell sterilization and cell death compete with the process of malignant transformation, thereby attenuating the risk of cancer at higher doses. A more general model applicable to a broader dose range and used extensively in radiation research is

$$f(d) = (\alpha_1 d + \alpha_2 d^2)\,\exp(-\alpha_3 d - \alpha_4 d^2).$$

The models for dependence on dose are generally incorporated into risk models by assuming that the excess risk functions are proportional to *f*(*d*), where the multiplicative constant (in dose) depends on *a*, *e*, and *s*.
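The two dose-response shapes discussed above can be compared directly. The sketch below assumes a linear-quadratic form and a linear-quadratic form multiplied by an exponential cell-killing factor; the coefficient values are hypothetical and chosen only to make the high-dose attenuation visible.

```python
import math


def linear_quadratic(d, alpha1, alpha2):
    """f(d) = alpha1 * d + alpha2 * d**2."""
    return alpha1 * d + alpha2 * d ** 2


def linear_quadratic_exponential(d, alpha1, alpha2, alpha3, alpha4):
    """Linear-quadratic term attenuated at high dose:
    f(d) = (alpha1*d + alpha2*d**2) * exp(-alpha3*d - alpha4*d**2)."""
    return linear_quadratic(d, alpha1, alpha2) * math.exp(-alpha3 * d - alpha4 * d ** 2)


# Hypothetical coefficients, for illustration only.
for dose in (0.0, 0.1, 1.0, 5.0):
    lq = linear_quadratic(dose, 0.5, 0.1)
    lqe = linear_quadratic_exponential(dose, 0.5, 0.1, 0.2, 0.0)
    print(f"d = {dose:4.1f}   LQ = {lq:.4f}   LQ-exp = {lqe:.4f}")
```

At low doses the exponential factor is close to 1 and the two forms nearly coincide, which is why the simpler linear or linear-quadratic function is usually adequate for low-dose risk estimation.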

**VARIABLES THAT MODIFY THE DOSE-RESPONSE RELATIONSHIP**

In general, cancer rates vary considerably as functions of attained age, and there is strong evidence indicating that cancer risks associated with radiation exposure also vary as functions of attained age and age at exposure. For example, it has been observed that after instantaneous exposure to radiation, leukemia and bone cancer rates rise for a short period of time (≈ years) and then decrease to baseline rates over a longer period of time (≈ years). In contrast, the available evidence suggests, and it is generally believed, that rates for most other cancers increase after exposure to radiation and possibly remain at elevated levels at all ages.

Models for the dependence of risk on variables such as age at exposure, attained age, and time since exposure are often empirical and are justified more by epidemiologic and statistical principles than by radiobiological theory. A useful class of models that includes the modifying effects on radiation dose-response of attained age, age at exposure, and gender has the form

$$\mathrm{EAR}(a, e, d, s) = f(d)\,g(a, e, s)$$

for EAR models, and

$$\mathrm{ERR}(a, e, d, s) = f(d)\,g(a, e, s)$$

for ERR models, where *g*(*a*, *e*, *s*) is a function of attained age, age at exposure, and gender. Because time since exposure is equal to the difference *t* = *a* − *e*, this class of models includes models defined as functions of time since exposure. Often *g* depends on *e* and *t* via exponential and power functions.

For example, the committee’s preferred model for solid cancer uses

$$g(a, e, s) = \exp\{\gamma \tilde{e} + \eta \log(a) + \theta s\},$$

where $\tilde{e}$ is *e* − 30 years if *e* is less than 30, and 0 if *e* is greater than or equal to 30; and γ, η, and θ are unknown parameters, which must be estimated from the data.
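To show how a dose-response function and a modifier combine, the sketch below evaluates an ERR of the form *f*(*d*) *g*(*a*, *e*, *s*) with a linear *f*(*d*) = β*d*. It assumes that *g* takes an exponential and power form, exp(γ·e\* + η·log *a* + θ·*s*), with e\* equal to *e* − 30 for *e* < 30 and 0 otherwise. That form, the function names, and all parameter values are assumptions made for illustration, not fitted estimates.

```python
import math


def modifier_g(a, e, s, gamma, eta, theta):
    """Assumed modifier g(a, e, s) = exp(gamma*e_star + eta*log(a) + theta*s),
    where e_star = e - 30 for e < 30 and 0 for e >= 30."""
    e_star = min(e - 30.0, 0.0)
    return math.exp(gamma * e_star + eta * math.log(a) + theta * s)


def err_linear(a, e, d, s, beta, gamma, eta, theta):
    """ERR(a, e, d, s) = f(d) * g(a, e, s) with linear f(d) = beta * d.
    The excess is zero at ages before exposure (a < e)."""
    if a < e:
        return 0.0
    return beta * d * modifier_g(a, e, s, gamma, eta, theta)


# Hypothetical parameter values, not estimates from any data set.
params = dict(beta=0.5, gamma=-0.03, eta=-1.0, theta=0.3)
young = err_linear(a=60, e=10, d=0.1, s=1, **params)
older = err_linear(a=60, e=40, d=0.1, s=1, **params)
print(f"ERR at age 60, exposed at 10: {young:.5f}")
print(f"ERR at age 60, exposed at 40: {older:.5f}")
```

With a negative γ, as in this hypothetical parameter set, exposure at younger ages gives a larger ERR at the same attained age, while the truncation at age 30 makes the modifier independent of age at exposure for adults exposed after 30.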

**Model Parameter Estimation**

Models describe the mathematical form of a risk function, but the parameters in the model must be estimated from data. For example, a linear dose model presupposes that risk increases linearly with dose but the slope of the line, which measures the increase in risk for a unit increase in dose, must be estimated from data. Similarly, models for the effect of modifying factors depend on parameters that must be estimated from data. The most common method of fitting risk models to data (*i.e.*, estimating the unknown parameters in the model) is the method of maximum likelihood. Given a model for the probability density of the observed data, a likelihood is obtained by evaluating the density at the observed data. The likelihood is a function of the data and the unknown parameters in the probability density model. The parameters are estimated by those values in the parameter space (the set of all allowable parameter values) that maximize the likelihood for the given data values.

There are several approaches for the numerical calculations of likelihood analysis. Estimation based on grouped data using a Poisson form of the likelihood (Clayton and Hills 1993) has been used for the analyses of atomic bomb survivors and other major epidemiologic studies of radiation health risks.

This analysis is facilitated by forming a table so that individuals contributing information to each cell of the table have equal, or approximately equal, background rates. In particular, the table is formed by the cross-classification of individuals into categories of age at exposure, time period, exposure dose, and all other variables that appear in the model. The key summary variables required for each cell are the total person-years (PY) of observation in the cell, the number of new cases of cancer, the mean dose, the mean age at exposure, and the mean age or mean time since exposure.

For an RR model, the contribution to the likelihood from the data in each cell of the table has the same form as a Poisson likelihood (thus permitting well-understood and straightforward computations), with the mean equal to the product of three terms: PY; a parameter for the common, cell-specific background rate; and the RR, 1 + *fg*, where *f* and *g* are the functions of dose and of attained age, age at exposure, and sex described previously.

The full likelihood is the product of the cell-specific Poisson likelihoods. Numerical optimization is required to maximize the likelihood, and statistical inference generally is based on large-sample approximations for maximum likelihood estimation.
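The grouped-data Poisson likelihood can be sketched in a few lines. The example below makes simplifying assumptions that are not part of the text: a single background rate shared by all cells (in practice background rates are stratum specific), a linear ERR with no modifier (*g* = 1), and a plain grid search in place of a general numerical optimizer. The table values are hypothetical, with case counts set equal to their expected values so that the fit should recover the generating parameters.

```python
import math


def poisson_loglik(beta, lam0, cells):
    """Grouped-data Poisson log-likelihood (constant terms dropped).
    Each cell is (person_years, cases, mean_dose); the expected count is
    person_years * lam0 * (1 + beta * dose)."""
    total = 0.0
    for py, cases, dose in cells:
        mean = py * lam0 * (1.0 + beta * dose)
        total += cases * math.log(mean) - mean
    return total


def fit_linear_err(cells, beta_grid):
    """Maximize the likelihood over beta by grid search, profiling out lam0."""
    total_cases = sum(cases for _, cases, _ in cells)
    best = None
    for beta in beta_grid:
        # For fixed beta, the maximizing background rate has a closed form.
        lam0 = total_cases / sum(py * (1.0 + beta * d) for py, _, d in cells)
        ll = poisson_loglik(beta, lam0, cells)
        if best is None or ll > best[0]:
            best = (ll, beta, lam0)
    return best[1], best[2]


# Hypothetical table: counts equal their expected values under
# lam0 = 0.002 and beta = 0.5.
cells = [(50_000, 100.0, 0.0), (20_000, 50.0, 0.5),
         (10_000, 30.0, 1.0), (5_000, 20.0, 2.0)]
beta_grid = [i / 1000 for i in range(0, 3001)]
beta_hat, lam0_hat = fit_linear_err(cells, beta_grid)
print(f"beta_hat = {beta_hat:.3f}, lam0_hat = {lam0_hat:.5f}")
```

Real analyses would use many more cells, stratum-specific background parameters, and standard errors from the large-sample theory mentioned above; the sketch only shows the structure of the cell-level contribution and its maximization.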

**Using the Estimated Model**

The models developed as described above can be used to estimate both lifetime risks and probabilities of causation, both of which are discussed below. Following this, several limitations in the use of these models, which lead to uncertainties in estimated risks, are discussed. Further discussion of uncertainties and the committee’s approach to quantifying them can be found in Chapter 12.

*Estimating Lifetime Risks*

To calculate the lifetime risk for a particular age at exposure and a particular gender, one essentially follows a subject forward in time and calculates the risk of developing a radiation-induced cancer at each age subsequent to age at exposure. This requires probabilities of survival to each subsequent age, which are obtained from life tables for the population of interest. ERR models are expressed in terms of a relative increase in the sex- and age-specific background rates for the cancer of interest; these rates are usually obtained from cancer mortality vital statistics for the population of interest (or incidence rates if cancer incidence is to be estimated).
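The calculation just described can be sketched as a sum over attained ages. In this illustration the function name, the dictionary inputs, and all numerical values are hypothetical. Survival probabilities are taken to be conditional on being alive at the age at exposure, and refinements such as a minimum latency period are ignored.

```python
def lifetime_excess_risk(age_at_exposure, err_at_age, baseline_rate, survival):
    """Sum ERR x baseline rate x survival probability over attained ages.

    err_at_age:    function returning the ERR at a given attained age
    baseline_rate: dict mapping attained age to the annual background rate
    survival:      dict mapping attained age to the probability of surviving
                   from the age at exposure to that age
    """
    total = 0.0
    for age in sorted(baseline_rate):
        if age <= age_at_exposure:
            continue   # only ages subsequent to exposure contribute
        total += err_at_age(age) * baseline_rate[age] * survival[age]
    return total


# Toy inputs covering three years after exposure at age 30.
baseline_rate = {31: 0.001, 32: 0.002, 33: 0.003}
survival = {31: 1.00, 32: 0.99, 33: 0.98}

risk = lifetime_excess_risk(30, lambda age: 0.5, baseline_rate, survival)
```

With a constant ERR of 0.5, the toy result is 0.5 × (0.001 × 1.00 + 0.002 × 0.99 + 0.003 × 0.98) = 0.00296. An actual calculation would extend the sum over the remaining lifetime using life-table survival probabilities and population cancer rates.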

An important issue in estimating lifetime risks is the extrapolation of risks beyond the period for which follow-up data are available. No population has been followed for more than 40 or 50 years; thus, it is not possible to model the EAR or ERR directly for the period after follow-up has ended, a limitation that is primarily important for those exposed early in life. Estimating lifetime risks for this group thus requires assumptions that are usually based on the observed pattern of risk over the period for which data are available. For example, if the ERR appears to be a constant function of time since exposure, it may be reasonable to assume that it remains constant. Alternatively, if the EAR or ERR has declined to nearly zero by the end of the follow-up period, it may be reasonable to assume that the risk remains at zero.

Another important issue is how to apply risks estimated from studying a particular exposed population to another population that may have different characteristics and different background risks. Specifically, the application of estimates based on Japanese atomic bomb survivors to a U.S. population is a concern, since background rates for some specific cancers (including stomach, colon, liver, lung, and breast) differ substantially between the two populations. The BEIR V (NRC 1990) committee calculations were based on the assumption that relative risks (ERR) were comparable for different populations; however, the BEIR III (NRC 1980) committee modified its ERR models based on the assumption that absolute risks were comparable. Some recent efforts have used intermediate approaches with allowance for considerable uncertainty (NIH 1985, 2003).
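The difference between the two transport assumptions can be made concrete with a small calculation. The ERR and baseline rates below are invented for illustration and do not correspond to any cancer site or population; the function names are likewise hypothetical.

```python
def excess_rate_relative_transport(err, baseline_target):
    """Assume the ERR carries over: excess scales with the target baseline."""
    return err * baseline_target


def excess_rate_absolute_transport(err, baseline_source):
    """Assume the absolute excess carries over: it is fixed by the source baseline."""
    return err * baseline_source


# Hypothetical values: an ERR estimated in the source population and annual
# baseline rates that differ fivefold between source and target populations.
err = 0.4
baseline_source = 0.0010
baseline_target = 0.0002

relative = excess_rate_relative_transport(err, baseline_target)
absolute = excess_rate_absolute_transport(err, baseline_source)
ratio = absolute / relative
```

Here the two assumptions give excess rates that differ by the ratio of the baseline rates, a factor of 5 in this toy case. Any intermediate approach yields a value between the two.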

*Estimating Probabilities of Causation*

The *probability of causation* (PC; NIH 1985, 2003) is defined as the ratio of ERR to RR:

$$
\mathrm{PC} = \frac{\mathrm{ERR}}{\mathrm{RR}} = \frac{\mathrm{ERR}}{1 + \mathrm{ERR}},
$$

where for brevity the dependence of ERR on dose, time variables, and possibly other individual characteristics is suppressed. For the RR models described previously, ERR = *fg*, where *f* = *f*(*d*) and *g* = *g*(*a*, *e*, *s*), in which case

$$
\mathrm{PC} = \frac{fg}{1 + fg}.
$$

Thus, the ERR model provides immediate PC estimates.
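A short sketch shows how a PC estimate follows from a fitted ERR, taking PC as the ratio of ERR to RR with RR = 1 + ERR. The linear dose term, the `modifier` argument standing in for *g*, and the numerical values are assumptions made for illustration only.

```python
def err_linear(dose, beta, modifier=1.0):
    """ERR = f(d) * g, with an assumed linear f(d) = beta * d."""
    return beta * dose * modifier


def probability_of_causation(err):
    """PC = ERR / RR, where RR = 1 + ERR."""
    if err < 0.0:
        raise ValueError("ERR must be non-negative for PC to be meaningful")
    return err / (1.0 + err)


# Hypothetical example: beta = 0.5 per Sv and a dose of 0.5 Sv.
pc = probability_of_causation(err_linear(dose=0.5, beta=0.5))
```

In this toy case ERR = 0.25, so PC = 0.25 / 1.25 = 0.2. An ERR of 1 (a doubling of risk) corresponds to a PC of 0.5.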

**Modeling Caveats**

The theory of risk assessment, modeling, and estimation and the computational software for deriving statistically sound parameter estimates from data provide a powerful set of tools for calculating risk estimates. Risk models provide the general form of the dependence of risk on dose and risk-modifying factors. Specific risk estimates are obtained by fitting the models (estimating unknown parameters) to data. The role of data in the process of risk estimation cannot be overemphasized. Neither theory, models, nor model-fitting software can overcome limitations in the data from which risk estimates are derived. In human epidemiologic studies of radiation, both the quality and the quantity of the data available for risk modeling are limiting factors in the estimation of human cancer risks. The quality of data, or lack thereof, and its impact on risk modeling are discussed below under three broad headings. The primary consequence of less-than-ideal data is uncertainty in estimates derived from such data.

*Incomplete Covariate Information*

The specificity of risk models is limited by the information available in the data. Even the most extensive data sets contain, in addition to measurements of exposure, information on only a handful of predictor variables such as dose, age, age at exposure, and sex. Consequently, models fit to such data predict the same risk of cancer for individuals having the same values of these predictor variables, regardless of any other differences between them. For example, two individuals who differ with respect to overall health status, family history of cancer (genetic predisposition to cancer), exposure to other carcinogens, and so on, will be assigned the same estimated risk provided they were exposed to the same dose of radiation, are of the same age, and have the same age at exposure and the same gender.

Consequently, among a group of individuals having the same values of the predictor variables in the model, some will have a higher personal risk than that predicted by the model and some will have a lower personal risk. However, on average, the group risk will be predicted reasonably well by the model. The situation is similar to the assessment of insurance risk. Not all teenage males have the same personal risk of having an automobile accident (some are better drivers than others), yet as a group they are recognized as having a greater-than-average risk of accidents, and premiums are set accordingly. From the insurance company’s perspective, the premiums are set fairly in the sense that their risk models adequately predict the claims experience of the group.

Radiation risk models are similar in that they adequately predict the disease experience of a group of individuals sharing common values of predictor variables in the model. However, such estimated risks need not be representative of individual personal risks.

*Estimated Doses*

The standard theory and methods of risk modeling and estimation are appropriate under the assumption that dose is measured accurately. Estimated radiation dose is a common characteristic of human epidemiologic data, and questions naturally arise regarding the adequacy of dose estimates for the estimation of risk parameters and the calculation of risk estimates. These are different problems and are discussed separately.

First, consider the problem of calculating risk estimates from a given risk equation. Suppose that the risk equation has been estimated without bias and with sufficient precision to justify its use in the calculation of risks. Assume also that risk increases with dose: that is, the risk equation yields higher risks for higher doses. Suppose that an estimate of lifetime risk is desired for an individual whose dose is estimated to be *d*. If *d* overestimates the individual’s true dose, the lifetime risk will be overestimated; if *d* underestimates the true dose, the risk will be underestimated. This is intuitive and is a consequence of the fact that risk is an increasing function of dose.

The problem of estimating risk equation parameters from data with estimated doses is more complicated. Errors in estimated doses can arise in a number of different ways, not all of which have the same impact on risk parameter estimation. For example, flaws in a dosimetry system have the potential to affect all (or many) dose estimates in the same manner, leading to systematic errors in which all (or many) dose estimates are too high or too low. Errors or incomplete records in the data from which dose estimates are constructed (*e.g.*, badge data from nuclear industry workers) are likely to result in more or less random errors in dose estimates (*i.e.*, some individuals will have dose estimates that are too high and others will have estimates that are too low). Systematic errors can result in biased estimates of risk equation parameters, and the type of bias depends on the nature of the systematic error. For example, a risk equation derived from data in which doses are overestimated by a constant factor (>1) will underestimate the risk at a given dose *d*; one derived from data in which doses are underestimated by a constant factor (<1) will overestimate the risk. Random errors in dose estimates also have the potential to bias estimated risk equations. Random error-induced bias generally results in the underestimation of risk; that is, random errors tend to have the same qualitative effect as systematic overestimation of doses.
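Both effects can be seen in a simplified linear analogue. The sketch below uses ordinary least squares on a noise-free linear response rather than a Poisson risk model, and it assumes additive random dose errors that are independent of the true dose. The data are simulated and every name and value is hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)   # fixed seed so the illustration is repeatable
TRUE_SLOPE = 0.5         # assumed excess risk per unit of true dose

true_dose = [rng.uniform(0.0, 2.0) for _ in range(5000)]
excess = [TRUE_SLOPE * d for d in true_dose]

# Systematic error: every dose estimate is too high by a factor of 1.5.
inflated_dose = [1.5 * d for d in true_dose]

# Random error: each dose estimate is off by independent additive noise.
noisy_dose = [d + rng.gauss(0.0, 0.5) for d in true_dose]

slope_inflated = ols_slope(inflated_dose, excess)
slope_noisy = ols_slope(noisy_dose, excess)
```

The slope fitted to the inflated doses equals the true slope divided by 1.5, so risk per unit dose is underestimated. The slope fitted to the noisy doses is also smaller than the true slope, which is the attenuation described above.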

The estimation of risk models from atomic bomb survivors has been carried out with a statistical technique that accounts for the random uncertainties in nominal doses (Pierce and others 1990). To the extent that it is based on correct assumptions about the forms and sizes of dose uncertainties, it removes the bias due to random dose measurement errors.

*Data from Select Populations*

Ideally, risk models would be developed from data gathered on individuals selected at random from the population for which risk estimates are desired. For example, in estimating risks for medical workers exposed to radiation on the job, the ideal data set would consist of exposure and health information from a random sample of the population of such workers. However, data on specific populations of interest are generally not available in sufficient quantity or with exposures over a wide enough range to support meaningful statistical modeling. Radiation epidemiology is by necessity opportunistic with regard to the availability of data capable of supporting risk modeling, as indicated by the intense study of A-bomb survivors and victims of the Chernobyl accident.

A consequence of much significance and concern is the fact that risk models are often estimated using data from one population (often not even a random sample) for the purpose of estimating risks in some other population(s). Cross-population extrapolation of this type is referred to as “transporting” the model from one population to another. The potential problem it creates is the obvious one, namely, that a risk equation valid for one population need not be appropriate for another. Just as there are differences in the risk of cancer between males and females and among different age groups, there are differences in cancer risks among different populations. For example, the disparity between baseline rates for certain cancers (*e.g.,* stomach cancer) in Japanese and U.S. populations suggests the possibility of differences in the risks due to radiation exposure.

Transporting models is generally regarded as a necessity, and much thought and effort are expended to ensure that problems of model transportation are minimized. The decision to use EAR models or ERR models is sometimes influenced by concerns of model transport. Problems of transporting models from one population to another can never be eliminated completely. However, avoiding model transport altogether would mean basing risk estimates on data so sparse as to render the estimated risks statistically unreliable.