Methodological Considerations in Evaluating the Evidence
Many factors affect the validity of scientific data on the effects of diet on human health. The most straightforward way to study such relationships is to select a group of subjects, collect relevant data on dietary intake and health indicators for each person in the group, and attempt to determine causal relations between the two based on clues from such other sources as feeding studies in animals. This direct approach is rarely possible because of limitations in our ability to accurately quantify dietary intake and because of the many unknowns about the direct effect of diet on health and chronic diseases relative to the effects of other environmental and genetic variables. Thus, indirect approaches are commonly required. The first part of this chapter addresses various approaches to the assessment of dietary intake by humans. The second part deals with the evaluation of all types of single studies relevant to assessing the impact of dietary intake on health. In the third part of the chapter, the committee considers criteria for drawing inferences about causality from the evidence as a whole.
Assessment of Dietary Intake of Humans
Methods for Assessing Dietary Intake
A major impediment in studying the effects of diet on health is the difficulty of assessing dietary intake of humans (Bazzarre and Myers, 1980; Bingham, 1987; Block, 1982; Burk and Pao, 1976; Dwyer, 1988; Marr, 1971; Medlin and Skinner, 1988; Pekkarinen, 1970; Sorenson, 1982; Young and Trulson, 1960). Each of the assessment methods in use has its weaknesses. The choice of method depends on whether the assessment pertains to the average intake of a group or to the habitual intakes of individuals within a group, the level of detail (e.g., food groups, foods, or nutrients) desired, and the degree of precision needed in determining amounts of foods consumed. Additional considerations include the costs, burden on respondents, and availability of critical resources such as trained interviewers and accurate food composition tables. The chosen dietary intake assessment method must be tested to ensure its accuracy and reliability in the study population, and adequate training of personnel involved in collecting and analyzing data is essential. In certain circumstances, dietary survey methods can be combined to improve accuracy (Dwyer, 1988). Methods commonly used in epidemiologic research to assess dietary intake are discussed below, along with their advantages, disadvantages, and problems of validation.
Group Dietary Data
In most nations, average dietary intake is estimated from national food supply data or from
national surveys of food intake by households or individuals. Food supply is estimated by adding the quantity of food imported to the quantity produced within a country and then subtracting the sum of food exported, destroyed by pests during storage, and put to nonfood use (e.g., in the production of industrial alcohol). The final figure is divided by the total population to obtain the average per-capita food availability. The results are estimates of foods that disappear into wholesale and retail markets; they fail to account for food wasted before consumption, food fed to pets, and home-grown foods (when the latter is not included in production data). Nutrients available in the food supply are usually estimated from standard food composition tables.
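The food-disappearance arithmetic described above can be sketched as a short calculation. All figures below are hypothetical (in kilograms per year), and the function name is invented for illustration; this is a sketch of the bookkeeping, not an official USDA procedure.

```python
# A minimal sketch of the food "disappearance" arithmetic described above.
# All quantities are hypothetical, in kilograms per year.

def per_capita_availability(produced, imported, exported,
                            storage_losses, nonfood_use, population):
    """(production + imports) minus exports, pest/storage losses, and
    nonfood uses (e.g., industrial alcohol), divided by population."""
    disappearance = produced + imported - (exported + storage_losses + nonfood_use)
    return disappearance / population

# Hypothetical national figures for one commodity:
kg_per_person = per_capita_availability(
    produced=6.0e10, imported=5.0e9, exported=2.5e10,
    storage_losses=1.0e9, nonfood_use=4.0e9, population=2.5e8)
# -> 140.0 kg available per person per year. Note that this is market
# disappearance, not consumption: food wasted before consumption, food fed
# to pets, and home-grown foods are not captured.
```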
Data on per-capita food availability provide useful leads for further research on the relationship of diet to disease, because they enable investigators to compare rates of chronic diseases among countries with marked differences in mortality rates from chronic diseases and in the availability of specific nutrients in their food supply. These cross-sectional comparisons do not control for confounding factors, nor can they be used to show associations between diet and disease in individuals. The food supply in the United States has been monitored by the U.S. Department of Agriculture (USDA) since 1909, and the information gathered has been used to estimate trends in food use (see Chapter 3).
In household food inventories, food consumed is estimated by recording the difference between inventories of foods on hand at the beginning and end of the study period (usually 1 week) and accounting for food purchased or otherwise brought into the house. Average per-capita intake is estimated by dividing total household food intake by the number of people in that home.
Per-capita intakes by different age-sex groups in U.S. households are provided by national surveys conducted by USDA (USDA, 1984, 1987) and the U.S. Department of Health and Human Services (Carroll et al., 1983). These are also discussed in Chapter 3.
Individual Dietary Data
Food supply data and household food inventories supply only rough estimates of foods available and cannot be used to determine intakes of individuals. The methods most often used to assess individual intakes are food records and dietary recalls; the latter category includes diet histories and food frequency questionnaires.
The food record method requires participants to measure and record types and amounts of all foods and drinks consumed over a specified time. In some studies, all foods are weighed. In others, measuring cups and spoons and a ruler are used to assess dimensions. Food models, volume models, and photographs have also been used.
The 24-hour recall method requires that respondents report the types and amounts of foods they consumed over the previous 24-hour period. Information is obtained by face-to-face (in-person) interview or by telephone.
Diet history methods rely on interviewers or questionnaires to estimate the usual diet (or certain aspects of the diet) of subjects over a long period. The objective is to obtain a picture of habitual intake, which is more likely to be related to slowly developing diseases than is the intake over a time as short as 24 hours, which cannot represent the customary or usual intake. The classic diet history method used by Bertha Burke (1947) included a 3-day food intake record, a 24-hour recall, and an accounting of the frequency of food intakes over a period of 1 to 3 months. This method is rarely used today in its entirety. A less intensive version consists of two steps. First, an interviewer obtains detailed information about usual diet and portion sizes, e.g., what is usually consumed for each meal and for snacks. Then, to improve recall and obtain a more complete picture of habitual food practices, the interviewer helps the respondent review a detailed list of foods and adds anything omitted (Fehily, 1984).
Some diet histories are obtained through questionnaires, administered by an interviewer or completed independently by the respondent, that ask about the number of times each listed food is consumed (and sometimes the amounts) over a specified period, such as a few weeks or a year. This is often called the food frequency method. Few or many food items may be listed on the questionnaire. For example, in studies of the association between diet and cancer, the questionnaire may focus only on foods that provide a nutrient of particular interest.
In most case-control studies to determine the etiology of chronic diseases, investigators have recognized the difficulty of determining past dietary intake and have assumed that the current diet (or the diet prior to the onset of symptoms of the disease) reflects past intake sufficiently well to identify associations of dietary factors with disease (Morgan et al., 1978). However, recall of a diet from the distant past (17 to 25 years ago) or the
more recent past (3 years ago) can be influenced by current diet (Byers et al., 1983; Garland et al., 1982; Møller Jenson et al., 1984; Rohan and Potter, 1984; Van Leeuwen et al., 1983).
The most accurate methods are in large part free of random and systematic errors (Bingham, 1987). The accuracy of a particular data set (or, more generally, of a method) is defined as the degree to which recorded estimates of intake approximate actual intake. Accuracy can be reduced by many factors, such as poor memory of past dietary practices, inaccurate recall of amounts, wishful thinking, and a desire to please an interviewer. Thus, true intake can generally be known only if actual intake is observed and measured or weighed surreptitiously. This has been feasible in only a few studies of small numbers of subjects conducted for a short time. Since true dietary intake can rarely be determined, investigators often try to assess the accuracy of a new method by comparing results not with true intake but with results from some other accepted but possibly flawed method.
The errors that may occur in methods to assess intake are listed below. They include sampling errors, nonresponse bias, reporting errors, errors due to wide day-to-day variation in dietary intake, interviewer bias, and errors arising from the use of food composition tables.
· Sampling Errors: Because of interindividual variation in usual diet, small samples may provide highly unrepresentative estimates of food intakes by populations even when individual data items are accurate.
· Nonresponse Bias: When randomly selected samples are meant to represent an entire population, high refusal rates will introduce a serious bias if those who refuse differ from respondents in important ways.
· Reporting Errors: Respondents must be motivated to cooperate. Most people have little reason to remember exactly what they ate in the recent past as well as at times long past. In recall methods, subjects may fail to remember accurately all foods eaten. In general, recall is more accurate if respondents have been alerted to the requirements of the recall method. Some groups, such as very old people and young children, may be poor subjects for recall methods.
Subjects may report intakes of foods and amounts they believe the investigator approves instead of their actual intakes. They may also be reluctant to admit to such habits as binge eating or high alcohol consumption. Many subjects do not correctly estimate portion sizes when they recall or record food intake.
If respondents are required to weigh or measure their foods to determine their exact intake, they may alter their usual dietary habits to make recording easier or to provide answers they think will please the investigator.
· Errors Relative to Day-to-Day Dietary Variability: Individuals vary widely from day to day in their intake of foods and nutrients. Indeed, even an accurate 24-hour recall will not include the most common foods if they were not consumed on the day of recall. In homogeneous populations, the variability within a given subject's intake may be much greater than that between subjects (Liu et al., 1978).
The number of days of dietary intake data needed for moderately accurate estimates of the usual or habitual intake of individuals varies from one nutrient to another and can sometimes exceed 30 days (to account for day-to-day variability).
· Interviewer Bias: Poorly trained interviewers may introduce errors by suggesting answers or by leading respondents.
· Errors Due to Use of Food Composition Tables: A given food item may vary in nutrient composition because of genetic variation, growing conditions, pest control measures, and conditions of storage, processing, or preparation for consumption. Single values in food composition tables are averages of representative samples of a given food and do not indicate the nutrient content of any specific sample.
Some food composition data are biased, because they are based on inappropriate analytical methods or because nutrient values are imputed. Nutrient data currently available in U.S. food composition tables are far from complete (Beecher and Vanderslice, 1984; Hepburn, 1987). Food composition tables do not account for incomplete bioavailability of nutrients in individual foods. They also do not include data on nutrients inadvertently added to food during preparation, such as calcium from tap water or iron from utensils (Bazzarre and Myers, 1980).
A few investigators have weighed and chemically analyzed samples of food identical to those actually consumed to determine actual intake of nutrients as calculated from food composition tables. Some of them have found general agreement of analyzed values with published tables (Bazzarre and Myers, 1980), but others report errors ranging from 2 to 20%, depending on the nutrient studied (Bingham, 1987).
A dietary intake method need not necessarily determine exact quantities of nutrients consumed (Block, 1982). For example, if data are to be used only to place individuals within upper and lower categories of a distribution, the assessment method might be tested for its ability to do this accurately rather than for its quantitative precision. Interest is increasing in the use of biologic markers to check the accuracy of food intake assessment methods; some possible markers are discussed below in the section on Biologic Markers.
The investigator must be cautious, however, in concluding that one method of assessing some dietary risk factor is fully interchangeable with another. Analyses comparing two methods commonly make use of some combination of group means, correlation coefficients, and regression slopes. Two methods used to measure the same risk factor may not agree on all three of these parameters. In the strictest sense, two methods are interchangeable only if the slope of the linear regression of one method on the other is unity (Lee, 1980; Lee and Kolonel, 1982; Lee et al., 1983).
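The three comparisons mentioned above (group means, correlation, and regression slope) can be sketched with invented data for two hypothetical methods measuring the same nutrient in the same subjects; the numbers below are illustrative only.

```python
# Sketch: comparing two hypothetical dietary assessment methods on group
# means, Pearson correlation, and ordinary least-squares regression slope.
# The intake values are invented for illustration.
from statistics import mean

method_a = [62, 75, 58, 90, 70, 81, 66, 77]  # e.g., fat intake (g/day) by diet history
method_b = [55, 70, 50, 84, 63, 74, 60, 72]  # same subjects, by food record

def slope_and_r(x, y):
    """OLS slope of y on x, and the Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

b, r = slope_and_r(method_a, method_b)
# Two methods can correlate highly (r near 1) yet still disagree on group
# means or show a slope away from unity; all three checks are needed before
# treating the methods as interchangeable.
```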
Another important concept is that of reliability: the ability of a method to produce the same results when used repeatedly under the same conditions. While good reliability generally means that any bias is constant and random variability is not serious, failure to obtain the same results may be due either to actual changes in dietary intake or to an unreliable instrument. It may be necessary, therefore, to use biologic markers or other information about changes in food choices to determine whether or not dietary changes have occurred over time (Block, 1982).
Strengths and Weaknesses of Dietary Intake Assessment Methods
Each method of ascertaining dietary intake has its strengths and weaknesses. None of them is suitable for every purpose.
Many investigators believe that records of food weights provide the most accurate estimation of food intake. Consequently, they are often used as the standard against which other methods are validated (Bazzarre and Myers, 1980). However, this method requires highly cooperative, motivated, and literate respondents as well as trained personnel to supervise them and to code and calculate their nutrient intakes. As noted above, respondents may alter their diets to facilitate recording, thereby producing records that fail to reflect their usual intakes. Some subjects may be unable to weigh foods consumed away from home. Consequently, their recall of those foods and amounts may be inaccurate. Because of the burden placed on respondents who are asked to weigh their foods, cooperation rates are low, from 35 to 75% (Bazzarre and Myers, 1980). The large burden placed on the investigative staff by this method results in high costs, thus making it difficult for the investigator to obtain a sample that is both representative and sufficiently large. Because of the high costs and low cooperation rates, epidemiologists generally find that weighed or measured food intake records are useful chiefly to validate less costly and more easily applied methods.
Respondent cooperation may be improved if household measures rather than weights are used to determine amounts. Pilot studies in the target population may be undertaken to determine whether estimated weights of food are sufficiently free of bias (Bingham, 1987). Actual intake may be underreported with this method (Mertz and Kelsay, 1984).
The number of days during which food records should be kept depends on the research objectives (e.g., whether individual or group means are desired), the nutrients of interest, and the sample size. More extensive food records are required to estimate individual intake of highly variable nutrients such as vitamin A than to estimate less variable ones such as food energy. In one study, 29 adults kept daily food intake records for 1 year (Basiotis et al., 1987). An average of 31 days of intake data was required to predict an individual's usual intake of food energy, whereas an average of 433 days would have been needed to predict usual intake of vitamin A with the same degree of accuracy. In contrast, mean food energy intake of the group could be estimated from only 3 days of data, and mean vitamin A intake could be estimated from 41 days of data. Other investigators have also concluded that to estimate the mean intake of a group, 3-day records are adequate, provided that day-of-week variations are taken into account [intake on weekends may differ from that on weekdays (Sorenson, 1982)] and that the sample size is sufficiently large (Bingham, 1987).
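One way to see why such large numbers of days arise is the common sample-size rule of thumb sketched below. This is not the method used in the study cited above, and the within-person coefficients of variation shown are assumptions chosen only to illustrate the contrast between a stable nutrient (food energy) and a highly variable one (vitamin A).

```python
# Rule-of-thumb sketch: number of record days n needed so that an
# individual's observed mean lies within a tolerance D of true usual
# intake, given the within-person day-to-day coefficient of variation:
#     n = (Z * CV_within / D) ** 2

def days_needed(cv_within_pct, tolerance_pct, z=1.96):
    """Days of records for ~95% confidence (z = 1.96) that the observed
    mean falls within tolerance_pct of usual intake."""
    return (z * cv_within_pct / tolerance_pct) ** 2

# Illustrative (assumed) within-person CVs:
days_energy = days_needed(cv_within_pct=25, tolerance_pct=10)   # ~24 days
days_vit_a  = days_needed(cv_within_pct=100, tolerance_pct=10)  # ~384 days
```

The quadratic dependence on the coefficient of variation is what drives the order-of-magnitude gap between nutrients reported in the study above.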
The 24-Hour Recall
This method is popular because the respondent burden is small, the time required for administration is short, and costs generally are low. Limitations include inaccurate reporting due to failure to
recall all foods and the portion sizes consumed, high day-to-day variability in nutrient intake, and bias due to a desire to please the interviewer or reluctance to report large intakes of alcohol, sweets, and other items that might draw disapproval. Even accurate 24-hour recall data cannot represent the habitual intake of individuals and cannot be used to identify individuals in the sample whose intakes are consistently high or low in the nutrients studied (Beaton et al., 1979; Block, 1982; Todd et al., 1983). Most investigators agree that a single 24-hour recall is valuable for assessing the mean intake of a group (Gersovitz et al., 1978; Madden et al., 1976; Sorenson, 1982; Young et al., 1952), but systematic errors may result in serious misclassification of respondents. It may be worthwhile to assess the accuracy of this method (perhaps in a subsample) by comparing 24-hour recall data with diet records (Bingham, 1987). Validity checks against biologic markers (such as 24-hour urinary nitrogen as a reflection of protein intake) can increase confidence that the 24-hour recall accurately reflects intake of certain nutrients in the past 24 hours (Bingham, 1987; Block, 1982). Several 24-hour recalls obtained periodically from the same people over several months or more can increase the accuracy of estimates of their usual intakes; this may be especially important for nutrients that are highly variable in the diet (Bazzarre and Myers, 1980; Liu et al., 1978; Rush and Kristal, 1982). A common misuse of data from a single 24-hour recall is the designation of cutoff points below which dietary intakes of individuals are considered to be inadequate (see Chapter 3).
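The gain from averaging several recalls per person can be sketched with a standard variance-components argument along the lines of Liu et al. (1978): with k recall days per person, the within-person variance contributing to the observed between-person variance shrinks by 1/k, so an observed diet-disease correlation is less attenuated. The 4:1 variance ratio below is an assumption for illustration.

```python
# Sketch: expected attenuation of an observed correlation between intake
# and an outcome when intake is measured with k recall days per person.
# r_observed ~= r_true / sqrt(1 + (var_within / var_between) / k)

def attenuation_factor(var_within, var_between, k_days):
    """Expected ratio of observed to true correlation."""
    return (1 + var_within / (k_days * var_between)) ** -0.5

# Assuming within-person variance 4x the between-person variance
# (plausible for many nutrients, but an assumption here):
one_day = attenuation_factor(4.0, 1.0, 1)     # ~0.45
seven_days = attenuation_factor(4.0, 1.0, 7)  # ~0.80
```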
In the diet history method, which covers longer periods, seasonal and other dietary variations can be taken into account and usual diet is not altered. However, a complex diet history requires a highly skilled interviewer and takes 1 to 2 hours of respondent time, followed by extensive checking and coding of records. Thus, costs are high. An alternative approach to estimating usual intakes, called the food frequency assessment method, often ignores portion sizes, so that questionnaires are easier to standardize, more rapidly administered (some are self-administered by the study subjects), and less expensive. Methods that provide information on the frequency of food consumption but not on the amounts consumed may be useful in ecological studies that do not require high accuracy, but may not be suitable for case-control or cohort studies (Chu et al., 1984). Because of validation problems, investigators often place greater confidence in the accuracy of a diet history method if it produces results consistent from one experimental situation or one study group to another.
Diet histories tend to produce higher estimates of intakes than do food records (Bazzarre and Myers, 1980; Bingham, 1987; Block, 1982; Dwyer, 1988; Jain et al., 1980; Sorenson, 1982; Young et al., 1952). The reproducibility of the diet history method based on repeated administration has been fairly good (Dawber et al., 1962; Hankin et al., 1983; Nomura et al., 1976; Reshef and Epstein, 1972).
Both the full diet history and the food frequency method are subject to recall errors and the possibility that respondents may report a diet of better quality than they have actually consumed. The survey instrument may need to be complex if the study population contains two or more groups with distinctly different dietary patterns.
Willett et al. (1985) and Block et al. (1986) developed semiquantitative food frequency questionnaires. Block and colleagues used dietary data from adult respondents in the Second National Health and Nutrition Examination Survey, whereas Willett and co-workers obtained information from a large sample of nurses. These questionnaires are promising but they have not yet been adequately studied in groups that may differ sharply from the general U.S. population, nor have they been sufficiently evaluated to determine how well they assess the dietary intakes of individuals (Dwyer, 1988).
Recall bias occurs when study subjects consistently remember their intake of a food as higher or lower than it really was. This bias is a special concern when different study groups have different degrees of recall bias. For example, patients with cancer of the gastrointestinal tract may seek to explain their disease in terms of dietary factors and consequently overestimate or underestimate a particular food component that they believe may be responsible, or they may think harder about their past diet and report more accurately than the rest of the population. Recall biases can be large, either positive or negative, and very difficult to detect.
Incompleteness of Food Composition Tables
Another problem in assessing nutrient intakes is the limited accuracy and completeness of standard food composition tables. There are sufficiently accurate data on the occurrence of protein and fat in most
foods; but for dietary fiber, some vitamins, and trace minerals, for example, the data are much more limited. Furthermore, the tables give average values and do not reflect variability among samples of the same food, nor do they reflect differences over time or geographic location, such as might be introduced by new strains of food animals or plants or by new methods of food preservation and storage (Beecher and Vanderslice, 1984; Hepburn, 1987).
Biologic Markers
Because of difficulties in validating dietary intake assessment methods, biologic markers are receiving more and more attention as independent validity checks. Such markers can indicate dietary intake, but the complexities of nutrient metabolism and genetic and environmental factors may affect their usefulness. For example, disease may affect nutrient intake (rather than vice versa), and it may also directly affect levels of a given marker in blood or urine. It is also important to know the length of time for which a marker can estimate dietary intake. To date, little has been done to establish the accuracy of dietary markers.
Possible markers include 24-hour urinary sodium excretion as a measure of sodium intake (Fregly, 1985) and 24-hour urinary nitrogen as an estimate of protein intake (Bingham and Cummings, 1985; Isaksson, 1980). The value of markers depends on the completeness of 24-hour urine collections, which can be assessed by measuring the urinary recovery of orally administered para-aminobenzoic acid (Bingham and Cummings, 1983). Creatinine excretion has often been used to check completeness of urine collections, but the coefficient of variation of 24-hour creatinine excretion is as high as 25%, and excretion is increased when meat is consumed (Bingham, 1987).
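The urinary-nitrogen check on reported protein intake can be sketched as follows. The conversion factor of 6.25 g protein per g nitrogen is standard; the assumption that roughly 81% of nitrogen intake appears in a complete 24-hour urine collection is an approximation that varies with the individual and the diet, so the result should be read as a rough plausibility check, not a measurement.

```python
# Sketch of estimating protein intake from 24-hour urinary nitrogen.
# fraction_excreted = 0.81 is an assumed average, not a constant.

def protein_from_urinary_n(urine_n_g, fraction_excreted=0.81):
    nitrogen_intake = urine_n_g / fraction_excreted  # estimated N intake, g/day
    return 6.25 * nitrogen_intake                    # g protein/day

est = protein_from_urinary_n(12.0)  # 12 g N in a complete 24-h collection
# -> about 93 g protein/day. A reported intake far from this estimate
# suggests misreporting or an incomplete urine collection (which PABA
# recovery can help detect).
```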
Other possible markers include toenail levels of selenium to assess selenium intake (Morris et al., 1983) and adipose tissue concentrations of fatty acids to assess types of fatty acids consumed (Beynen et al., 1980). Standard energy output equations, which include adjustment for the weight and age of subjects, can provide an approximate check on stated energy intake (Schofield et al., 1985).
Evaluating Single Studies for Quality and Relevance
The research methods used to gain reliable scientific knowledge, particularly knowledge about biologic phenomena, and the criteria used to evaluate their results have been developed over many years by a combination of intuition; biologic, statistical, and mathematical reasoning; and practical experience. A strong impetus for developing better methods of research design and statistical analysis came originally from the needs of agricultural research (Fisher, 1935). Many of those techniques are applicable to research in nutrition.
Knowledge about the relationship of diet to health is based on many thousands of reports of experimental and observational research published during the past century. In order to evaluate such a large amount of research, it is useful to distinguish between the accuracy of data (internal quality control that can generally be judged by others only if the research report is sufficiently complete) and the accuracy of conclusions (which often requires a broad range of additional knowledge for evaluation).
Scientists can agree that data are accurate even while they dispute conclusions. Furthermore, data remain accurate even when interpretations change. Although any valid scheme for combining information from separate studies would depend, in part, on their individual strengths and weaknesses, the committee has attempted to ensure adequate standards of quality by giving emphasis to peer-reviewed studies.
The criteria used by the committee in evaluating conclusions from individual studies are common to evaluation of all scientific evidence and can be divided into two major categories:
· those related to study design and execution, including observational research and true experiments; and
· those related to interpretation, which depends heavily on the concepts and sometimes on the mathematical techniques of statistics as well as a broad understanding of biologic phenomena.
Investigations in humans include cross-sectional, case-control, cohort, and intervention studies, including clinical and community trials. Observational studies, in which the investigator must make use of situations that arise without intervening in them, differ in many critical ways from experiments in which the investigator controls both the assignment of subjects to treatments and the treatments themselves. Many studies of nutrition in humans are by necessity observational, and the general criteria for the quality of such work are similar to those for other observational investigations. Such criteria are discussed in textbooks on epidemiologic methods (e.g., Lilienfeld and
Lilienfeld, 1980; Mausner and Kramer, 1985). Likewise, experiments in humans and animals can be evaluated against general criteria for good experimental design, execution, and analysis that apply in all scientific disciplines and are described in many textbooks.
The following sections address the special problems encountered in meeting these criteria in studies of diet and health.
Problems Common to Observational Studies
Assessment of Dietary Intakes
Most knowledge about the relationship of diet to human health and disease, particularly chronic disease, has been derived from observational studies of people who selected their own food and drink over a lifetime. The strengths and weaknesses of different methods of measuring dietary intakes of groups and individuals are discussed earlier in this chapter.
Assessment of Disease Incidence and Prevalence
It is difficult to measure the prevalence or incidence of some diseases. Most cancers are detected efficiently and are diagnosed and reported accurately, but some other medical conditions of considerable consequence, such as hypertension and loss of bone density, require special techniques for detection, since they may become clinically apparent only when there is a catastrophic complication.
Mortality rates for demographic groups (e.g., nations) and subgroups (e.g., adults) are reliable indicators of the incidence of diseases that are detected clinically with accuracy and thoroughness and have a high case-fatality rate. Such diseases include severe myocardial infarction, cancer of the lung, and cirrhosis of the liver. Mortality rates are not reliable indicators of the incidence of diseases that are often not detected clinically and do not commonly cause death. Diseases in this category are bone loss and cholelithiasis. For these, later disease manifestations (i.e., disease sequelae) may be used as proxy indicators of disease. For example, hip fracture in an elderly person is often used as an indicator of osteoporotic bone. Otherwise, disease rates must be measured by surveys of population samples.
Biologic markers can sometimes be used to estimate past exposures, as when blood lead levels are used to assess intake of lead. They can also serve as markers of a developing disease, as when serum cholesterol concentrations and, more recently, serum lipoprotein concentrations are used to predict the risk of atherosclerosis and its sequelae. Much of the evidence relating nutrition to atherosclerosis is based on observational and experimental studies of associations between diet and serum lipid or lipoprotein levels. The relevance of this evidence depends directly on the strength of the association of serum lipoprotein levels with atherosclerosis and related diseases.
Hypertension is a diet-related characteristic that is easy to measure in a large population sample. It also has a variety of serious sequelae (cardiac hypertrophy, congestive heart failure, stroke), although these also have other causes. Osteoporosis is an example of a disease for which new technology can measure a biologic marker (bone density) easily and accurately. Indeed, bone density measurements are so readily available and so closely linked to underlying pathophysiology that they are refining our definition of osteoporosis.
Autopsy results can be used to determine the presence of some diseases and conditions with great accuracy, but such information must be used with caution because the combined clinical observations and autopsy findings are available for only a small and highly selected (and thus potentially highly biased) proportion of the population. Only about 15% of all deaths in the United States are now investigated by autopsy, and in recent years the rate has declined by approximately 1% per year (Council on Scientific Affairs, 1987). Furthermore, autopsy findings are rarely used to revise the cause of death that is recorded on death certificates. In some studies of diet and health that include detailed, long-term individual nutritional assessments, such as the Honolulu Heart Program (McGee et al., 1984) and the Puerto Rico Heart Health Program (Garcia-Palmieri et al., 1980), autopsy findings have been valuable both for improving the accuracy of the assigned cause of death and for assessing other conditions of nutritional interest that did not contribute to the death.
Another valuable use of the autopsy is to study early stages of a disease that has a long natural history by examining the tissues of children and young adults who die of other causes. The most frequent cause of death between the ages of about 2 and 40 years in industrialized countries is accidents, which affect a cross-section of the population that in general is not seriously biased by the late complications of chronic diseases. Thus, autopsies of accident victims are useful for detecting early, subclinical stages of atherosclerosis, osteoporosis, gallbladder disease, and obesity in the population from which they were drawn. Unfortunately, it is extremely difficult to obtain accurate and detailed nutritional data for people who die from accidental causes. Thus, even when the investigator uses great caution, data from autopsies could lead to erroneous conclusions about associations among diseases and between diseases and environmental exposures such as diet.
When the prevalence of, incidence of, or mortality from a disease is correlated with food intakes and compared among specific populations (i.e., ecological correlations), problems of interpretation are encountered. For example, death reporting and the assignment of causes of death on death certificates are usually more accurate in industrialized countries than in less technically developed ones. The criteria for diagnosing a disease may vary among regions, and there may be local biases in diagnosing certain diseases. Thus, there are many opportunities for errors or differences in disease measurement and classification that could bias ecological correlations. Intensive studies of death certificate data have shown that the accuracy of disease rates based on conventional death certificates varies greatly among cities, countries, and causes of death (Puffer and Griffith, 1967).
In summary, accurate assessment of disease end points is critical in the evaluation of nutrition-related causes of disease. Each disease presents different problems, and each study must be evaluated in light of the best knowledge about the disease in question.
Effects of Misclassification
The effect of misclassifying individuals with regard to dietary exposure or disease end point depends on the type of study. In ecological correlations, in which group rates and means are used, random misclassification of a small proportion of subjects may increase variance in study results and make it more difficult to detect diet-disease correlations; such errors do not ordinarily introduce serious bias, however, when they affect all subgroup means to about the same degree. If the misclassifications are systematic and vary from group to group, they may bias the means in different ways and thus bias the comparisons of interest. For example, if certain diseases are underreported in countries or regions with both poor medical services and low intakes of a nutrient, a spurious correlation of the disease with high nutrient intake may be introduced, or a correlation in the other direction obscured.
In studies of individuals, random misclassification with regard to diet or disease can seriously attenuate correlations and thus reduce the power to detect associations when they are present.
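This attenuation can be illustrated with a small simulation. The sketch below is entirely hypothetical (the intake values, error sizes, and outcome model are invented for illustration, not drawn from any study cited here): random measurement error added to a dietary exposure shrinks its observed correlation with an outcome, even though the true underlying relationship is unchanged.

```python
import random
import statistics

# Hypothetical sketch: random error in measuring a dietary exposure
# attenuates the observed individual-level correlation with an outcome,
# reducing the power to detect a real association.

random.seed(42)
n = 2000
true_intake = [random.gauss(100, 15) for _ in range(n)]    # true exposure (invented units)
outcome = [x + random.gauss(0, 15) for x in true_intake]   # outcome tracks true intake
measured = [x + random.gauss(0, 30) for x in true_intake]  # intake measured with large random error

def pearson(a, b):
    # plain Pearson correlation coefficient
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) ** 0.5 *
                  sum((y - mb) ** 2 for y in b) ** 0.5)

r_true = pearson(true_intake, outcome)
r_observed = pearson(measured, outcome)
print(f"correlation with true intake:     {r_true:.2f}")
print(f"correlation with measured intake: {r_observed:.2f}")  # noticeably smaller
```

The correlation computed from the error-laden measurements is substantially smaller than that computed from the true intakes; this is the loss of power described above.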
A special type of misclassification of exposure arises if dietary intake data are collected at a time when diet could not have caused the disease in question. Dietary intake after a disease is established may not be an index of the relevant exposure since the disease may have affected the diet, rather than vice versa. Misclassification of this type is especially serious in studies of chronic diseases, most of which develop over long periods during which they do not produce readily detectable signs or symptoms. For example, atherosclerosis begins in childhood but usually does not produce clinically manifest disease until middle age or later. Sometimes the error is obvious, but the temporal relationship of diet to the critical stages of pathogenesis is often unknown, and misclassification of this type may sometimes be undetectable. Similarly, in some diseases, exposure to a dietary factor may be important only during certain periods, such as the hypothesized effect of fat intake early in life on breast cancer. In a few long-term prospective epidemiologic studies, investigators have measured disease outcome 10 to 20 years after the dietary assessment (a useful approach for many chronic diseases with a long latency), but the opportunities for such long follow-up are rare.
Confounding
Confounding refers to associations that are real but do not indicate a causal link. A confounding factor, or confounder, must be associated with both the exposure of interest and the effect. For example, absence of teeth is associated with the consumption of large quantities of milk, but milk is certainly not a cause of the condition; edentulous infants drink a lot of milk, and age (correlated with both milk consumption and absence of teeth) is a critical confounder. More subtle, and hence more serious, confounding often results from correlations among food or nutrient intakes (is it sodium or chloride, whose occurrence in foods is highly correlated, that is more harmful?) and from bias in determining dietary exposure, health outcomes, or both.
It may be impossible to control for known sources of confounding, much less for those unrecognized and unsuspected. The great strength of randomized clinical studies is that the randomization of study subjects reduces bias and provides a basis for such valid statistical measures as p values and confidence limits. One can never be entirely certain that nonrandomized comparison groups are sufficiently similar to rule out possible confounding.
Confounding must be considered in attempts to relate nutrition to chronic diseases, because most chronic diseases have multiple causes and because nutrition is also greatly influenced by social, cultural, economic, and geographic factors. For example, differences among countries in the incidence of coronary heart disease (CHD) are correlated with differences in fat and cholesterol intake, but they are also correlated with many other aspects of life. Other evidence, such as that derived from experimental research on possible causal mechanisms, must be considered in order to determine which dietary correlates of disease are causal and, hence, what dietary changes are likely to be beneficial.
Discrepancies Between Ecological and Individual Correlations
Correlations between disease rates and dietary intake levels can be computed by using group rates or means (ecological correlations) or by using values for individuals. However, there are many examples of diet-disease correlations that are strong when based on population means (e.g., dietary and serum cholesterol levels) but weak or nonexistent when based on values for individuals. Prominent examples are the strong ecological correlations found between dietary fats and CHD, and between dietary fats and breast cancer, and the weak or absent individual correlations for the same pairs of variables. Each type of correlation has strengths not present in the other, and both may be important in gaining an understanding of specific relationships. Ecological correlations are less affected by random variability and can exploit large interpopulation differences in diet, whereas individual correlations can be used more effectively in dealing with bias and confounding.
Different correlation coefficients obtained with these two approaches are not necessarily contradictory. In ecological correlations, averaging across individuals to determine the mean dietary intake or disease occurrence of a population greatly reduces the effects of variation among individuals and of random errors in classification. Furthermore, there is often an opportunity to select populations representing high and low extremes of exposure and disease rates. Both effects tend to increase the precision of the correlation coefficient, so that causal associations are more readily detectable. In contrast, in individual correlations, variation among subjects in exposure or response, as well as genetic variability, is not masked by averaging across subjects. Furthermore, variation in diet among subjects within a population is usually much less than variation among populations. Both of these effects attenuate the correlation coefficient computed from individual values. On the other hand, studies of individuals can more often be designed and conducted to reduce the effects of bias.
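The contrast described above can be made concrete with a brief simulation (the populations, intake values, and outcome model are all invented for illustration): when between-population differences in intake are large and within-population differences are small, the correlation of population means is far stronger than the correlation among individuals within any one population.

```python
import random
import statistics

# Hypothetical sketch: five populations with widely different mean intakes,
# small within-population spread, and substantial individual-level noise in
# the outcome. The ecological (group-mean) correlation is then much
# stronger than the correlation computed over individuals.

random.seed(7)
pop_means = [40, 70, 100, 130, 160]   # large between-population differences (invented units)
individuals, outcomes = [], []
for m in pop_means:
    for _ in range(400):
        intake = random.gauss(m, 5)                          # small within-population spread
        individuals.append(intake)
        outcomes.append(0.5 * intake + random.gauss(0, 20))  # noisy individual response

def pearson(a, b):
    # plain Pearson correlation coefficient
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) ** 0.5 *
                  sum((y - mb) ** 2 for y in b) ** 0.5)

# Ecological: correlate each population's mean intake with its mean outcome
eco_x = pop_means
eco_y = [statistics.fmean(outcomes[i * 400:(i + 1) * 400]) for i in range(len(pop_means))]

print(f"ecological correlation: {pearson(eco_x, eco_y):.2f}")   # near 1.0
print(f"individual correlation within one population: "
      f"{pearson(individuals[:400], outcomes[:400]):.2f}")       # much weaker
```

Averaging over 400 subjects per population suppresses the individual-level noise in the group means, while the narrow within-population spread of intakes leaves little signal for the individual correlation, exactly the two effects described in the text.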
Case-Control Studies
In a case-control study, subjects with the disease in question are compared with disease-free control subjects with regard to suspected causative agents. Thus, subjects are enrolled on the basis of their outcome (diseased or healthy) rather than on the basis of their exposure (with or without some dietary factor) (White and Bailar, 1956). The case-control study is a valuable epidemiologic method for identifying causes of diseases, because it can be rapid (even when critical exposure data refer to times long past) and inexpensive. However, it has potentially severe limitations when used to examine dietary causes of chronic diseases. The major limitation is the difficulty of sorting out the time of exposure and the time of disease origin, as discussed previously. Another limitation of case-control studies is that the choice of control subjects (e.g., hospital-based versus neighborhood-based controls) can influence study findings, especially in studies of chronic diseases with subclinical forms whose prevalence is high. Atherosclerosis is a prominent example. Almost every adult has some degree of atherosclerosis; many have severe atherosclerosis without clinical manifestations. Consequently, atherosclerosis in apparently healthy control subjects may be almost as extensive as in subjects with clinically manifest CHD. Thus, associations are likely to be weakened by classification of disease as present or absent (implicit in the differentiation of cases from controls), rather than as a graded condition.
Misclassification of dietary exposures can be serious in case-control studies, even with unambiguous health outcomes such as cancer. When bias is not serious, positive findings from case-control studies are likely to give conservative estimates of the strength of the association. When bias is likely to be strong, neither positive nor negative results of case-control studies are reliable.
Longitudinal (Cohort) Studies
In some epidemiologic studies, a group of apparently healthy people (a cohort) is characterized and then followed for a long time for occurrence of disease. The Framingham Study (Dawber, 1980) is an example of this. Since information on relevant risk factors must be collected at the outset of the study, however, cohort studies have usually focused on the confirmation of suspected risk factors, as well as on estimation of magnitude of effects, identification of subgroups at especially high or low risk, and other refinements of exposure-outcome relationships. In cohort studies, it is common to measure all independent variables at one time, which may not represent usual exposures over a prolonged period. Furthermore, as in case-control studies, misclassifications of dietary exposure can occur. Nevertheless, cohort studies have been very successful in the investigation of suspected risk factors for chronic disease, e.g., associations of serum cholesterol concentration and dietary components with cardiovascular diseases (Shekelle et al., 1981).
Problems Common to Intervention Studies
Experimental design is discussed thoroughly in many textbooks (e.g., Lilienfeld and Lilienfeld, 1980; Mausner and Kramer, 1985). A variety of techniques can be used to increase and improve the quality of the information provided by experiments. In experiments with a small number of subjects and considerable individual variability, the crossover design is useful. In that type of study, each subject is exposed to two or more dietary treatments (Bailar and Mosteller, 1986). When applied to disease end points, however, the crossover design is strongest when the disease or its marker is temporary and readily reversible. This approach is not often applicable to the study of long-term or permanent effects of diet.
Duration of Exposure
The long exposure required for diet to produce the common chronic diseases of adults is probably the greatest single handicap to the use of experimental methods in nutrition research. It is difficult to control the diets of noninstitutionalized people for any length of time and, as a practical matter, impossible to control diets for months or years, much less decades, as would be required to test dietary hypotheses regarding CHD, certain cancers, osteoporosis, and certain other chronic diseases. One useful stratagem to balance what is ideal with what is feasible is to study the effect of diet on intervening variables rather than on disease occurrence.
This approach is illustrated by one of the most ambitious nutrition research projects ever conducted, the National Diet-Heart Study (AHA, 1968), which enrolled approximately 1,000 men in each of five centers in the United States. This was designed as a feasibility study for a definitive trial of diet and heart disease. A central laboratory prepared foods with different amounts and types of fats and different amounts of cholesterol, but with similar appearances and using similar methods of food preparation, so that treatments were blinded. Foods were provided to participants at costs competitive with those of ordinary foods for 1 year. The results demonstrated conclusively that fat-modified diets lowered serum cholesterol concentrations in noninstitutionalized men. There was no attempt to assess CHD. With the results of this study in hand, the possibility of conducting a diet trial with CHD as an end point was examined by an independent body of experts. They concluded that such a project was not feasible, in view of the number of people (up to 100,000), length of time (10 years), and enormous cost required to achieve reasonable statistical power (NHLBI, 1971).
Clinical Trial Design
The design of clinical trials has been developed extensively over the past 20 years (Peto et al., 1976, 1977; Shapiro and Louis, 1983). Such trials have been used to great advantage in comparing drugs, devices, and operative procedures (including placebos) but to only a limited extent in testing hypotheses about nutrition and chronic diseases. As shown in the National Diet-Heart Study (AHA, 1968), it is difficult and expensive to conduct a study using a double-blind design involving manipulation of diet, but the difficulty and expense may well be justified in light of the potential value of the findings, that is, when compared to the cost of not doing such a study. As discussed above, duration of exposure is also a major obstacle.
The problems that have affected some of the intervention (experimental) trials in humans include (Bailar and Mosteller, 1986):
· incomplete compliance of the study group assigned to treatment;
· dilution of effect by control subjects who decide on their own to adopt the intervention (a strong possibility with high-risk subjects properly informed of the risks of nonintervention at the outset of a study);
· secular changes in intercurrent illness and death, especially when the changes (good or bad) may be influenced by the treatment; and
· uncertainty about the optimal timing of the intervention in relation to the outcomes of interest.
Because of these problems, few questions of nutrition and chronic diseases are suitable for rigorously controlled, double-blind clinical trials.
Most experiments concerned with nutrition and chronic diseases in human subjects are short-term studies of small numbers of people and include manipulation of dietary variables and measurement of physiological responses that may illuminate disease mechanisms or may be predictors of chronic diseases. Some of these are conducted with tightly controlled formula diets prepared in a laboratory kitchen and fed to subjects residing in a metabolic ward; others are conducted with noninstitutionalized subjects who are instructed to eat more or less of certain foods. In metabolic ward studies, dietary control and information collection are maximized, but since the costs of such studies tend to be very high, exposure periods and numbers of subjects are limited. In studies involving noninstitutionalized people, larger study populations and longer exposures are possible, but control of the diet is limited, blinding is rarely possible, and assessment of actual diet is difficult, as discussed above for observational studies.
Strengths and Weaknesses of Laboratory Experiments
Studies in Animals
Historically, experiments in animals have played an important role in research on nutrition, even though there are considerable differences among species in their needs for various nutrients and in their responses to specific dietary manipulations. Experiments in dogs were used to determine the relationship of niacin to pellagra, and rats and other animals were used in research on vitamin D. Much has been accomplished with rodents in defining essential amino acids and identifying complete proteins. Even though scurvy was identified and preventive diets were designed from observations and therapeutic trials on sailors, much additional information was gained by feeding vitamin C-deficient diets to guinea pigs, one of the few nonhuman species unable to make this vitamin. However, most of these studies were designed to induce deficiencies of single nutrients for relatively brief periods, usually not more than a few months, and the physiological derangements or tissue lesions produced were easily measurable.
In contrast, experiments in animals to investigate the relationship of nutrition to chronic diseases are more complex, but nevertheless have provided useful leads. The first evidence that dietary cholesterol was related to heart disease was the observation that feeding cholesterol to rabbits caused hypercholesterolemia and arterial lesions simulating atherosclerosis in humans. Human-like hypercholesterolemia and atherosclerosis have been produced by fat and cholesterol feeding in a variety of other species, including swine, dogs, and nonhuman primates. These models have confirmed and amplified the observations on dietary fats and cardiovascular disease in humans and have yielded considerable insight into the pathogenesis of the underlying atherosclerotic lesions (see Chapters 7 and 19). The induction of hypertension in rats by dietary salt supports the suspected relationship of salt intake to hypertension in humans (see Chapters 15 and 20). Animal models have confirmed the epidemiologic association of alcohol to liver injury and cirrhosis (see Chapters 16 and 25). Animal experiments are essential in screening foods and food additives for possible carcinogenic effects (see Chapter 17), and they have provided an important data base for evaluation of the role of dietary factors in carcinogenesis (NRC, 1982).
Variability among species is a more severe problem when research is directed toward diet and chronic disease relationships that are more likely to be unique to humans or to depend on uniquely human exposures. The more complex the physiological mechanisms involved in the pathogenesis of a human disease, the more difficult it is to ensure that the disease is (or can be) adequately simulated in animal models. The rat, for example, does not respond to diets enriched with cholesterol and fats by elevations in serum lipoproteins, whereas the rabbit, in which the effects of dietary cholesterol were discovered, is extraordinarily sensitive to such diets. There is no satisfactory animal model for testing the effects of dietary calcium on postmenopausal osteoporosis, perhaps because few animals undergo the spontaneous ovarian failure in
middle age that humans experience. Extensive reviews of animal models for each of the major chronic diseases have included assessments of their advantages and limitations (NRC, 1981, 1982).
Problems related to the interpretation of results of studies in animals are complex, and only a few general principles apply. When manipulation of nutrient composition in such studies produces a disease whose pathogenesis and pathology resemble those of a disease in humans and when the intervening variables are nearly identical, observations in animal studies support inferences of causation derived from studies in humans. However, a failure to induce the human disease in animals does not necessarily disprove a causal relationship in humans. Conversely, the existence of a diet-induced disease in one or more animal species does not prove its existence in humans, unless the relationship is also supported by evidence derived from studies in humans.
The advent of molecular biology has made possible the study of dietary effects at fundamental levels of cellular metabolism. These studies may explain differences among species in response to diet in terms of gene expression.
The most valuable role of animal models in research on nutrition and chronic diseases in humans is in the study of mechanisms and pathogenesis. Studies of the effects of diet on serum lipids and lipoproteins, calcium absorption and metabolism, tumor initiation and growth, blood pressure, and other physiological processes in selected animals have been useful in testing and refining hypotheses regarding mechanisms. In evaluating data obtained from animals and other laboratory studies, the committee gave greater weight to data derived from studies in more than one animal species or test system, to results reproduced in different laboratories, and to data demonstrating increases in response with increases in exposure.
Short-term tests are used primarily to prescreen chemical compounds for their possible carcinogenic or mutagenic potential (de Serres and Ashby, 1981). They can detect DNA-reactive or genotoxic agents (Weisburger and Williams, 1981) and can be used in a variety of biological systems such as microorganisms, mammalian cells, insects, and whole animals (de Serres and Ashby, 1981). In addition, these tests have high statistical power, are easily replicated, are relatively inexpensive, can be performed under different sets of experimental conditions, and have the ability to detect several end points relevant to carcinogenesis (Brockman and DeMarini, 1988).
A battery of these tests is usually used, with or without metabolic activation of the parent compound. These include tests for gene mutations in microorganisms, yeast, fungi, insects, or mice; structural chromosome aberrations; and other genotoxic effects such as numerical chromosome aberrations, DNA damage and repair, mammalian cell transformation, and target organ/cell analysis (EPA, 1982).
Short-term tests for mutagenicity have yielded useful data in investigation of the role of diet in carcinogenesis (see Chapter 17, and NRC, 1982). However, there are a number of drawbacks to the use of short-term tests in prediction of carcinogenicity: They are unable to detect carcinogens that do not interact with DNA (often called epigenetic carcinogens) such as some hormones and promoters (Weisburger and Williams, 1981); they cannot take into account metabolic effects of absorption, transportation, activation, detoxification, and excretion; and it is difficult to make quantitative risk assessments based only on the results of these tests (NRC, 1982).
Evaluating the Evidence as a Whole
The first step in assessing the evidence on diet and chronic diseases is to evaluate the quality and relevance of individual studies in light of the criteria discussed above. The second step involves assembling and evaluating all the evidence. In this step, the committee paid particular attention to the criteria for assessing inferences of causality and to the relative weights ascribed to each category of evidence reviewed.
Categories of Research Subjects and Methods
The committee considered both the subjects and the methods in each study. Most data reviewed by the committee were from studies in humans or animals. Other evidence obtained from short-term experiments in bacteria, cultured cells, or organs applied primarily to mutagenicity and cancer. In reviewing studies in animals, the committee considered whether or not experimental diets were within physiological ranges of intake or represented more extreme variations, whether the animal species selected for study were sufficiently similar to humans in responses to dietary modification, and whether duration of exposures and periods of observation were appropriate.
To evaluate the aggregate data on diet and chronic diseases, the committee considered some a priori assumptions about the relative weights or degrees of importance ascribed to each category of evidence considered. For example, which categories of studies provide the strongest and weakest evidence for the associations in question? Specifically, does a well-controlled study of laboratory animals provide more or less convincing evidence about diet and human health than a weaker but more relevant epidemiologic study? The committee concluded, however, that such an approach is naive and would lead to ignoring important data or giving undue emphasis to certain studies because, as described above, no single category of study is perfect. Each has strengths and weaknesses. Therefore, the committee did not rely solely on any one category of evidence (e.g., relative risks in epidemiologic studies or responses in animal studies) but, rather, based its evaluation on the strength of the overall combined evidence.
Furthermore, there is no universally valid hierarchy or weighting of categories of studies and hence no comprehensive procedure for leaping from results to conclusions. Each putative association between diet and chronic disease must be evaluated on a case-by-case basis, taking into account such factors as the natural history of the disease under review and the inherent strengths and weaknesses of each category of study.
Common Problems in the Weighting of Studies
Studies in Humans
In evaluating the various types of studies in humans, the committee considered the traditional hierarchy based on the supposed degree of causal inference that can be derived from each. According to this hierarchy, for example, ecological studies may be useful in pointing to hypotheses but often provide the weakest evidence for causation, whereas case-control studies, cohort studies, metabolic studies, and randomized clinical and community trials provide increasingly stronger evidence for causation. Although the committee recognized that such an a priori weighting scheme could be helpful in evaluating many types of health-related data, its application to studies on diet and chronic diseases did not seem appropriate because of such problems as variable or unknown exposure levels and long disease latency periods.
The complex, interrelated nature of research on the association of diet with human health may sometimes diminish the generalizability of findings of metabolic ward studies in which only a few dietary components are altered and hence may weaken the inferences of causality derived from such data. In general, therefore, the committee accorded more weight to the findings of observational studies of noninstitutionalized individuals.
Studies in Animals
Evidence from animal experiments regarding nutrition and chronic diseases must be evaluated in light of all knowledge about the disease in question, the host, and the diet itself. The obvious advantages of experimental control over such matters as genetics and diet, and the opportunities for more intensive observation, are counterbalanced by the uncertainties of interspecies variability and by the highly uniform experimental conditions. Animal models are most valuable in studying the physiological and molecular mechanisms involved in nutritional effects; modern molecular biology has made them even more useful. Results from studies in animals cannot be used alone either to affirm or negate relationships between human diet and chronic diseases, nor can they be used to estimate accurately the size of the effects in humans.
Studies in Humans Compared to Studies in Animals and Other Experimental Models
In general, the committee accorded greater weight to studies in humans than to studies in animals and other experimental models. Studies in humans provide the most direct means for investigating the possible dietary causes of human diseases and thus circumvent the problem of species specificity in response and the need to extrapolate findings from animals to humans. In addition, more realistic dose-response relationships can be deduced from data on human populations than from data on animals, since the exposures (as well as the outcomes) are those that actually occur among people.
Studies in animals can play a critical role, however, by confirming findings in humans and going beyond proof of causation to examine such factors as mechanisms of action, determinants of individual susceptibility or resistance, and dose-response
curves. Where human evidence suggests an association, consistent laboratory findings can provide strong confirmation. Similarly, when animal data are not consistent with human data favoring an association, confidence in that association is diminished.
The committee had more difficulty in assessing the few cases in which a putative association was supported by strong evidence from laboratory animals but confirmatory evidence in humans was lacking. One such example is the evidence on polycyclic aromatic hydrocarbons, a class of naturally occurring food contaminants that are carcinogenic in animal experiments but not in dietary studies of humans. In such circumstances, the possibilities suggested by the animal experiments cannot be ignored, but neither can they be taken as conclusive proof of human risk.
Experimental Studies Compared to Observational Studies
Experimental studies have several advantages: the investigator assigns exposures (often using randomization), observations are structured, controls are internal, and study subjects are deliberately selected. Such factors tend to increase the strength and precision of inferences of causality. However, because experimental studies tend to be based on small, selected samples, their findings are not easily generalizable to noninstitutionalized populations. When the results of an experiment were judged not to be applicable to the general population, the committee gave greater consideration to observational studies.
Inferring Causality in Associations Between Diet and Chronic Diseases
Empirical Criteria for Inferring Causality
The committee used six general criteria, patterned after those adopted by Hill (1971), for inferring causality in associations between diet and chronic diseases: strength of association, dose-response relationship, temporally correct association, consistency of association, specificity of association, and biologic plausibility. All criteria were accorded roughly equal weight, with the exception of biologic plausibility, which in practice was given a little less weight since it is more dependent on subjective interpretation. Only the probability that an association is causal was considered, since complete proof of causality is rarely obtainable.
Three of the criteria used by the committee (strength of association, dose-response relationship, and temporally correct association) may be applied to the findings of single studies and can therefore be regarded in part as measures of internal validity. Any of these criteria may be satisfied in some, but not in all, studies testing the same or similar hypotheses. The other three criteria (consistency of association, specificity of association, and biologic plausibility) are not necessarily study specific and depend to a large degree on a priori knowledge.
Strength of Association
Strength of association is usually expressed as relative risk, i.e., the ratio of the disease rate among people exposed to the hypothesized causal factor to the rate among those not exposed. Generally, the larger the relative risk, the greater the likelihood that a risk factor is causally related to the outcome, i.e., the less likely the association is due to confounding or other systematic error.
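As a minimal arithmetic sketch (all counts below are invented for illustration), the relative risk in a cohort-style comparison is simply the ratio of the two disease rates:

```python
# Hypothetical 2x2 cohort table: counts are invented, not from any study.
exposed_cases, exposed_total = 30, 1000        # high-intake group
unexposed_cases, unexposed_total = 10, 1000    # low-intake group

rate_exposed = exposed_cases / exposed_total        # 0.030
rate_unexposed = unexposed_cases / unexposed_total  # 0.010

# relative risk: rate among the exposed divided by rate among the unexposed
relative_risk = rate_exposed / rate_unexposed

print(f"relative risk = {relative_risk:.1f}")  # 3.0: exposed rate is triple the unexposed rate
```

A relative risk of 1.0 would indicate no association; values much greater than 1.0 (or, for protective factors, much less than 1.0) are less easily explained away by confounding or systematic error.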
In applying this criterion, the committee recognized that weak associations, common in studies of diet and chronic diseases, may still be causal and may even have quite large population-wide effects. Furthermore, if most members of a population are similarly exposed to a suspected dietary risk factor, a common situation in the U.S. population, relative risks may show small differences in disease rates among population subgroups.
Dose-Response Relationship
The existence of a dose-response relationship (that is, greater effects with greater exposures) strengthens an inference that an association is causal. However, one high relative risk in the high-exposure category might provide evidence of a strong nonlinear dose-response trend or even of a threshold below which the effect does not occur. Conversely, there may be an upper limit on the size of an effect; i.e., little or no additional effect of increasing doses will be observed if most people studied are exposed to levels above the threshold for the risk factor. This upper limit may also explain why some studies of diet and chronic diseases show no association within the U.S. population. In studies of carcinogenic agents and cancer in laboratory animals, dose-response relationships are sometimes attenuated and even reversed at high doses (Bailar et al., 1988).
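The ceiling effect described above can be sketched with an invented saturating dose-response function (the functional form and all numbers are assumptions for illustration only): across doses well above the "knee" of the curve, responses are nearly flat, so a study confined to that range detects little or no association even for a genuinely causal factor.

```python
# Hypothetical saturating dose-response curve (Michaelis-Menten-like form,
# chosen purely for illustration).

def response(dose, ceiling=1.0, half_max=10.0):
    # response rises with dose but flattens toward a ceiling
    return ceiling * dose / (half_max + dose)

low_range = [response(d) for d in (2, 4, 6, 8)]       # below saturation: clear gradient
high_range = [response(d) for d in (100, 120, 140)]   # above saturation: nearly flat

print("low-dose responses: ", [round(r, 2) for r in low_range])
print("high-dose responses:", [round(r, 2) for r in high_range])
```

A population whose intakes all fall in the high range shows almost no contrast in response across its exposure subgroups, which is one reason a within-population study may find no association that an ecological comparison across populations would reveal.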
Temporally Correct Association
If an observed association is causal, exposure to the putative risk factor must precede the onset of disease by at least the duration of disease induction and latency. The committee interpreted the lack of an appropriate time sequence in an association as strong evidence against causation, but recognized that insufficient knowledge about the natural history and pathogenesis of chronic diseases may limit the usefulness of this criterion for inferring causality.
Consistency of Association
This criterion requires that an association be found in a variety of studies, for example, in more than one study population and with different study methods. However, some studies have low statistical power, and in certain cases study designs can be systematically biased toward a no-effect finding. Consistent findings limited to a particular category of evidence (e.g., observational studies in humans) were, in general, given less weight than findings that are consistent across several categories of evidence. For example, in comparing ecological and individual correlations, more weight could be given to one or two ecological studies supported by a technically strong study of individual correlations than to a larger number of ecological studies without that support.
Specificity of Association
This is the degree to which one factor predicts the frequency or magnitude of a single outcome (disease). The more specific an association, the greater the likelihood that the association is causal. However, perfect specificity is rare, given the complex nature of most chronic diseases, the overlapping and often ill-defined nutrient composition of the human diet, and the effects of many dietary components on a variety of organ systems and pathogenic processes. Thus, the committee generally gave less weight to lack of specificity in diet-disease associations than to clear evidence of substantial specificity.
Biologic Plausibility
This criterion requires that a putatively causal association fit existing biologic or medical knowledge. The committee therefore recognized that interpretation of this criterion depends to a large degree on current knowledge of the natural history of a given disease or of its pathophysiological mechanisms. Sufficient biologic evidence contradictory to a postulated association is, however, strong evidence against causality.
The committee did not find it possible to develop an algorithm or decision matrix to apply these six criteria to specific problems. Such an approach would require grading the causal criteria for their relative importance, assigning quantitative weights to the various categories of evidence considered, and standardizing the judgments that are implicit in such models for inferring cause and effect. The validity and utility of standardization in the present report would be severely restricted by the multifactorial etiology of the diseases under review and the limited ability to define, observe, and measure the causal processes involved. As a result, hypotheses about causation were examined case by case, and for each case the application of the criteria depended on the extent and nature of evidence examined.
The strengths and weaknesses of different kinds of clinical, epidemiologic, and laboratory studies and the methodologies for dietary assessment are reviewed above. Accurate assessment of most diet-chronic disease relationships requires that data from studies in humans as well as in animals be evaluated. Ecological correlations of dietary factors and chronic diseases among human populations provide valuable data but cannot be used alone to estimate the strength of the association between diet and diseases. The effect of diet on chronic diseases has been most consistently demonstrated in comparisons of populations with substantially different dietary practices, possibly because it is more difficult to identify such associations within a population whose diet is fairly homogeneous. Thus, in general, case-control and prospective cohort studies conducted within such populations tend to underestimate the strength of diet-disease associations. In intervention studies, long exposure is usually required for the effect of diet on chronic disease risk to be manifested. Furthermore, the strict criteria for selecting participants in such studies may result in more homogeneous study samples, which limit the applicability of results to the general population. Despite the limitations of various types of studies in humans, repeated and consistent findings of an association between certain dietary factors and diseases are likely to be real and indicative of a cause-and-effect relationship.
Experiments on dietary exposure of different animal strains can take genetic variability into account and permit more intensive observation. However, extrapolation of data from animal studies to humans is limited by the ability of animal models to simulate human diseases and by the comparability of absorption and metabolic phenomena among species. More confidence should therefore be placed in data derived from studies on more than one animal species or test system, in results that have been reproduced in different laboratories, and in data that indicate a dose-response relationship.
Assessments of the strength of associations between diet and chronic diseases cannot simply be governed by criteria commonly used for inferring causality in other areas of human health. Faced with the special characteristics of studies on nutrients, dietary patterns, and chronic diseases, this committee first assessed the strengths and weaknesses of each kind of study and then evaluated the total evidence against six criteria: strength of association, dose-response relationship, temporally correct association, consistency of association, specificity of association, and biologic plausibility. It assessed the overall strength of the evidence on a continuum from highly likely to very inconclusive. Overall, the strength, consistency, and preponderance of data and the degree of concordance in epidemiologic, clinical, and laboratory evidence determined the strength of the conclusions in this report.
References
AHA (American Heart Association). 1968. National Diet-Heart Study: Final Report. AHA Monograph No. 18. National Diet-Heart Study Research Group, Executive Committee on Diet and Heart Disease, National Heart Institute. American Heart Association, New York. 428 pp.
Bailar, J.C., III, and F. Mosteller. 1986. Medical Uses of Statistics. Massachusetts Medical Society, Waltham, Mass. 425 pp.
Bailar, J.C., III, E. Crouch, R. Shaikh, and D. Spiegelman. 1988. One-hit models of carcinogenesis: conservative or not? Risk Analysis 8:485-498.
Basiotis, P.P., S.O. Welsh, F.J. Cronin, J.L. Kelsay, and W. Mertz. 1987. Number of days of food intake records to estimate individual and group nutrient intakes with defined confidence. J. Nutr. 117:1638-1641.
Bazzarre, T.L., and M.P. Myers. 1980. The collection of food intake data in cancer epidemiology studies. Nutr. Cancer 1:22-45.
Beaton, G.H., J. Milner, P. Corey, V. McGuire, M. Cousins, E. Stewart, M. de Ramos, D. Hewitt, P.V. Grambsch, N. Kassim, and J.A. Little. 1979. Sources of variance in 24-hour dietary recall data: implications for nutrition study design and interpretation. Am. J. Clin. Nutr. 32:2546-2559.
Beecher, G.R., and J.T. Vanderslice. 1984. Determination of nutrients in foods: factors that must be considered. Pp. 29-55 in K.K. Stewart and J.R. Whitaker, eds. Modern Methods of Food Analysis: IFT Basic Symposium Series. Avi Publishing Co., Westport, Conn.
Beynen, A.C., R.J. Hermus, and J.G. Hautvast. 1980. A mathematical relationship between the fatty acid composition of the diet and that of the adipose tissue in man. Am. J. Clin. Nutr. 33:81-85.
Bingham, S.A. 1987. The dietary assessment of individuals: methods, accuracy, new techniques and recommendations. Nutr. Abstr. Rev. 57:705-741.
Bingham, S., and J.H. Cummings. 1983. The use of 4-aminobenzoic acid as a marker to validate the completeness of 24 h urine collections in man. Clin. Sci. 64:629-635.
Bingham, S.A., and J.H. Cummings. 1985. Urine nitrogen as an independent validatory measure of dietary intake: a study of nitrogen balance in individuals consuming their normal diet. Am. J. Clin. Nutr. 42:1276-1289.
Block, G. 1982. A review of validations of dietary assessment methods. Am. J. Epidemiol. 115:492-505.
Block, G., A.M. Hartman, C.M. Dresser, M.D. Carroll, J. Gannon, and L. Gardner. 1986. A data-based approach to diet questionnaire design and testing. Am. J. Epidemiol. 124:453-469.
Brockman, H.E., and D.E. DeMarini. 1988. Utility of short-term tests for genetic toxicity in the aftermath of the NTP's analysis of 73 chemicals. Environ. Mol. Mutagenesis 11: 421-435.
Burk, M.C., and E.M. Pao. 1976. Methodology for Large-Scale Surveys of Household and Individual Diets. Home Economics Research Rep. No. 40. Agriculture Research Service, U.S. Department of Agriculture, Hyattsville, Md. 88 pp.
Burke, B.S. 1947. The dietary history as a tool in research. J. Am. Diet. Assoc. 23:1041-1046.
Byers, T.E., R.I. Rosenthal, J.R. Marshall, T.F. Rzepka, K.M. Cummings, and S. Graham. 1983. Dietary history from the distant past: a methodological study. Nutr. Cancer 5:69-77.
Carroll, M.D., S. Abraham, and C.M. Dresser. 1983. Dietary Intake Source Data: United States, 1976-1980. Vital and Health Statistics, Series 11, No. 231. DHHS Publ. No. (PHS) 83-1681. National Center for Health Statistics, Public Health Service. U.S. Department of Health and Human Services, Hyattsville, Md. 483 pp.
Chu, S.Y., L.N. Kolonel, J.H. Hankin, and J. Lee. 1984. A comparison of frequency and quantitative dietary methods for epidemiologic studies of diet and disease. Am. J. Epidemiol. 119:323-334.
Council on Scientific Affairs. 1987. Autopsy: a comprehensive review of current issues. J. Am. Med. Assoc. 258:364-369.
Dawber, T.R. 1980. The Framingham Study: The Epidemiology of Atherosclerotic Disease. Harvard University Press, Cambridge, Mass. 257 pp.
Dawber, T.R., G. Pearson, P. Anderson, G.V. Mann, W.B. Kannel, D. Shurtleff, and P. McNamara. 1962. Dietary assessment in the epidemiologic study of coronary heart disease: the Framingham Study. II. Reliability of measurement. Am. J. Clin. Nutr. 11:226-234.
de Serres, F.J., and J. Ashby, eds. 1981. Evaluation of Short-Term Tests for Carcinogens: Report of the International Collaborative Program. Progress in Mutation Research, Vol. 1. Elsevier/North-Holland, New York. 827 pp.
Dwyer, J.T. 1988. Assessment of dietary intake. Pp. 887-905 in M.E. Shils and V.R. Young, eds. Modern Nutrition in Health and Disease, 7th ed. Lea & Febiger, Philadelphia.
EPA (Environmental Protection Agency). 1982. Pesticide Assessment Guidelines. Subdivision F, Hazard Evaluation: Human and Domestic Animals. Hazard Evaluation Division, Office of Pesticide Programs, Office of Pesticides and Toxic Substances, U.S. Environmental Protection Agency. National Technical Information Service, U.S. Department of Commerce, Springfield, Va. 157 pp.
Fehily, A.M. 1984. Epidemiology for nutritionists. 4. Survey methods. Hum. Nutr. Appl. Nutr. 37:419-425.
Fisher, R.A. 1935. The Design of Experiments, 1st ed. Oliver and Boyd, Edinburgh, Scotland. 252 pp.
Fregly, M.J. 1985. Attempts to estimate sodium intake in humans. Pp. 93-112 in M.J. Horan, M. Blaustein, J.B. Dunbar, W. Kachadorian, N.M. Kaplan, and A.P. Simopoulos, eds. NIH Workshop on Nutrition and Hypertension: Proceedings from a Symposium. Biomedical Information Corp., New York.
Garcia-Palmieri, M.R., P. Sorlie, J. Tillotson, R. Costas, Jr., E. Cordero, and M. Rodriguez. 1980. Relationship of dietary intake to subsequent coronary heart disease incidence: the Puerto Rico Heart Health Program. Am. J. Clin. Nutr. 33:1818-1827.
Garland, B., M. Ibrahim, and R. Grimson. 1982. Assessment of past diet in cancer epidemiology. Am. J. Epidemiol. 116:577.
Gersovitz, M., J.P. Madden, and H. Smiciklas-Wright. 1978. Validity of the 24-hr. dietary recall and seven-day record for group comparisons. J. Am. Diet. Assoc. 73:48-55.
Hankin, J.H., A.M. Nomura, J. Lee, T. Hirohata, and L.N. Kolonel. 1983. Reproducibility of a diet history questionnaire in a case-control study of breast cancer. Am. J. Clin. Nutr. 37:981-985.
Hepburn, F.N. 1987. Food Consumption/Composition Interrelationships. Report No. 382. Human Nutrition Information Service, U.S. Department of Agriculture, Hyattsville, Md.
Hill, A.B. 1971. Principles of Medical Statistics, 9th ed. Oxford University Press, New York.
Isaksson, B. 1980. Urinary nitrogen output as a validity test in dietary surveys. Am. J. Clin. Nutr. 33:4-5.
Jain, M., G.R. Howe, K.C. Johnson, and A.B. Miller. 1980. Evaluation of a diet history questionnaire for epidemiologic studies. Am. J. Epidemiol. 111:212-219.
Lee, J. 1980. Alternate approaches for quantifying aggregate and individual agreements between two methods for assessing dietary intakes. Am. J. Clin. Nutr. 33:956-958.
Lee, J., and L.N. Kolonel. 1982. Nutrient intakes of husbands and wives: implications for epidemiologic research. Am. J. Epidemiol. 115:515-525.
Lee, J., L.N. Kolonel, and J.H. Hankin. 1983. On establishing the interchangeability of different dietary-intake assessment methods used in studies of diet and cancer. Nutr. Cancer 5:215-218.
Lilienfeld, A.M., and D.E. Lilienfeld. 1980. Foundations of Epidemiology, 2nd ed. Oxford University Press, New York. 375 pp.
Liu, K., J. Stamler, A. Dyer, J. McKeever, and P. McKeever. 1978. Statistical methods to assess and minimize the role of intra-individual variability in obscuring the relationship between dietary lipids and serum cholesterol. J. Chronic Dis. 31:399-418.
Madden, J.P., S.J. Goodman, and H.A. Guthrie. 1976. Validity of the 24-hr. recall. Analysis of data obtained from elderly subjects. J. Am. Diet. Assoc. 68:143-147.
Marr, J.W. 1971. Individual dietary surveys: purposes and methods. World Rev. Nutr. Diet. 13:105-164.
Mausner, J.S., and S. Kramer. 1985. Mausner & Bahn Epidemiology-An Introductory Text, 2nd ed. W.B. Saunders, Philadelphia. 361 pp.
McGee, D.L., D.M. Reed, K. Yano, A. Kagan, and J. Tillotson. 1984. Ten-year incidence of coronary heart disease in the Honolulu Heart Program. Relationship to nutrient intake. Am. J. Epidemiol. 119:667-676.
Medlin, C., and J.D. Skinner. 1988. Individual dietary intake methodology: a 50-year review of progress. J. Am. Diet. Assoc. 88:1250-1257.
Mertz, W., and J.L. Kelsay. 1984. Rationale and design of the Beltsville one-year dietary intake study. Am. J. Clin. Nutr. 40 suppl. 6:1323-1326.
Møller-Jensen, O., J. Wahrendorf, A. Rosenqvist, and A. Geser. 1984. The reliability of questionnaire-derived historical dietary information and temporal stability of food habits in individuals. Am. J. Epidemiol. 120:281-290.
Morgan, R.W., M. Jain, A.B. Miller, N.W. Choi, V. Matthews, L. Munan, J.D. Burch, J. Feather, G.R. Howe, and A. Kelly. 1978. A comparison of dietary methods in epidemiologic studies. Am. J. Epidemiol. 107:488-498.
Morris, J.S., M.J. Stampfer, and W. Willett. 1983. Dietary selenium in humans: toenails as an indicator. Biol. Trace Element Res. 5:529-537.
NHLBI (National Heart, Lung and Blood Institute). 1971. Arteriosclerosis: A Report by the National Heart, Lung and Blood Task Force on Arteriosclerosis, Vol. 1. DHEW Publ. No. (NIH) 72-219. National Institutes of Health, Public Health Service, U.S. Department of Health, Education, and Welfare, Bethesda, Md. 365 pp.
Nomura, A., J.H. Hankin, and G.G. Rhoads. 1976. The reproducibility of dietary intake data in a prospective study of gastrointestinal cancer. Am. J. Clin. Nutr. 29:1432-1436.
NRC (National Research Council). 1981. Mammalian Models for Research on Aging. Report of the Committee on Animal Models for Research on Aging, Assembly of Life Sciences. National Academy Press, Washington, D.C. 587 pp.
NRC (National Research Council). 1982. Diet, Nutrition, and Cancer. Report of the Committee on Diet, Nutrition, and Cancer, Assembly of Life Sciences. National Academy Press, Washington, D.C. 478 pp.
Pekkarinen, M. 1970. Methodology in the collection of food consumption data. World Rev. Nutr. Diet. 12:145-171.
Peto, R., M.C. Pike, P. Armitage, N.E. Breslow, D.R. Cox, S.V. Howard, N. Mantel, K. McPherson, J. Peto, and P.G. Smith. 1976. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. Br. J. Cancer 34:585-612.
Peto, R., M.C. Pike, P. Armitage, N.E. Breslow, D.R. Cox, S.V. Howard, N. Mantel, K. McPherson, J. Peto, and P.G. Smith. 1977. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples. Br. J. Cancer 35:1-39.
Puffer, R.R., and G.W. Griffith. 1967. Patterns of Urban Mortality: Report of the Inter-American Investigation of Mortality. Sci. Publ. No. 151. Pan American Health Organization, Pan American Sanitary Bureau, Regional Office of the World Health Organization, Washington, D.C. 353 pp.
Reshef, A., and L.N. Epstein. 1972. Reliability of dietary questionnaire. Am. J. Clin. Nutr. 25:91-95.
Rohan, T.E., and J.D. Potter. 1984. Retrospective assessment of dietary intake. Am. J. Epidemiol. 120:876-887.
Rush, D., and A.R. Kristal. 1982. Methodologic studies during pregnancy: the reliability of the 24-hour dietary recall. Am. J. Clin. Nutr. 35:1259-1268.
Schofield, W.N. 1985. Predicting basal metabolic rate, new standards and review of previous work. Human Nutr. Clin. Nutr. 39 suppl. 1:5-41.
Shapiro, S.H., and T.A. Louis, eds. 1983. Clinical Trials: Issues and Approaches. Statistics, Textbooks and Monographs, Vol. 46. Marcel Dekker, New York. 209 pp.
Shekelle, R.B., A.M. Shryock, O. Paul, M. Lepper, J. Stamler, S. Liu, and W.J. Raynor, Jr. 1981. Diet, serum cholesterol, and death from coronary heart disease: the Western Electric Study. N. Engl. J. Med. 304:65-70.
Sorenson, A.W. 1982. Assessment of nutrition in epidemiologic studies. Pp. 434-474 in D. Schottenfeld and J.F. Fraumeni, eds. Cancer Epidemiology and Prevention. W.B. Saunders, Philadelphia.
Todd, K.S., M. Hudes, and D.H. Calloway. 1983. Food intake measurement: problems and approaches. Am. J. Clin. Nutr. 37:139-146.
USDA (U.S. Department of Agriculture). 1984. Nationwide Food Consumption Survey. Nutrient Intakes: Individuals in 48 States, Year 1977-78. Report No. 1-2. Consumer Nutrition Division, Human Nutrition Information Service, Hyattsville, Md. 439 pp.
USDA (U.S. Department of Agriculture). 1987. Nationwide Food Consumption Survey. Continuing Survey of Food Intakes of Individuals. Women 19-50 Years and Their Children 1-5 Years, 4 Days, 1985. Report No. 85-4. Nutrition Monitoring Division, Human Nutrition Information Service, Hyattsville, Md. 182 pp.
Van Leeuwen, F.E., H.C.W. de Vet, R.B. Hayes, W.A. van Staveren, C.E. West, and J.G.A.J. Hautvast. 1983. An assessment of the relative validity of retrospective interviewing for measuring dietary intake. Am. J. Epidemiol. 118: 752-758.
Weisburger, J.H., and G.M. Williams. 1981. Carcinogen testing: current problems and new approaches. Science 214: 401-407.
White, C., and J.C. Bailar. 1956. Retrospective and prospective methods of studying association in medicine. Am. J. Public Health 46:35-44.
Willett, W.C., L. Sampson, M.J. Stampfer, B. Rosner, C. Bain, J. Witschi, C.H. Hennekens, and F.E. Speizer. 1985. Reproducibility and validity of a semiquantitative food frequency questionnaire. Am. J. Epidemiol. 122:51-65.
Young, C.M., and M.F. Trulson. 1960. Methodology for dietary studies in epidemiological surveys. II. Strengths and weaknesses of existing methods. Am. J. Public Health 50:803-814.
Young, C.M., G.C. Hagan, R.E. Tucker, and W.D. Foster. 1952. A comparison of dietary study methods. 2. Dietary history vs. seven-day record vs. 24-hour recall. J. Am. Diet. Assoc. 28:218-221.