A Assessment Instruments of Relevance to Obesity Treatment
Appendix A discusses instruments assessing psychological and behavioral factors, diet, and physical activity. Most of these instruments are not meant to be used in weight-management programs because they are impractical for such purposes or are geared towards use in clinical and field settings by researchers familiar with their proper administration. In particular, the wide variety of assessment instruments for psychological and behavioral factors and for physical activity are published here for the first time. As researchers and policy developers work towards providing weight-management programs with more practical tools to assess the psychosocial health, diet, and activity patterns of their clients and potential clients, the tools presented here should provide a useful starting point.
ASSESSING PSYCHOSOCIAL AND BEHAVIORAL ASPECTS OF OBESITY AND WEIGHT LOSS
Regardless of the genetic and environmental factors that cause obesity, the psychological significance of this disease is important. Obese individuals who join comprehensive weight-loss programs experience major physiological changes, undertake major changes in lifestyle, and attempt to curtail a strongly appetitive behavior. Therefore, the psychological consequences of dieting and weight loss are important to examine (O'Neil and Jarrell, 1992). This section of Appendix A describes assessment instruments used in the related areas of obesity and psychopathology, behavioral factors in dieting, and psychological effects of dieting and weight regain. These instruments are for use by health-care providers with appropriate training in psychometric assessment and familiarity with a particular test being administered.
Obesity and Psychopathology
For many years, obesity was considered a disorder with pronounced psychological underpinnings. The obese were considered to overeat in response to negative feelings such as insecurity, sadness, and frustration, or more particularly because of their inability to establish positive interpersonal relationships. More recently this traditional view has changed dramatically on the basis of new research findings. If obesity were caused by psychological dysfunction, one would expect to find a higher incidence of psychopathology among obese individuals compared to nonobese controls. Population-based studies have not found this to be the case (Wadden and Stunkard, 1985). These results do not show that obese patients have greater problems than any other medical or surgical patients. In addition, there do not appear to be personality characteristics unique to obese individuals (Leon and Roth, 1977).
Rather than a cause, psychopathology in the obese is now seen as a consequence of the prejudice and discrimination they face because of their weight (Wadden and Stunkard, 1993). This social stigmatization and discrimination experienced by obese individuals often leads to negative self-esteem and an unfavorable body image. Although enhanced self-esteem and a more positive body image may result from weight loss, these same factors may keep some people from entering treatment in the first place. Thus, two general psychological factors appear to be important in the assessment of obese individuals before treatment: self-esteem and body image.
In one longitudinal study of obese and nonobese children, Klesges (in press) reported that physical self-esteem was consistently related to the development of obesity, although it accounted for less than 5 percent of the variance in changes in body fat. Wadden and Foster (1992) note that feelings of guilt and shame over one's inability to control weight are likely to diminish the self-esteem of some obese persons. Patients losing weight, on the other hand, often display increased self-confidence and self-esteem. According to O'Neil and Jarrell (1992), increased self-esteem and reduced social discrimination may lead to more assertiveness and optimism and a willingness to make other life changes, even those that involve some personal risk. Two measures of self-esteem are the Tennessee Self-Concept Scale and the Rosenberg Self-Esteem Scale. One measure of well-being is the General Well-Being Schedule.
Tennessee Self-Concept Scale This scale is self-administered and contains 100 items, each of which is rated on a 5-point Likert scale ranging from "completely false" to "completely true" (Fitts, 1964). It measures three internal dimensions (Identity, Self-Satisfaction, and Behavior) and five external dimensions (Physical Self, Moral-Ethical Self, Personal Self, Family Self, and Social Self) of self-concept. This scale takes about 25 minutes to complete. Reliability coefficients range from 0.66 to 0.94. Test-retest reliabilities are reported from 0.60 to 0.92. Concerning validity, correlations with the Coopersmith Self-Esteem Inventory range between 0.64 and 0.75 (Archambault, 1992; Dowd, 1992).
Rosenberg Self-Esteem Scale The Rosenberg Scale provides a global measure of self-esteem (Goldman and Osborne, 1985; Rosenberg, 1965, 1979). Subjects indicate their agreement with statements about their perceived worth and confidence. The scale consists of 10 items, each of which is rated on a 4-point Likert scale ranging from "strongly agree" to "strongly disagree." The reliability coefficient alpha has been reported at 0.75, with a test-retest reliability of 0.85 (Goldman and Mitchell, 1990).
General Well-Being Schedule This schedule measures subjective well-being and distress in the previous month (McDowell and Newell, 1987; NCHS, 1977). The self-report scale assesses how the subject feels about his or her inner personal state rather than about external conditions. The six dimensions covered are Anxiety, Depression, General Health, Positive Well-Being, Self-Control, and Vitality. The scale consists of 18 items. Fourteen items have 6-point responses that vary by item content; four items on different feeling dimensions have an 11-point response format. Lower scores indicate that the person does not have a sense of well-being. The reliability alpha coefficient ranges from 0.91 to 0.95. Test-retest reliability coefficients range from 0.68 to 0.85. Correlations among the subscale dimensions range from 0.16 to 0.72. The scale also shows good correlational validity with interviewers' ratings of depression and with other depression and anxiety scales (Fazio, 1977). See Appendix B for the full test.
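Many of the reliability figures quoted in this appendix are internal-consistency (alpha) coefficients. As a rough illustration of how such a coefficient can be obtained from item-level responses, the sketch below computes Cronbach's alpha, on the assumption that the "alpha" reported for these scales is of that type. The function name and the sample responses are hypothetical and are not drawn from any instrument described here; the cited studies may have used other software or variants of the statistic.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table of item scores.

    responses: list of respondents, each a list of k numeric item scores.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four respondents answering a three-item scale.
example = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(example), 2))
```

Values closer to 1 indicate that the items vary together; the coefficient says nothing about test-retest stability, which the appendix reports separately.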
Some obese people suffer intense negative feelings about their bodies. Stunkard and Mendelson (1967) point to a pattern sometimes found in childhood or adolescent obesity in which subjects view their bodies as grotesque or loathsome, believing that others look at them with horror and contempt. The problem of body image disparagement is particularly striking in young Caucasian women of upper-middle socioeconomic status, among whom the prevalence of obesity is very low but the sanctions against it are very high (Sobal and Stunkard, 1989). Body image represents the cognitive perception of one's body size and appearance along with the emotional response to these perceptions. Four measures of body image are the Body-Cathexis and Self-Cathexis Scale, Body Satisfaction Scale, Body Shape Questionnaire, and the Body Parts Satisfaction Scale.
Body-Cathexis and Self-Cathexis Scale This is a self-administered scale that consists of two parts (Secord and Jourard, 1953). The first asks subjects to rate 46 body parts and functions on a 5-point scale ranging from "Have strong feelings and wish change could somehow be made" to "Consider myself fortunate." The second part concerns self cathexis and lists 55 items that represent conceptual aspects of the self, which are rated on the same scale as the first part. Three scores are obtained: Total Body Cathexis (BC), Total Self Cathexis (SC), and an Anxiety Indicator score based on the BC items most negatively perceived by each sex. Reliability on the BC is given at 0.81, on the SC at 0.90, and on the Anxiety Indicator at 0.72. Intercorrelations between the BC and SC scores are significant, and the scale as a whole has been shown to be a stable measure, with a test-retest reliability coefficient of 0.87 (Tucker, 1981).
Body Satisfaction Scale The Body Satisfaction Scale (Slade et al., 1990) is based on the Body Cathexis Scale (Secord and Jourard, 1953). This simple self-administered scale assesses satisfaction/dissatisfaction with 16 body parts and is sensitive to eating disorders and obesity. The items are rated on a 7-point scale ranging from "very satisfied" to "very unsatisfied." Higher ratings indicate greater body dissatisfaction. Three summative scales (General, Head Parts [above the neck], and Body Parts [below the head]) are derived from the scale. It takes 2–3 minutes to complete. Internal consistency alpha coefficients for the three summative scales range from 0.79 to 0.89. The Body Satisfaction Scale has been shown to correlate positively with the Body Shape Questionnaire (Cooper et al., 1987).
Body Shape Questionnaire This questionnaire is a self-report instrument that measures concerns about body shape, particularly the experience of feeling fat (Cooper et al., 1987). A total score is calculated from its 34 items, which are rated on a 6-point Likert scale ranging from "never" to "always." The scale refers to the subject's state over the previous four weeks. The questionnaire is simple to complete (about 10 minutes). Both concurrent and discriminant validity have been shown to be good.
Body Parts Satisfaction Scale This scale, which assesses satisfaction with the body, consists of 24 items on body parts as well as an overall appearance item (Berscheid et al., 1973). The items are rated on a 6-point Likert scale from "extremely dissatisfied" to "extremely satisfied." The reliability alpha coefficient has been reported as 0.89 (Noles et al., 1985).
Behavioral Factors in Dieting
Disordered eating behavior is sometimes associated with dieting and weight loss. Although no evidence supports an "obese eating style" (O'Neil and Jarrell, 1992), there is increasing evidence to suggest that dieting may increase the incidence of eating disorders, particularly binge eating. First identified by Stunkard (1959), binge eating is characterized by eating a large amount of food in a short period of time, followed by severe discomfort and self-condemnation. Binge eating appears to be prevalent among the obese: estimates among obese individuals seeking treatment range from 23 percent to 82 percent (Loro and Orleans, 1981). We describe seven different measures of disordered eating (Eating Disorders Inventory, Eating Inventory/Three-Factor Eating Questionnaire, Eating Attitudes Test, Eating Disorder Examination, Questionnaire on Eating and Weight Patterns, and the Stanford Eating Behavior Questionnaire), including one designed specifically to assess binge eating (Binge Eating Scale).
Eating Disorders Inventory The Eating Disorders Inventory (Garner, 1991) is a 64-item self-report test to measure cognitive and behavioral characteristics of anorexia and bulimia nervosa. It consists of eight subscales: Drive for Thinness, Bulimia, Body Dissatisfaction, Ineffectiveness, Perfectionism, Interpersonal Distrust, Interoceptive Awareness, and Maturity Fears. The items are rated on a 6-point scale from "always" to "never," and subscale scores are the total of all item scores for that particular subscale. Reliability coefficients for the subscales range from 0.65 to 0.91, and convergent and discriminant validity are established in subjects with anorexia and bulimia. The validity of the scale in an obese population has not been established, but it may help to identify characteristics associated with binge eating in the obese (Lowe and Caputo, 1991).
Eating Inventory/Three-Factor Eating Questionnaire This self-report questionnaire yields three dimensions of eating behavior: cognitive control of eating behavior, disinhibition of control, and susceptibility to hunger (Stunkard and Messick, 1985). The questionnaire is divided into two parts, with the first consisting of 36 true-false items and the second consisting of 15 rated items. The Eating Inventory takes approximately 15 minutes to complete. Reliability alpha coefficients for the three factors range from 0.85 to 0.93.
Eating Attitudes Test (EAT) The EAT (Garner and Garfinkel, 1979; Garner et al., 1982) is available as both a 40-item and 26-item measure of the symptoms of eating disorders. It can be used to identify subjects who are experiencing abnormal eating patterns that interfere with normal psychosocial functioning. The multifactorial 26-item, 6-point Likert scale correlates highly with the original 40-item scale (r = 0.98). Three factors, or clusters of items, arise from the EAT: Dieting (avoidance of fattening foods and preoccupations with shape), Bulimia and Food Preoccupation, and Oral Control (items reflecting self-control about food and social pressure regarding weight). Reliability correlation alpha for the EAT-26 is 0.90. The EAT displays acceptable criterion-related validity.
Eating Disorder Examination (EDE) The EDE (Cooper and Fairburn, 1987) is a 62-item semistructured clinical interview for assessing the specific psychopathology of eating disorders, including concerns about shape and weight. The exam is designed to assess the present state of patients, and all questions refer to the previous four-week period of time. Each item has at least one mandatory probe question and optional subsidiary questions. Most ratings are made on a 7-point scale, either in terms of severity or in terms of frequency of occurrence at a defined level of severity. The interview takes between 30 minutes and one hour to complete. Inter-rater reliability of all EDE items is uniformly high. The interview is not intended for use as a diagnostic instrument but as a research measure providing a comprehensive profile of the characteristic psychopathological features of patients with eating disorders.
Questionnaire on Eating and Weight Patterns (QEWP) The QEWP was developed to assess binge eating disorder in large study groups and was used in two multisite field trials establishing the prevalence of binge eating disorder (Spitzer et al., 1992, 1993). The questionnaire contains items regarding demographics, frequency and duration of binge eating, compensatory behaviors for weight control, degree of distress regarding binge eating, and the presence of accompanying behavioral indicators of loss of control. All questions about current functioning and eating behavior focus on the previous six months. Higher scores on the QEWP are correlated with binge eating and a higher prevalence of psychiatric comorbidity (Yanovski et al., 1992, 1993).
Stanford Eating Behavior Questionnaire This is an extensive self-report questionnaire used to collect information on subjects' demographics, weight history, eating patterns, medical history, psychiatric history, and family history (Agras, 1987). It also contains 10 items that specifically address binge eating behavior.
Binge Eating Scale This scale is a 16-item, self-report questionnaire that identifies individuals with a spectrum of binge eating difficulties (Gormally et al., 1982). The scale describes both behavioral manifestations (e.g., eating large amounts of food) and feelings surrounding a binge episode (e.g., guilt, fear of being unable to control what or how much one is eating). It is scored by summing the individual weights for each item, with high scores indicating more severe binge-eating problems. The scale is useful in distinguishing levels of binge-eating severity and appears to have high internal consistency. It also correlates well with clinical interviews for diagnosing disordered eating behaviors.
Psychological Effects of Dieting
Psychological factors contribute to dieting behavior in several ways. First, variables such as dieting readiness and self-efficacy may be important in terms of whether or not an obese individual will seek treatment to lose weight. Second, dieting itself may affect mood, and mood in turn may influence dieting efforts. Third, factors such as stress and social support may have an impact on dieting behavior. Finally, dieting programs should also affect the participant's cognitive knowledge base about obesity, nutrition, exercise, and the pros and cons of dieting.
Dieting Readiness and Self-Efficacy
Scales have been developed to assess a patient's readiness for embarking on a weight-loss program; they assess items such as goals and attitudes, exercise patterns and attitudes, and hunger and eating cues. One scale, the Restraint Scale, has been proposed as a measure of unsuccessful attempts at dieting (Herman and Polivy, 1984). It is a 10-item scale that addresses two factors, weight fluctuation and dietary restraint (the latter concept first proposed by Herman and Mack). A more widely used measure of readiness for dieting is the Dieting Readiness Scale described below.
Self-efficacy is a concept closely related to diet readiness. It refers to an individual's subjective estimate of his or her ability or capacity to engage in specific dieting behaviors and exercise, or to cope with high-risk situations for relapse or weight regain. Such measures are important for tracking change over the course of treatment (e.g., does treatment enhance self-efficacy and thereby increase success at long-term weight loss?) and for implementing treatment matching or relapse prevention. Five measures of self-efficacy for eating behaviors and exercise are the Dieter's Inventory of Eating Temptations, Self-Efficacy for Eating Behaviors Scale, Self-Efficacy for Exercise Behaviors Scale, Physical Self-Efficacy Scale, and the Exercise Specific Self-Efficacy Scale.
Dieting Readiness Scale This scale assesses whether a person is prepared to undertake a diet at the point when they decide to begin a new attempt at weight loss (Brownell, 1990). The scale is divided into six sections that assess attitudes toward weight loss: Goals and Attitudes, Hunger and Eating Cues, Control Over Eating, Binge Eating and Purging, Emotional Eating, and Exercise Patterns and Attitudes. Each section contains its own questions, which are rated on a 5-point Likert scale, and its own scoring key. Since each section is scored separately, the individual sections may help subjects identify strengths and weaknesses in their weight-loss attitudes that could affect success. Because the scale is fairly new, reliability and validity have not been established. See Appendix B for the full test.
Dieter's Inventory of Eating Temptations (DIET) This self-report inventory is designed to assess behavioral competence in six situations related to weight control: Overeating, Negative Emotions, Exercise, Resisting Temptation, Positive Social, and Food Choice (Schlundt and Zimering, 1988). Low competence in these situations indicates a lack of self-control. Obese subjects rate themselves as less competent than normal-weight subjects in the overeating, negative emotions, and exercise situations. The inventory correlates with self-reported eating and control behaviors. Reliability coefficient alphas for the six DIET scales range from 0.68 to 0.93. Test-retest reliability correlations for the scales range from 0.81 to 0.96.
Self-Efficacy for Eating Behaviors Scale This scale consists of 89 items and asks the subject to "please rate how confident you are that you could really motivate yourself to do things like these consistently, for at least six months" (Sallis et al., 1987). Ratings are made on a 5-point Likert scale, with responses ranging from "Sure I could not do it" to "Sure I could do it," with a response option for "does not apply." The scale has five subscales: Resisting Relapse, Reducing Calories, Reducing Salt, Reducing Fat, and Behavioral Skills. The alpha coefficients for internal consistency on the subscales range from 0.85 to 0.93. Test-retest reliabilities range from 0.43 to 0.64. All self-efficacy for eating factors are correlated significantly with reported "heart-healthy" health habits.
Self-Efficacy for Exercise Behaviors Scale The scale consists of 49 items and asks the subject to "please rate how confident you are that you could really motivate yourself to do things like these consistently, for at least six months" (Sallis et al., 1987). Ratings are made on a 5-point Likert scale, with responses ranging from "Sure I could not do it" to "Sure I could do it," with a response option for "does not apply." The scale has two subscales: Resisting Relapse and Making Time for Exercise. The alpha coefficients for internal consistency on the subscales range from 0.83 to 0.85. Test-retest reliability is 0.68 for both subscales. Both exercise self-efficacy factors are correlated significantly with reported participation in vigorous activity.
Physical Self-Efficacy Scale The Physical Self-Efficacy Scale consists of a 10-item Perceived Physical Ability (PPA) subscale and a 12-item Physical Self-Presentation Confidence (PSPC) subscale (Ryckman et al., 1982). Higher scores on the PPA indicate higher perceived physical ability, and higher scores on the PSPC reflect greater confidence in presentation of physical skills. The scores on the two subscales can be summed into an overall Physical Self-Efficacy (PSE) score. Higher values on the PSE indicate a stronger sense of physical self-efficacy. Reliability alphas are 0.84 for the PPA, 0.74 for the PSPC, and 0.81 for the PSE. Test-retest reliability and convergent, concurrent, discriminant, and predictive validity on the PSE and its subscales are good.
Exercise Specific Self-Efficacy Scale This scale assesses perceived capabilities to exercise three times per week in the face of barriers to participation (McAuley, 1992; McAuley and Jacobson, 1991). These barriers were determined through an attributional analysis of reasons for dropping out of exercise. Sample items include the subjects' belief in ability to exercise regularly if they failed to make progress quickly enough, exercise conflicting with other activities such as work, being bored with the exercise activity, and feeling self-conscious about their appearance. The reliability alpha coefficient for the scale is 0.88. The scale helps to identify possible psychological mechanisms influencing the adoption and maintenance of exercise behavior.
Dieting and Mood
Although one early review examining mood changes during weight reduction noted a high incidence of negative emotional responses (Stunkard and Rush, 1974), more recent reviews (Wing et al., 1984) do not find increases in measures of depression or anxiety as a result of weight loss. Perhaps this is because later studies are often more group-oriented and provide emotional buffers or social support. Nonetheless, some patients may experience problem mood states, such as depression, that need to be assessed. Although few studies have examined the psychological consequences of regaining weight, Brownell and Stunkard (1981) found that although patients' depression scores decreased as they lost weight, depression levels rose for those patients who regained weight during a one-year follow-up period. The Beck Depression Inventory and the Hamilton Psychiatric Rating Scale for Depression are two frequently used measures of depression.
Beck Depression Inventory (BDI) The BDI is used to detect possible depression and to assess its severity (Beck et al., 1961, 1988). It measures cognitive, affective, somatic, and performance-related symptoms of depression. The BDI can be self-administered or administered orally and takes 5–15 minutes to complete. The scale consists of 21 items, or sets of statements, each answered on a 0-to-3 scale of severity of depressive problems. The item ratings are summed to give a total score with a possible range of 0 to 63, with higher scores indicating greater severity of depression. The subject is asked to consider feelings in the last week. The internal consistency coefficient alpha for the BDI ranges from 0.73 to 0.95. Test-retest correlations range from 0.48 to 0.86. Discriminant validity has been reported to be fairly strong, and the scale has been shown to correlate well with biological and somatological issues, suicidal behaviors, alcoholism, adjustment, and life crisis. Concurrent validity studies show correlations between 0.60 and 0.76 with clinical ratings and other depression scales (Conoley, 1992; Sundberg, 1992).
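The scoring rule described for the BDI (21 items, each rated 0 to 3, summed to a total between 0 and 63) can be sketched as follows. This is only an illustration of the arithmetic; the function name is hypothetical, the ratings shown are invented, and no severity cutoffs are applied because none are given in this appendix.

```python
def score_bdi(item_ratings):
    """Sum 21 item ratings (each 0-3) into a total score from 0 to 63."""
    if len(item_ratings) != 21:
        raise ValueError("the BDI total requires ratings for all 21 items")
    for rating in item_ratings:
        if rating not in (0, 1, 2, 3):
            raise ValueError(f"item rating out of range: {rating!r}")
    return sum(item_ratings)


# Hypothetical respondent: ten items rated 1 and eleven items rated 2.
ratings = [1] * 10 + [2] * 11
print(score_bdi(ratings))  # 10*1 + 11*2 = 32
```

Rejecting incomplete or out-of-range responses, rather than silently summing them, keeps a partial questionnaire from being read as a low score.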
Hamilton Psychiatric Rating Scale for Depression This scale consists of 17 items presented by an interviewer in a semistructured interview (Hamilton, 1960). Items are scored 0–2 or 0–4 to reflect increasing severity of the symptom. Scores are totaled, and higher scores indicate more severe depression. Inter-rater reliability has ranged between 0.80 and 0.90 (Goldman and Mitchell, 1990).
Stress and Social Support
Untoward life events often have a substantial impact on obese patients' psychosocial functioning and their ability to control weight. Such events include loss of a family member and work, financial, or health problems. Four examples of measures designed to assess major life events and social readjustment are the Social Readjustment Rating Scale, Life Experiences Survey, Recent Life Changes Questionnaire, and the Life Events Checklist.
Social Readjustment Rating Scale This is a self-administered scale that rates the number, type, and magnitude of stressful life events and determines relationships between life stress and indices of health and adjustment (Holmes and Rahe, 1967). Each of the 43 items was constructed to contain life events whose occurrence either indicates or requires a major change in the life pattern of the individual.
Life Experiences Survey This survey (Heilbrun, 1984; Johnson and Sarason, 1979) evolved from the Social Readjustment Rating Scale but offers two new features. It allows for separate positive and negative stress scores and requires the subject to rate the degree of impact of any relevant event on a 4-point scale from "no impact" to "extremely negative or positive." The survey includes 57 items: 47 specific events for all respondents and 10 items designed primarily for students. The scale inquires about events experienced during the previous year. The negative stress score is the sum of ratings for all events identified as involving undesirable stress. The positive stress score is calculated in the same way for events perceived as having positive impact. Test-retest correlations show moderate reliability (0.56 to 0.88) for the negative stress score but lower reliability (0.19 to 0.53) for the positive stress score.
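The separate negative and positive stress scores described for the Life Experiences Survey can be sketched as two sums over the events a respondent reports. The data layout below is an assumption made for illustration: each reported event carries a flag for whether the respondent perceived it as desirable and an impact rating coded 0 ("no impact") to 3 ("extremely negative or positive"). The class name, function name, coding, and example events are hypothetical and are not taken from the survey itself.

```python
from typing import NamedTuple


class EventRating(NamedTuple):
    event: str        # description of the life event (hypothetical examples)
    desirable: bool   # True if the respondent perceived the event as positive
    impact: int       # assumed coding: 0 = no impact ... 3 = extreme impact


def stress_scores(ratings):
    """Return (negative_score, positive_score) as sums of impact ratings."""
    for r in ratings:
        if r.impact not in (0, 1, 2, 3):
            raise ValueError(f"impact rating out of range for {r.event!r}")
    negative = sum(r.impact for r in ratings if not r.desirable)
    positive = sum(r.impact for r in ratings if r.desirable)
    return negative, positive


# Hypothetical events reported for the previous year.
reported = [
    EventRating("change of residence", desirable=True, impact=2),
    EventRating("serious illness in family", desirable=False, impact=3),
    EventRating("trouble with employer", desirable=False, impact=1),
]
negative, positive = stress_scores(reported)
print(negative, positive)  # 4 2
```

Keeping the two sums apart mirrors the point made above: the negative score has shown higher test-retest reliability than the positive score, so the two are best interpreted separately.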
Recent Life Changes Questionnaire This questionnaire is a 55-item scale that provides a method of measuring life change (Rahe, 1975). The questionnaire includes the five categories of Health, Work, Home and Family, Personal-Social, and Financial. Subjects indicate whether they experienced various life changes over the previous two years and when these changes occurred. Reliability has been given as 0.84 and correlation with a schedule of recent experiences was 0.67 (Goldman and Mitchell, 1990).
Life Events Checklist The Life Events Checklist (Bhagat et al., 1985) measures total life stress. It consists of 83 items, and subjects use a 7-point scale from -3 to +3 to rate the degree of positive or negative impact that each event had on their life in the previous three years. Seventy-eight events are listed on the scale, and five items are left blank so that the subject can fill in unique events not included in the checklist. The areas covered by the checklist include work, finances, legal matters, social activities, residence, children, family, health, love, and marriage. Reliability alpha coefficients range from 0.53 to 0.77 (Goldman and Mitchell, 1990).
The impact of stress may be moderated by the patient's access to significant social support in the environment. Social support may be derived from participation in group weight-loss programs (particularly if they allow time to process personal problems), or may come from friends or family members. O'Reilly and Thomas (1989) have reviewed social support measures in health behavior research. They synthesized a measure derived from previous research that is specific to risk-reduction efforts. Results show their measure predicts health behavior maintenance. This questionnaire takes about 10–20 minutes to complete.
Sallis et al. (1987) have developed scales for measuring social support for diet and exercise. Two diet support scales (Family Support for Eating Scale and Friend Support for Eating Scale) and two exercise support scales (Family Support for Exercise Scale and Friend Support for Exercise Scale) are available. Subjects rate the frequency with which both family and friends had said or done what was described in the item during the previous three months. Items are rated on a 5-point scale, ranging from "none" to "very often." These scales have good reliability and are correlated with self-reported exercise and diet habits, providing evidence of concurrent criterion-related and construct validity.
In addition to the goal of weight loss itself, programs can be designed to teach patients more about the nature of obesity, nutrition, exercise, health risks, and the pros and cons of dieting. Such knowledge may have beneficial long-term effects on maintenance of weight loss and future attempts to lose weight. An example is the Adult and Child Behavior Knowledge Scales.
Adult and Child Behavior Knowledge Scales These scales measure knowledge of health behaviors related to cardiovascular disease (Vega et al., 1987). They focus on behavioral capability rather than on the link between behavior and disease. The subscales assess knowledge of dietary sodium, dietary fat, and exercise. The Adult Knowledge Scale consists of 18 multiple-choice and true-false items (six in each of the three subscales) and has a reliability alpha coefficient of 0.80. Test-retest reliability for the total score is also acceptable at 0.76. The Child Knowledge Scale consists of nine multiple-choice and true-false items (three in each of the three subscales) and has a reliability alpha coefficient of 0.51. Test-retest reliability for the total score is 0.73. The Health Behavior Knowledge Scales are useful in assessing differences in knowledge levels among adults and children of differing cultural and language groups.
There are several dietary assessment tools for quantitating energy and nutrient intakes. These include repeated 24-hour dietary recalls with trained interviewers, food records for varying lengths of time, diet histories, food frequency questionnaires and checklists, calorie counters, and forms for monitoring intake of various food groups. Recently, computerized programs have been developed to greatly expedite data analysis and provide immediate feedback.
In general, dietary assessment methods can be differentiated between those needed for research purposes to survey a population and those used in clinical settings for assessing usual intake and adequacy, choosing interventions, and measuring compliance and/or desired change. A critical need still exists, however, for a validated and simplified instrument that will reliably and quickly help us to assess not only the quantitative value of diets with regard to nutrients and change in energy intake, and their dietary adequacy with regard to food groups and dietary balance, but also the qualitative aspects of eating patterns and specific behaviors having long-term importance for managing food intake. Insofar as treatment programs are accountable for their dietary claims, the ability to assess and monitor dietary changes at critical points during a program will be important to its success.
Many weight-management programs have clients keep food records as a self-monitoring strategy. However, if these records are to be useful in evaluating the overall effectiveness of a diet intervention, care must be taken to instruct clients properly in how to keep accurate records and to evaluate their skills at this task periodically. Future studies should give further consideration to this potential of combining intervention and monitoring with ongoing assessment efforts. For a recent review of dietary assessment methods, see Buzzard and Willett (1994).
Measures of Dietary Assessment
A wide variety of questionnaires are available to assess an individual's past and current dietary patterns, food likes and dislikes, use of dietary supplements, presence of food allergies, and general knowledge and attitudes about food and nutrition issues (Dwyer, 1994; St. Jeor, in press; Willett, 1990). The results of these assessments help an individual to set realistic dietary goals and develop interventions to achieve them.
In typical 24-hour dietary recalls, a trained interviewer uses a standardized protocol and props (e.g., food models, cups, and spoons) to prompt a respondent to recall all foods and beverages consumed over a 24-hour period, as well as the specific amounts consumed and, as appropriate, the preparation methods. The recall takes about 15–30 minutes to administer but considerably more time to analyze. It is a more objective assessment tool than a dietary questionnaire, providing a quantitative assessment of energy and nutrient intakes on one day as well as qualitative insight into the foods eaten and timing of food intake. Dietary recalls are generally used to assess the dietary intakes of groups rather than individuals. The number of days needed to obtain accurate data on nutrient intakes varies by nutrient. A minimum of three 24-hour recalls on nonconsecutive days is generally recommended to assess the energy and nutrient intake of an individual. Recalls are useful in clinical settings. A major problem with their use is that individuals can under- or overreport their food intake. In addition, a provider cannot be certain that even several days of 24-hour recalls represent one's typical dietary pattern. Useful references pertaining to 24-hour recalls include Basiotis et al. (1987), Dwyer (1994), and Tarasuk and Beaton (1991).
Dietary Records and Diaries
With these assessment tools, a client records everything he or she eats and drinks for a specified period of time, such as 3–4 days, 1 week, or periodically over a longer time. The client is usually asked to record information about mood and behavior at the time food is consumed. As with 24-hour recalls, dietary records and diaries provide a quantitative assessment of energy and nutrient intakes (with accuracy increasing the longer they are kept) and a qualitative assessment of food-related behaviors. To assess the diet of an individual, a 7-day record is recommended, to include both weekdays and the weekend. In practice, however, 3–4-day records are kept to minimize the time and costs of analysis. Dietary records and diaries are the most objective methods available for evaluating dietary patterns. For further information, see Dwyer (1994), St. Jeor et al. (1983), and Willett (1990).
Food frequencies are lists of frequently consumed foods (typically 100–130 items). Respondents administered a food frequency list are expected to note the frequency with which they consume each food over a specified period of time (e.g., one consumed broccoli approximately three times per week in the course of the month). With this assessment tool, a typical nutrient intake of an individual can be determined. Food frequencies are quick to administer and not very tedious to analyze. They tend to be used in research studies. Correlation coefficients with food records for varying lengths of time range from 0.4 to 0.7. For further information, see Block (1982), Dwyer (1994), Longnecker et al. (1994), and Willett (1990).
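As a rough illustration of how frequency responses can be turned into a typical daily intake, the sketch below multiplies each reported weekly frequency by a per-serving nutrient amount and averages over seven days. The function name and the per-serving amounts are hypothetical placeholders, not food-composition data, and actual instruments also account for portion size.

```python
def daily_nutrient_intake(servings_per_week, nutrient_per_serving):
    """Estimate average daily intake of one nutrient.

    servings_per_week: food name -> reported servings per week
    nutrient_per_serving: food name -> nutrient amount in one serving
    """
    weekly_total = sum(
        times * nutrient_per_serving[food]
        for food, times in servings_per_week.items()
    )
    return weekly_total / 7  # average over the seven days of the week


# Hypothetical respondent: broccoli three times per week, milk daily.
responses = {"broccoli": 3, "milk": 7}
# Placeholder per-serving amounts in arbitrary units (NOT real composition data).
placeholder_amounts = {"broccoli": 7.0, "milk": 1.0}

print(daily_nutrient_intake(responses, placeholder_amounts))  # 4.0
```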
Food lists are specialized tools in which foods are listed in various groupings by category (e.g., dairy foods) or dietary constituent (e.g., fat- or fiber-containing foods) to learn subjects' intake of particular kinds of foods or nutrients. Information on portion sizes, frequency of consumption, and method of preparation within each group is also obtained. Food lists are useful in the clinical setting as a quick and inexpensive means of assessing diets and identifying areas where the subject might benefit from dietary counseling.
An increasing amount of computer software is available containing information on the nutrient composition of foods. Features and price vary among programs. Some are available to the public for self-assessments, while the more sophisticated and feature-laden programs are marketed to health-care providers. The programs are used to calculate the nutrient content of a person's diet, usually comparing nutrient intake to a reference such as Recommended Dietary Allowances (RDAs); they are of particular benefit because they can perform the calculations in seconds and provide immediate feedback on the results. The size of the nutrient database and data-analysis capabilities of the programs vary tremendously, so the validity and reliability of these programs vary.
ASSESSING PHYSICAL ACTIVITY
Health behaviors are difficult to measure, and physical activity is no exception. Existing methods, while relatively crude, nonetheless provide valid and reliable estimates of participation in physical activity and total energy expenditure. The most valid methods of physical activity assessment (doubly-labeled water, direct calorimetry, individual observation, and electronic monitoring) are complicated, technically daunting, intrusive, and expensive, and they are not feasible for use in weight-management programs. Questionnaire assessments of physical activity are the most widely used technique in clinical and epidemiological studies.
Of the variety of questionnaires available to assess physical activity (Blair et al., 1985; Paffenbarger et al., 1993; Sallis et al., 1985; Taylor et al., 1978; Wilson et al., 1986), most are self-administered, although some require a trained interviewer. Simple one- or two-item questionnaires provide a quick classification of low, moderate, and high activity levels, and these methods show reduced risk of morbidity and mortality in the more active individuals (Lindsted et al., 1991). These simple questionnaires, while useful in epidemiological studies of activity and health, are not sufficiently precise to estimate energy expenditure and changes in energy expenditure for individuals in weight-management programs. To be useful for weight-management programs, questionnaires need to provide estimates of total energy expenditure. In themselves, the frequency of exercise sessions and the intensity of the activity are relatively unimportant for weight loss. The most important aspect is the total amount of activity.
There are questionnaires that do provide valid estimates of total energy expenditure (TEE), or at least the physical activity component of TEE (Blair et al., 1985; Paffenbarger et al., 1993; Sallis et al., 1985). These methods typically involve obtaining estimates of the amount of time individuals spend performing activities of various intensities and using these data to calculate total energy expenditure. Major sources of error are inaccurate recall or recording of time intervals and underestimating the intensity of the effort.
The MET Approach
The concept of metabolic-equivalent units (METs) is a useful way to estimate the intensity of physical activity. A MET is the energy expended during quiet seated rest, generally considered to be an oxygen uptake of 3.5 ml per kg of body weight per minute. This is approximately 1 kcal per kg per hour. Energy expenditure of other physical activities can be expressed as multiples of this resting energy expenditure (working metabolic rate divided by resting metabolic rate). Thus, an activity requiring a threefold increase in metabolism (walking at 3 mph, for example) would be classified at 3 METs and, for a 70-kg person, would result in an energy expenditure of 210 kcal per hour.
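The arithmetic in this paragraph can be checked with a short sketch. It assumes that roughly 5 kcal are expended per liter of oxygen consumed, a common approximation that the text does not state; the function names are illustrative only.

```python
# Assumption (not from the text): about 5 kcal are expended per liter of O2 consumed.
KCAL_PER_LITER_O2 = 5.0
RESTING_O2_ML_PER_KG_MIN = 3.5  # oxygen uptake at 1 MET, as given in the text


def resting_kcal_per_kg_hour():
    """Convert the resting oxygen uptake (1 MET) to kcal per kg per hour."""
    liters_per_kg_hour = RESTING_O2_ML_PER_KG_MIN * 60 / 1000
    return liters_per_kg_hour * KCAL_PER_LITER_O2


def kcal_per_hour(mets, weight_kg):
    """Hourly energy expenditure, rounding 1 MET to 1 kcal per kg per hour."""
    return mets * weight_kg * 1.0


print(round(resting_kcal_per_kg_hour(), 2))  # 1.05, i.e. approximately 1
print(kcal_per_hour(3, 70))                  # 210.0 for the 70-kg walker
```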
Rates of energy expenditure have been verified for many activities, usually by indirect calorimetric methods. Estimates of energy expenditure for activities where the body is moved through space at a constant rate (such as walking, running, or stair climbing at different speeds) are reasonably accurate. Although sport, recreational, household, and occupational activities can be typically performed over a broad range of intensities (e.g., competitive singles tennis versus casual weekend doubles play), common activities as performed by most individuals can nonetheless be assigned to a category of MET values with some confidence. For example, most housework is in the range of 2 to 5 METs, and light office work typically requires 1.5 to 3 METs. Jogging, cycling, and sports involving running, such as basketball or soccer, demand an average of 6 to 12 METs, depending on skill and fitness levels of the participant. An extensive list of the MET cost of common physical activities is available (Ainsworth et al., 1993).
It also is necessary to obtain information on time spent in physical activities in order to calculate daily or weekly energy expenditure. The typical method is to multiply the time spent in the activity (in hours) by the MET value of the activity times the body weight (in kg) of the individual, which yields an estimate of kilocalories expended over the period. It is not necessary to ask individuals to account for every waking minute when assessing their physical activity. Rather, it is sufficient to identify and measure the amount of time they spend doing activities that result in energy expenditure above a moderate level, perhaps 3 METs or more. This level has significance inasmuch as laborsaving devices make it possible to spend virtually the entire day in activities requiring 3 METs or less, which leads to the relatively low levels of TEE prevalent in the United States.
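The calculation described above (hours times MET value times body weight, counting only activities at or above a moderate threshold) might be sketched as follows. The activity log is hypothetical, its MET values are picked from the ranges quoted in this appendix, and the result is gross expenditure that includes the resting component.

```python
MODERATE_THRESHOLD_METS = 3.0  # the text suggests "perhaps 3 METs or more"


def activity_kcal(hours, mets, weight_kg):
    """Kilocalories for one activity: hours x MET value x body weight (kg)."""
    return hours * mets * weight_kg


def weekly_activity_kcal(log, weight_kg, threshold=MODERATE_THRESHOLD_METS):
    """Sum kilocalories over logged activities at or above the MET threshold."""
    return sum(
        activity_kcal(hours, mets, weight_kg)
        for _name, hours, mets in log
        if mets >= threshold
    )


# Hypothetical week for a 70-kg person: (activity, hours, MET value).
week = [
    ("brisk walking", 3.5, 4.0),
    ("housework", 5.0, 2.5),  # below the threshold, so not counted
    ("jogging", 1.0, 8.0),
]

print(weekly_activity_kcal(week, 70))  # 1540.0
```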
All of the several physical activity questionnaires that provide estimates of energy expenditure, either for voluntary physical activity or for TEE (Blair et al., 1985; Paffenbarger et al., 1993; Sallis et al., 1985; Taylor et al., 1978), use some variation of the MET approach described above. These methods are acceptable for use in population-based studies and for clinical applications. Used properly, they can enhance obesity treatment programs by providing an assessment of the energy expenditure side of the energy-balance equation. The concepts involved are simple and can be understood by most participants with a minimum of instruction.
In addition to the importance of assessing energy balance in order to evaluate treatment outcome, the MET approach is useful in the physical activity intervention aspect of the weight-management program. Most individuals entering obesity treatment are already quite familiar with the caloric value of food, and teaching them about the caloric value of physical activity is not difficult. The multiples of resting energy expenditure method (the MET approach) is easily communicated by the example that even strolling (at 2 mph) doubles energy expenditure (to 2 METs), and brisk walking (at 3–4 mph) triples or quadruples it (to 3 or 4 METs).
The remainder of this appendix presents an annotated list of references for studies using most of the principal methods now available for measuring physical activity.
Measures of Activity*
As stated earlier, questionnaire assessments are the most widely used method for measuring physical activity in epidemiological studies. Unfortunately, whereas diets are assessed periodically in many weight-management programs, physical-activity assessments are not nearly as common. Both diet and exercise should be evaluated routinely. References are given here for the best known and most thoroughly validated questionnaires that are applicable to large population studies. Specific papers on activity assessment in children and elderly individuals are included. Several major review papers on physical activity assessment are included in the list of references. These reports can provide an overview of issues related to the topic of activity assessment, and also present a broad array of additional techniques and methods. Physical activity monitors may be suitable for some studies and several papers on these devices are included. The gold standard for physical activity measurement is the doubly-labeled water method. This technique provides a reliable estimate of total energy expenditure, but cost and complexity limit its usefulness. Three references describing the doubly-labeled water method are included. Physical fitness measurement may be used as a marker for habitual physical activity, and several references are listed for this topic. Finally, several references on blood pressure response to exercise are given. The list of references annotated here is not exhaustive, but includes most of the principal techniques for activity assessment that are currently in use.
Taylor et al., 1978 Subjects review a checklist of household, recreational, and sports activities and indicate the activities in which they have participated over the past year. A trained interviewer reviews the checklist with the subject, and details regarding the frequency and duration of each activity are recorded. The interview takes approximately 20 minutes. Time spent in each activity is multiplied by an activity intensity code to estimate the energy expenditure for each activity. Energy expenditure is summed across all activities to obtain total expenditure over the year. Energy expenditure subgroup scores also are calculated for light, moderate, and heavy intensity activities. Administration and scoring instructions are included in the article. This questionnaire has been used in several large studies including the Multiple Risk Factor Intervention Trial.
Folsom et al., 1986 The Minnesota Leisure-Time Physical Activity (MLTPA) questionnaire was developed from a year-long study on leisure time physical activity, ranging from household home repair to sports and conditioning activities. The MLTPA, used extensively in epidemiological studies, yields an estimate of energy expenditure, typically in kilocalories expended per week. Folsom and colleagues administered the MLTPA two times over a five-week interval to 140 adults from the general population and two times over a two-week interval to 150 men who were participating in the Multiple Risk Factor Intervention Trial. Energy expenditure estimates were slightly, but not significantly, lower at the second administration of the questionnaire. Rank order correlations were high (0.79 to 0.88) for total activity.
Blair et al., 1985; Sallis et al., 1985 The questionnaire described in these papers is an interviewer-administered recall of physical activity participation over the past seven days. Subjects are asked to estimate the number of hours spent sleeping and in moderate, hard, and very hard intensity activities. Light physical activity hours are obtained by subtraction. All physical activities are surveyed, including occupational, household, recreational, and sports. Total daily energy expenditure in kcal per kg of body weight is calculated. The questionnaire was validated in a randomized exercise training study, by comparison of energy expenditure and energy intake, and by showing an association between energy expenditure and physical fitness. Instructions for administration and scoring are included in the articles.
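A minimal sketch of the scoring logic described above follows: light-activity hours are obtained by subtracting the reported hours from the 168 hours in a week, and each category is weighted by a MET value, with 1 MET taken as 1 kcal per kg per hour. The MET weights below are assumptions made for illustration, since this appendix does not list them; the original articles give the actual scoring instructions.

```python
# Assumed MET weights per category (not given in this appendix).
ASSUMED_METS = {
    "sleep": 1.0,
    "light": 1.5,
    "moderate": 4.0,
    "hard": 6.0,
    "very_hard": 10.0,
}
HOURS_PER_WEEK = 7 * 24


def daily_kcal_per_kg(sleep, moderate, hard, very_hard, mets=ASSUMED_METS):
    """Average daily energy expenditure (kcal per kg) from weekly hours."""
    reported = sleep + moderate + hard + very_hard
    if reported > HOURS_PER_WEEK:
        raise ValueError("reported hours exceed the 168 hours in a week")
    hours = {
        "sleep": sleep,
        "light": HOURS_PER_WEEK - reported,  # light activity by subtraction
        "moderate": moderate,
        "hard": hard,
        "very_hard": very_hard,
    }
    weekly_kcal_per_kg = sum(hours[k] * mets[k] for k in hours)
    return weekly_kcal_per_kg / 7


# Hypothetical week: 56 h sleep, 7 h moderate, 2 h hard, 1 h very hard.
print(daily_kcal_per_kg(sleep=56, moderate=7, hard=2, very_hard=1))  # 37.0
```

Multiplying the result by body weight in kilograms gives an estimate of total daily kilocalories under the same assumptions.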
Paffenbarger et al., 1993 This paper presents the latest version of the questionnaire used for activity assessment in the Harvard Alumni Study, which has been in operation for more than 20 years. Evidence for validity of the method derives from the strong and consistent association of physical activity measured by this technique with several important health outcomes, including non-insulin-dependent diabetes mellitus, coronary heart disease, some site-specific cancers, and all-cause mortality.
Washburn et al., 1991 Physical activity data from the Harvard Alumni Activity Survey were significantly correlated with high density lipoprotein cholesterol and body mass index in a large general population sample in Boston. Test-retest reliability coefficients for two administrations 7 to 12 weeks apart were 0.58 to 0.69. These associations and reliability coefficients are comparable to data reported for more complicated and labor intensive questionnaire procedures.
Bouchard et al., 1983 Energy expenditure estimates are derived from a three-day activity record. Subjects code their activity intensity (from one of nine intensity categories ranging from sleeping [1.0 MET] to intense manual work [> 7.8 METs]) every 15 minutes on a data form. A reliability study in 61 subjects who completed the record twice within 6 to 10 days yielded an intraclass coefficient of 0.96. Validity of the procedure is supported by associations with physical fitness and body fatness.
Godin and Shephard, 1985; Owen et al., 1988; Schechtman et al., 1991; Washburn et al., 1987, 1990; Weiss et al., 1990 These six papers present evidence that crude assessments of physical activity can be valid. The questions used in these research projects were simple, short, and easily completed. The various studies used from one to four questions. The questions could be completed within several seconds or at most a minute or two.
Blair et al., 1989a Subjects were 3,943 women and 15,627 men who performed a maximal exercise test on a treadmill and received a clinical examination. Physical fitness, as determined by the treadmill test, was associated with the sedentary traits. When smoking habit, a simple physical activity index, and the sedentary traits were included in a multiple regression model with physical fitness as the dependent variable, R² values ranged from 0.20 to 0.53 in women and 0.45 to 0.61 in men across age groups. These data suggest that the addition of data on sedentary traits to a simple physical activity index can sharpen the prediction of physical fitness in epidemiologic studies.
Kohl et al., 1988 Self-reported physical activity habits from a mail survey were correlated with data from a maximal exercise test in 375 men. The exercise test was administered within 60 days of the receipt of the mail questionnaire. Validity of the mail questionnaire physical activity assessment was supported by a multiple correlation of 0.65 between physical fitness and age and physical activity questions.
Baecke et al., 1982 This questionnaire includes 16 questions that cover three components of physical activity: physical activity at work, sports during leisure time, and physical activity during leisure time excluding sports. The questionnaire was given to a group of 306 young (ages 20 to 32 years) Dutch men and women. Test-retest reliability for the three activity components ranged from 0.74 to 0.88.
Jacobs et al., 1989 CARDIA is a large biracial study of more than 5,000 young (ages 18 to 30 years) men and women. The CARDIA questionnaire includes items about physical activity participation over the past three months and over the past year. The questionnaire is interviewer administered, and can be administered by a telephone interview. Validity of the method is supported by comparison of body composition, energy intake, physical fitness, and blood lipids across physical activity groups. Test-retest reliability is reported to range from 0.77 to 0.84.
Williams et al., 1989 This study compared the reliability and convergent validity of the seven-day physical activity recall, the Caltrac monitor, and a daily physical activity log for 45 subjects over a three-week interval. Reliability was high for the seven-day recall and the activity log, but not for the Caltrac. Convergent validity was high for the seven-day recall and the daily log, but was low for both of these measures when compared with the Caltrac.
Physical Activity Monitors
LaPorte et al., 1979 This study evaluated the Large-Scale Integrated Motor Activity Monitor. The unit is slightly larger than a wristwatch, and it uses a mercury switch to detect bodily movement. The instrument was validated by comparing movement counts from the instrument in sedentary and active groups. The subjects also maintained an activity log while wearing the movement sensor. There was a significant correlation (r = 0.65) between activity log reports and movement counts from the instrument.
Kalkwarf et al., 1989 Twelve young women wore heart rate monitors and completed 24-hour activity diaries for a four-week period. Resting metabolic rate and the energy cost of various activities were assessed by indirect calorimetry with simultaneous recording of heart rates. These data were used in conjunction with the activity diaries and heart recordings to estimate energy expenditure. The standard of comparison for energy expenditure was energy intake calculated from weighed food intake and changes in body energy stores. Heart rate monitors overestimated group energy expenditure by 2 to 9 percent. Activity diaries underestimated energy expenditure by 2 to 6 percent. The authors conclude that heart rate monitoring and activity diaries may be accurate enough for group estimates of energy expenditure, but not for individual estimates.
Avons et al., 1988 Subjects wore three actometers (modified self-winding watches, one each on an arm, leg, and waist) and had heart rates recorded during a stay in a whole-body calorimeter. Additional data from the actometers were collected during seven days in the free-living condition. The actometers provided a satisfactory estimate of energy expenditure.
Taylor et al., 1982, 1984 These two studies used the Vitalog monitor, which is a microprocessor device that records heart rate and bodily movement. The instrument has been used in several exercise trials, and its reproducibility has been established both within weeks and between weeks. Validity has been shown by monitoring changes in exercise groups in controlled studies. Objective data from both heart rate and bodily movement presumably provide more precise estimates of energy expenditure than either used alone.
Montoye et al., 1983; Wong et al., 1981 These two papers describe a portable accelerometer that is worn on the waist to measure bodily movement. The instrument has good reproducibility (r = 0.94). The standard error of estimate for predicting oxygen uptake is reported to be 6.6 ml per kg per minute.
Baranowski et al., 1984 This study evaluated six different forms of self-report of aerobic activity in third- to sixth-grade children. Self-report data were compared to behavioral observations. Greatest agreement with the criterion measure was obtained when the self-report method that segmented the day into functional components was used. The average percentage of agreement across all forms of self-report was 73.4.
Klesges and Klesges, 1987 The investigators evaluated physical activity in 30 young children (ages 24 to 48 months) with a portable accelerometer and by direct observation by trained observers. Correlations between hourly accelerometer readings and observation ranged from 0.62 to 0.95. The best predictor of all-day accelerometer readings was the observed behavior of walking, which accounted for 32 percent of the variance in the accelerometer data.
O'Hara et al., 1989 Heart rates were compared with behavioral observations in 36 children (ages 8–10) during physical education class. Heart rate was obtained from a wristwatch-size monitoring unit that recorded heart rate every 15 seconds from the ECG. Trained observers recorded physical activity in four categories of movement each minute. The average correlation for minute-by-minute heart rates and observations was 0.64. Interobserver agreement was high (r = 0.96).
Sallis et al., 1990 Body movement (by the Caltrac) and heart rates (by a portable monitor) were recorded in elementary-school-age children. Correlations between the instruments were 0.54 for day one and 0.42 for day two. Inter-instrument reliability in the field setting was 0.96. Data from the accelerometers and heart rate monitors were significantly correlated with physical activity recalls of the same day.
Several of the questionnaires described above have been used in populations across the age range. In this section, two papers are listed that were specifically used with older groups.
Cartmel and Moon, 1992 Two questionnaires (Minnesota Leisure-Time Physical Activity [MLTPA] and seven-day recall) were administered to 24 older men and women (ages 59 to 83 years). Each person also kept a physical activity diary over four consecutive days. Data from the diaries were compared with each questionnaire. The MLTPA was more accurate than the seven-day recall for moderate/heavy activities, and the seven-day recall was more accurate than the MLTPA for time spent sitting.
Cauley et al., 1987 The investigators used five different methods for measuring physical activity in a population of 255 white, postmenopausal women. Both questionnaires and an activity monitor were used. The activity measures were only weakly correlated. The authors recommend that several different types of activity assessment be used to obtain adequate data on different types of physical activity patterns.
Caspersen, 1989; Lamb and Brodie, 1990; Saris, 1986; Tremblay and Bouchard, 1987; Wilson et al., 1986 These five review papers present information on most of the commonly used methods for the assessment of physical activity and physical fitness in population based studies. Methods applicable across the age range are presented.
Klein et al., 1984; Livingstone et al., 1992; Stein et al., 1987 These three papers describe the doubly-labeled water method of estimating energy expenditure in humans. Validation of the technique is by comparison with directly measured energy expenditure in a whole body calorimeter. The doubly-labeled water method is considered by many to be the "gold standard" for energy expenditure. Unfortunately, the method is quite complex and very expensive, and it is not suited for epidemiological studies. The doubly-labeled water technique has great promise for validating physical activity questionnaires.
Montoye et al., 1970; Siconolfi et al., 1982; Sidney et al., 1992 These three reports describe and discuss various methods of assessing aerobic power (physical fitness) in population-based studies. Maximal and submaximal testing protocols are described. Both cycle ergometry and treadmill testing are included.
Blood Pressure Response to Exercise
Jones et al., 1985 One hundred healthy subjects representing an even distribution of sex, age, and height underwent a progressively incremental exercise test to obtain standards for the various physiological measures. Systolic blood pressure was measured by cuff methods. Age and sex specific normal values are provided.
Ekelund et al., 1990 Approximately 4,300 representative healthy black and white men and women aged 20–69 years performed a standardized treadmill exercise test (to 85 percent of age-specific predicted maximal heart rate). Systolic blood pressure was measured at rest and at each stage of exercise using a mercury manometer and stethoscope. Data on age, sex, and race specific values are presented, demonstrating that black men have higher systolic blood pressures during exercise than white men even after adjustments for resting values.
Morris et al., 1978 The incidence of decreases in peak systolic blood pressure during treadmill exercise was investigated in 460 patients with definite or suspected coronary heart disease. Systolic blood pressure was measured at rest, at each stage of exercise, and at peak exercise. Also, all patients were studied with coronary cineangiography. A sustained exercise-induced decrease in peak systolic blood pressure of 10 mm Hg or more is a highly specific sign of multiple vessel coronary artery disease.
Franz, 1982 Systolic and diastolic blood pressures were measured using 1st and 4th phase Korotkoff sounds at rest and during each minute of cycle ergometry in healthy normotensive men (n = 173) and women (n = 130) aged 20 to 50 years. A similar protocol was performed by smaller numbers of borderline and frank hypertensive patients. The mean and standard deviation for blood pressure responses during an exercise load ranging from 100 to 600 watts are presented by sex, age, and resting blood pressure status.
Wolthuis et al., 1977 Systolic and diastolic blood pressures were measured using the Korotkoff method in 704 healthy asymptomatic aircrewmen at rest and at each stage of a maximal treadmill exercise test using a Balke-Ware protocol. The mean, 10th, and 90th percentiles for systolic and diastolic blood pressures are presented for several submaximal workloads, maximal exercise, and at 2 and 5 minutes of recovery. These data provide good "normal" values for blood pressure responses to treadmill exercise in healthy younger men.