

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 120
APPENDIX C

Method of Estimating Confidence Intervals

Two specific elements are necessary for the probability approach: a distribution of required intake for a population and a distribution of actual intake. It is also assumed that required intake and actual intake are independent.

DESCRIPTION OF METHOD

The random variable $X$ describes required intake, with distribution function $G(x) = P(X < x)$. Let the random variable $Y$ describe the actual intake, with distribution function $F(x) = P(Y < x)$. The proportion of persons with inadequate intake can be expressed as the proportion of people whose actual intake is below the required intake, $P(Y < X)$. This can be written as

$$P(Y < X) = \int_{-\infty}^{\infty} P[X > y \mid Y = y]\, dF(y),$$

where $P(A \mid B)$ corresponds to the conditional probability of event $A$, given event $B$. Under the assumption that $X$ and $Y$ are independent, $P[X > y \mid Y = y] = P[X > y] = 1 - G(y)$,

and hence

$$P[Y < X] = \int [1 - G(y)]\, dF(y) = 1 - \int G(y)\, dF(y).$$

The estimate for the distribution of actual intakes, $F(x)$ (described below), is based on survey data of individual daily intakes. Because daily intakes for any given person vary from day to day, the same person might on some days be above and on other days be below his or her required intake simply because of day-to-day variability. Presumably then, usual intake, an idealized average intake of persons over a long period, is of interest. Therefore, $F(x)$ should represent the distribution of these idealized averages for a population. To compute $F(x)$ in this manner, one must separate the intraindividual variability from the idealized interindividual variability.

ESTIMATING THE DISTRIBUTION OF ACTUAL INTAKES F(x)

The Parametric Method

The data are first transformed to approximate normality. For this purpose, the log transformation seems to work well for most nutrients, but not for all. A random components model is then fitted to the transformed data. The term $Y_{ij}$ denotes the observation of intake for the $j$th replication of the $i$th individual, where $i = 1, \ldots, I$ and $j = 1, \ldots, J$. The transformation function is denoted by $g(\cdot)$, and the transformed data are denoted by $Z_{ij} = g(Y_{ij})$. The random components model is given by

$$Z_{ij} = \mu + A_i + e_{ij},$$

where the $A_i$ are assumed to be identically and independently distributed (iid) $N(0, \sigma_A^2)$ and the $e_{ij}$ are assumed to be independent $N(0, \sigma_e^2)$. The variance $\sigma_e^2$ refers to the intraindividual variability, and $\sigma_A^2$ refers to the interindividual variability. If the intraindividual variability were eliminated, then $Z = g(Y)$ would be distributed as

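a normal distribution with mean $\mu$ and variance $\sigma_A^2$, both of which can be estimated efficiently by the sample mean and the results of a 1-way ANOVA as described in Chapter 4. In this approach, the distribution of idealized actual intakes is given by $F(x) = P(Y < x)$. Hence,

$$F(x) = P[g(Y) < g(x)] = \int_{-\infty}^{g(x)} (2\pi\sigma_A^2)^{-1/2} \exp\{-\tfrac{1}{2}(z - \mu)^2/\sigma_A^2\}\, dz.$$

Therefore,

$$dF(x) = (2\pi\sigma_A^2)^{-1/2} \exp\{-\tfrac{1}{2}[g(x) - \mu]^2/\sigma_A^2\}\, g'(x)\, dx,$$

where $g'(x) = dg(x)/dx$. The proportion of persons with inadequate intakes is given by $1 - K(\mu, \sigma_A^2)$, where

$$K(\mu, \sigma_A^2) = \int G(x)\, (2\pi\sigma_A^2)^{-1/2} \exp\{-\tfrac{1}{2}[g(x) - \mu]^2/\sigma_A^2\}\, g'(x)\, dx.$$

Hence, we can estimate $K(\mu, \sigma_A^2)$ by $K(\hat\mu, \hat\sigma_A^2)$, where

$$\hat\mu = \sum_{i=1}^{I} \sum_{j=1}^{J} Z_{ij}/(IJ) \quad \text{and} \quad \hat\sigma_A^2 = J^{-1}(MS_A - MS_e),$$

where

$$MS_A = J \sum_{i} (\bar Z_{i\cdot} - \bar Z_{\cdot\cdot})^2/(I - 1) \quad \text{and} \quad MS_e = \sum_{i} \sum_{j} (Z_{ij} - \bar Z_{i\cdot})^2/[I(J - 1)].$$

In a balanced 1-way random components model, the estimates for the second moments of $\hat\mu$ and $\hat\sigma_A^2$ are given by:

$$\widehat{\mathrm{Var}}(\hat\mu) = MS_A/(IJ),$$

$$\widehat{\mathrm{Var}}(\hat\sigma_A^2) = 2J^{-2}\{(I - 1)^{-1} MS_A^2 + [I(J - 1)]^{-1} MS_e^2\},$$

and $\mathrm{Cov}(\hat\mu, \hat\sigma_A^2) = 0$.

Because $K(\mu, \sigma_A^2)$ is a smooth function of $\mu$ and $\sigma_A^2$, and for large samples the estimates $\hat\mu$ and $\hat\sigma_A^2$ are asymptotically normal, the 95% confidence interval can be approximated using the delta method (Bickel and Doksum, 1977). That is, the 95% confidence interval is approximately given by:

$$K(\hat\mu, \hat\sigma_A^2) \pm 1.96\, S_K,$$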

where

$$S_K^2 = \left[\frac{\partial K(\hat\mu, \hat\sigma_A^2)}{\partial \mu}\right]^2 \widehat{\mathrm{Var}}(\hat\mu) + \left[\frac{\partial K(\hat\mu, \hat\sigma_A^2)}{\partial \sigma_A^2}\right]^2 \widehat{\mathrm{Var}}(\hat\sigma_A^2).$$

The partial derivatives would be calculated most easily using numerical methods. That is,

$$\frac{\partial K}{\partial \mu} \approx \frac{K(\hat\mu + h, \hat\sigma_A^2) - K(\hat\mu - h, \hat\sigma_A^2)}{2h},$$

given that $h$ is sufficiently small. Similarly,

$$\frac{\partial K}{\partial \sigma_A^2} \approx \frac{K(\hat\mu, \hat\sigma_A^2 + h) - K(\hat\mu, \hat\sigma_A^2 - h)}{2h}.$$

The parametric approach can be used as long as the distribution of the transformed data is approximately normal. If it is not, a larger class of transformations, such as the Box-Cox (power) transformation, should be considered (Box and Cox, 1964). The data can be plotted on normal probability graph paper, and formal goodness-of-fit tests can be performed to determine whether the transformed data are sufficiently close to normality to make the method valid (Hoaglin and Mosteller, 1982).

The Nonparametric Approach

To implement the probability approach, we must have an estimate of the distribution of actual intake in a specified population, $F^*(x)$. Since nutritional intake of individuals varies from day to day, it is assumed that a person's intake corresponds to an idealized average intake over a long period. Let $Y_{ij}$ denote the amount of nutrient ingested by individual $i$ on day $j$. We assume that some transformation of $Y_{ij}$, say $Z_{ij}$, follows a random components model. That is,

$$Z_{ij} = d_i + e_{ij} \quad \text{and} \quad Z_{ij} = g(Y_{ij}),$$

where the $d_i$ are iid with distribution function $F^*$, and the $e_{ij}$ are assumed to be iid with distribution function $G$ and are independent of the $d_i$. (In this section $G$ denotes the distribution of the within-person deviations, not the requirement distribution of the preceding sections.) The distribution function $G$ is assumed to have a mean equal to zero and a variance of $\sigma_e^2$ (intraindividual variability), whereas

the distribution $F^*$ (distribution of idealized average of transformed intakes) has a mean of $\mu$ and a variance of $\sigma_A^2$ (interindividual variation).

The sample averages $\bar Z_{i\cdot} = \sum_{j=1}^{J} Z_{ij}/J$ have distribution function $H$, a mean of $\mu$, and a variance of $\sigma_A^2 + \sigma_e^2/J = \sigma_{obs}^2$, where $\hat\sigma_{obs}$ is the estimated SD for the observed sample averages.

In the above parametric approach, the underlying distributions $F^*$ and $G$ were assumed to follow a normal density. The subcommittee believes that without making any specific parametric assumptions about the underlying distribution of $F^*$, a reasonable estimate of $F^*$ can be obtained by assuming that the shape of the distribution function of $F^*$ should be similar to that of $H$. This motivated the heuristic estimate of $F^*$, which takes the shape of the empirical cdf,

$$\hat H(x) = \sum_{i=1}^{n} I(\bar Z_{i\cdot} < x)/n, \quad \text{where} \quad I(\bar Z_{i\cdot} < x) = \begin{cases} 1 & \text{if } \bar Z_{i\cdot} < x \\ 0 & \text{if } \bar Z_{i\cdot} \ge x, \end{cases}$$

and shrinks it toward the mean. The scaled estimate

$$\hat F^*(x) = \hat H[\hat\mu + (x - \hat\mu)\hat\sigma_{obs}/\hat\sigma_A]$$

has the following properties:

The mean is equal to $\hat\mu$.
The variance is equal to $\hat\sigma_A^2$ (estimate of interindividual variability).
The shape of $\hat F^*$ resembles that of $\hat H$.

This estimate was chosen on a heuristic basis and should be a reasonable approximation to $F^*$. Strictly speaking, it is not really a nonparametric estimate, in that $\hat F^*$ will be a consistent estimate of $F^*$ only in restricted cases (i.e., if $F^*$ were normal); however, we believe it will serve as a reasonable approximation for skewed distributions as well. Therefore, an estimate for $F(x) = P[Y < x] = P[g(Y) < g(x)]$ can be taken as

$$\hat F(x) = \hat H[\hat\mu + (g(x) - \hat\mu)\hat\sigma_{obs}/\hat\sigma_A],$$

and the estimate for the proportion of individuals with inadequate intake would be:

$$1 - \int G(x)\, d\hat H[\hat\mu + (g(x) - \hat\mu)\hat\sigma_{obs}/\hat\sigma_A].$$

The distributional properties for the nonparametric method are more complicated than those of the parametric approach because no implicit assumption of normality can be made. For this reason, a bootstrap distribution is used to calculate the confidence interval. This is performed as follows:

1. $I$ individuals are sampled at random with replacement from the $I$ individuals in the original data set to create a simulated data set: $(Z_{R_1, j},\ j = 1, \ldots, J)$, $(Z_{R_2, j},\ j = 1, \ldots, J)$, ..., $(Z_{R_I, j},\ j = 1, \ldots, J)$, where $R_1, \ldots, R_I$ are random indices from 1 to $I$ chosen with equal probability.

2. With this simulated data set, an empirical cdf $\hat H_B(x)$ is computed and an ANOVA is performed on the simulated data to compute $\hat\mu^B$, $\hat\sigma_{obs}^B$, and $\hat\sigma_A^B$.

3. Then an estimate for the proportion with inadequate intake is computed in the following manner:

$$1 - \int G(x)\, d\hat H_B[\hat\mu^B + (g(x) - \hat\mu^B)\hat\sigma_{obs}^B/\hat\sigma_A^B].$$

4. Steps 1 through 3 are repeated with random sets of simulated data to generate a distribution of the prevalence of inadequate intake. The confidence interval can now be obtained by picking the appropriate percentiles from this distribution.

Assumptions of the 95% Confidence Interval

The methods of computing 95% confidence intervals assume that the measurements taken from each person are independent of each other and that there are no systematic biases. These assumptions are subject to some criticism because measurement of nutrients is based not only on the amount of foods eaten, as given in the

dietary survey, but also on food composition tables. The tables are themselves subject to variation, which is not taken into account in the estimate of the 95% confidence interval. The magnitude of this problem should be investigated through sensitivity analyses.

REFERENCES

Bickel, P. J., and K. A. Doksum. 1977. Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, San Francisco.

Box, G. E. P., and D. R. Cox. 1964. An analysis of transformations. J. R. Stat. Soc. B 26:211-252.

Hoaglin, D. C., and F. Mosteller, eds. 1982. Understanding Robust and Exploratory Data Analysis. John Wiley & Sons, New York.
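
SUPPLEMENTARY SKETCH

The two interval procedures described in this appendix (the parametric delta-method interval and the bootstrap percentile interval) can be illustrated with a short Python sketch. It is not the subcommittee's implementation, and several pieces are assumptions made only for illustration: the transformation g is taken to be the natural log, the requirement distribution is a hypothetical lognormal supplied by `lognormal_cdf`, the integral over the normal density is evaluated by Gauss-Hermite quadrature (a numerical choice not named in the text), and all function names are invented here. The proportion with inadequate intake is computed as 1 minus the integral of G against the estimated intake distribution, following the identity in the description of the method.

```python
import math
import numpy as np


def lognormal_cdf(log_mean, log_sd):
    """Hypothetical requirement cdf G, assuming log X ~ N(log_mean, log_sd^2)."""
    erf = np.vectorize(math.erf)

    def cdf(x):
        t = (np.log(x) - log_mean) / (log_sd * math.sqrt(2.0))
        return 0.5 * (1.0 + erf(t))

    return cdf


def anova_components(z):
    """Balanced 1-way random components ANOVA on transformed intakes z, shape (I, J)."""
    n_ind, n_rep = z.shape
    person_means = z.mean(axis=1)
    mu = z.mean()
    msa = n_rep * np.sum((person_means - mu) ** 2) / (n_ind - 1)
    mse = np.sum((z - person_means[:, None]) ** 2) / (n_ind * (n_rep - 1))
    var_a = (msa - mse) / n_rep  # interindividual variance estimate
    if var_a <= 0:
        raise ValueError("non-positive interindividual variance estimate")
    var_mu = msa / (n_ind * n_rep)
    var_var_a = 2.0 / n_rep**2 * (msa**2 / (n_ind - 1) + mse**2 / (n_ind * (n_rep - 1)))
    return mu, var_a, var_mu, var_var_a


def prevalence(mu, var_a, req_cdf, n_nodes=64):
    """1 - E[G(exp(Z))] for Z ~ N(mu, var_a), via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    z = mu + math.sqrt(2.0 * var_a) * nodes
    k = np.sum(weights * req_cdf(np.exp(z))) / math.sqrt(math.pi)
    return 1.0 - k


def delta_ci(y, req_cdf, h=1e-4):
    """Parametric estimate with an approximate 95% delta-method interval."""
    mu, var_a, var_mu, var_var_a = anova_components(np.log(y))
    p = prevalence(mu, var_a, req_cdf)
    hv = 1e-3 * var_a  # step for the variance derivative, kept below var_a
    d_mu = (prevalence(mu + h, var_a, req_cdf)
            - prevalence(mu - h, var_a, req_cdf)) / (2 * h)
    d_va = (prevalence(mu, var_a + hv, req_cdf)
            - prevalence(mu, var_a - hv, req_cdf)) / (2 * hv)
    se = math.sqrt(d_mu**2 * var_mu + d_va**2 * var_var_a)
    return p, (p - 1.96 * se, p + 1.96 * se)


def bootstrap_ci(y, req_cdf, n_boot=500, seed=0):
    """Bootstrap percentile interval for the shrunken empirical-cdf estimate."""
    rng = np.random.default_rng(seed)
    z = np.log(y)
    n_ind = z.shape[0]
    estimates = []
    for _ in range(n_boot):
        zb = z[rng.integers(0, n_ind, size=n_ind)]  # resample individuals
        mu, var_a, _, _ = anova_components(zb)
        means = zb.mean(axis=1)
        var_obs = means.var(ddof=1)
        # Shrink the person means toward mu so their variance equals var_a.
        shrunk = mu + (means - mu) * math.sqrt(var_a / var_obs)
        estimates.append(1.0 - np.mean(req_cdf(np.exp(shrunk))))
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return float(lo), float(hi)
```

Both functions take an array of raw intakes with one row per person and one column per replicate day. `delta_ci(y, G)` returns the point estimate and its interval, and `bootstrap_ci(y, G)` returns the percentile limits. The resampling loop raises an error if any resample yields a non-positive variance estimate; a production analysis would need a deliberate policy for that case.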