APPENDIX C
Method of Estimating Confidence Intervals
Two specific elements are necessary for the probability
approach: a distribution of required intake for a
population and a distribution of actual intake. It is
also assumed that required intake and actual intake are
independent.
DESCRIPTION OF METHOD
The random variable X describes required intake with a
distribution function:
G(x) = P(X < x).
Let the random variable Y describe the actual intake with
a distribution function:
F(x) = P(Y < x).
The proportion of persons with inadequate intake can
be expressed as the proportion of people whose actual
intake is below the required intake:
P(Y < X).
This can be written as
$$\int_{-\infty}^{\infty} P[X > y \mid Y = y]\, dF(y),$$
where $P(A \mid B)$ corresponds to the conditional probability
of event A, given event B.
Under the assumption that X and Y are independent,
$$P[X > y \mid Y = y] = P[X > y] = 1 - G(y),$$
and hence
$$P[Y < X] = \int [1 - G(y)]\, dF(y) = 1 - \int G(y)\, dF(y).$$
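As a rough numerical check of this identity, the following Python
sketch approximates the integral by averaging G over simulated usual
intakes. The normal distributions and their parameters (requirement
N(50, 10^2), intake N(60, 15^2)) and the function name
proportion_inadequate are illustrative assumptions, not values or
names from this report; they are chosen only because P(Y < X) then has
a closed form to compare against.

```python
import random
from statistics import NormalDist

def proportion_inadequate(intakes, requirement_cdf):
    """Approximate P(Y < X) = 1 - integral of G(y) dF(y), taking F to be
    the empirical distribution of the supplied usual intakes."""
    return 1.0 - sum(requirement_cdf(y) for y in intakes) / len(intakes)

# Illustrative assumptions: requirement X ~ N(50, 10^2), intake Y ~ N(60, 15^2).
requirement = NormalDist(mu=50.0, sigma=10.0)
intake = NormalDist(mu=60.0, sigma=15.0)

rng = random.Random(0)
intakes = [rng.gauss(intake.mean, intake.stdev) for _ in range(20000)]
estimate = proportion_inadequate(intakes, requirement.cdf)

# For independent normals, Y - X ~ N(60 - 50, 15^2 + 10^2), so the exact
# proportion is the probability that this difference falls below zero.
difference = NormalDist(mu=10.0, sigma=(15.0**2 + 10.0**2) ** 0.5)
exact = difference.cdf(0.0)
print(f"simulated: {estimate:.3f}  exact: {exact:.3f}")
```

The two printed values should agree to within simulation error, which
illustrates that only G and F are needed once independence is assumed.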
The estimate for the distribution of actual intakes,
F(x) (described below), is based on survey data of indi-
vidual daily intakes. Because daily intakes for any
given person vary from day to day, this implies that the
same person might on some days be above and on other days
be below his or her required intake simply because of
day-to-day variability. Presumably then, usual intake,
an idealized average intake of persons over a long period,
is of interest.
Therefore, F(x) should represent the distribution of
these idealized averages for a population. To compute
F(x) in this manner, one must separate the intraindividual
variability from the idealized interindividual
variability.
ESTIMATING THE DISTRIBUTION OF ACTUAL INTAKES F(x)
The Parametric Method
The data are first transformed to approximate
normality. For this purpose, the log transformation
seems to work well for most nutrients, but not for all.
A random components model is then fitted to the transformed
data. The term $Y_{ij}$ denotes the observation of
intake for the jth replication of the ith individual,
where i = 1, ..., I, and j = 1, ..., J. The transformation
function is denoted by g(.), and the transformed
data are denoted by $Z_{ij} = g(Y_{ij})$. The random
components model is given by
$$Z_{ij} = \mu + A_i + e_{ij},$$
where the $A_i$ are assumed to be independently and identically
distributed (iid) $N(0, \sigma_A^2)$ and the $e_{ij}$ are
assumed to be independent $N(0, \sigma_e^2)$. The variance $\sigma_e^2$
refers to the intraindividual variability, and $\sigma_A^2$ refers
to the interindividual variability.
If the intraindividual variability were eliminated,
then the distribution of Z = g(Y) would be distributed as
a normal distribution with mean $\mu$ and variance $\sigma_A^2$, both
of which can be estimated efficiently by the sample mean
and the results of a 1-way ANOVA as described in Chapter 4.
In this approach, the distribution of idealized actual
intakes is given by F(x) = P(Y < x). Hence,
$$F(x) = P[g(Y) < g(x)] = \int_{-\infty}^{g(x)} (2\pi\sigma_A^2)^{-1/2} \exp\{-\tfrac{1}{2}(z - \mu)^2/\sigma_A^2\}\, dz.$$
Therefore,
$$dF(x) = (2\pi\sigma_A^2)^{-1/2} \exp\{-\tfrac{1}{2}[g(x) - \mu]^2/\sigma_A^2\}\, g'(x)\, dx,$$
where $g'(x) = dg(x)/dx$.
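The proportion of persons with inadequate intakes is
given by $1 - K(\mu, \sigma_A^2)$, where
$$K(\mu, \sigma_A^2) = \int G(x)\,(2\pi\sigma_A^2)^{-1/2} \exp\{-\tfrac{1}{2}[g(x) - \mu]^2/\sigma_A^2\}\, g'(x)\, dx.$$
Hence, we can estimate $K(\mu, \sigma_A^2)$ by
$K(\hat{\mu}, \hat{\sigma}_A^2)$, where
$$\hat{\mu} = \sum_{i=1}^{I} \sum_{j=1}^{J} Z_{ij}/(IJ) \quad \text{and} \quad \hat{\sigma}_A^2 = J^{-1}(MS_A - MS_e),$$
where $MS_A = J \sum_i (\bar{Z}_{i.} - \bar{Z}_{..})^2/(I - 1)$ and
$MS_e = \sum_i \sum_j (Z_{ij} - \bar{Z}_{i.})^2/[I(J - 1)]$.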
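The estimates above can be sketched in Python as follows. This is a
minimal illustration for a balanced design only; the function name
anova_estimates and the simulated parameter values (mean 4.0,
interindividual SD 0.3, intraindividual SD 0.5, 2,000 individuals with
3 replicates each) are assumptions made for the example and do not
come from the survey data discussed in this report.

```python
import random

def anova_estimates(z):
    """z is a list of I rows, each holding J transformed intakes Z_ij.
    Returns (mu_hat, ms_a, ms_e, sigma2_a_hat) for a balanced design."""
    n_ind = len(z)
    n_rep = len(z[0])
    row_means = [sum(row) / n_rep for row in z]
    mu_hat = sum(row_means) / n_ind
    # Between-individual mean square.
    ms_a = n_rep * sum((m - mu_hat) ** 2 for m in row_means) / (n_ind - 1)
    # Within-individual (day-to-day) mean square.
    ms_e = sum((v - m) ** 2 for row, m in zip(z, row_means) for v in row)
    ms_e /= n_ind * (n_rep - 1)
    sigma2_a_hat = (ms_a - ms_e) / n_rep
    return mu_hat, ms_a, ms_e, sigma2_a_hat

# Simulate Z_ij = mu + A_i + e_ij with assumed (hypothetical) parameters.
rng = random.Random(1)
MU, SIGMA_A, SIGMA_E = 4.0, 0.3, 0.5
I, J = 2000, 3
z = []
for _ in range(I):
    a_i = rng.gauss(0.0, SIGMA_A)
    z.append([MU + a_i + rng.gauss(0.0, SIGMA_E) for _ in range(J)])

mu_hat, ms_a, ms_e, sigma2_a_hat = anova_estimates(z)
print(mu_hat, ms_e, sigma2_a_hat)
```

With this many simulated individuals, the printed estimates should lie
close to the assumed values of 4.0, 0.25, and 0.09, respectively.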
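In a balanced 1-way random components model, the
estimates for the second moments of $\hat{\mu}$ and $\hat{\sigma}_A^2$ are
given by:
$$\mathrm{Var}(\hat{\mu}) = MS_A/(IJ),$$
$$\mathrm{Var}(\hat{\sigma}_A^2) = 2J^{-2}\{(I - 1)^{-1} MS_A^2 + [I(J - 1)]^{-1} MS_e^2\}, \text{ and}$$
$$\mathrm{Cov}(\hat{\mu}, \hat{\sigma}_A^2) = 0.$$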
Because $K(\mu, \sigma_A^2)$ is a smooth function of $\mu$ and $\sigma_A^2$,
and for large samples the estimates $\hat{\mu}$ and $\hat{\sigma}_A^2$ are
asymptotically normal, the 95% confidence interval can be
approximated using the delta method (Bickel and Doksum,
1977). That is, the 95% confidence interval is approximately
given by:
$$K(\hat{\mu}, \hat{\sigma}_A^2) \pm 1.96\, S_K,$$
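where
$$S_K^2 = \left[\frac{\partial K(\hat{\mu}, \hat{\sigma}_A^2)}{\partial \mu}\right]^2 \mathrm{Var}(\hat{\mu}) + \left[\frac{\partial K(\hat{\mu}, \hat{\sigma}_A^2)}{\partial \sigma_A^2}\right]^2 \mathrm{Var}(\hat{\sigma}_A^2).$$
The partial derivatives would be calculated most easily
using numerical methods. That is,
$$\frac{\partial K}{\partial \mu} \approx \frac{K(\hat{\mu} + h, \hat{\sigma}_A^2) - K(\hat{\mu} - h, \hat{\sigma}_A^2)}{2h},$$
given that h is sufficiently small. Similarly,
$$\frac{\partial K}{\partial \sigma_A^2} \approx \frac{K(\hat{\mu}, \hat{\sigma}_A^2 + h) - K(\hat{\mu}, \hat{\sigma}_A^2 - h)}{2h}.$$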
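The delta-method calculation can be sketched in Python as shown
below. This is an illustration under stated assumptions rather than
the subcommittee's own program: K is evaluated by a midpoint rule on
the transformed scale z = g(x), the requirement distribution is taken
to be N(50, 10^2), the transformation is assumed to be the natural
log, and the inputs (mu_hat = 4.0, sigma2_a_hat = 0.09, MS_A = 0.52,
MS_e = 0.25, I = 200, J = 3) are hypothetical. The function names
k_value and delta_method_ci are also introduced only for this example.

```python
import math
from statistics import NormalDist

def k_value(mu, sigma2_a, requirement_cdf, g_inverse, n_grid=2000):
    """K(mu, sigma_A^2) = integral of G(x) dF(x), computed on the
    transformed scale z = g(x), where Z ~ N(mu, sigma_A^2)."""
    sigma = math.sqrt(sigma2_a)
    lo = mu - 8.0 * sigma
    step = 16.0 * sigma / n_grid
    dist = NormalDist(mu, sigma)
    total = 0.0
    for k in range(n_grid):
        z_mid = lo + (k + 0.5) * step          # midpoint of the k-th cell
        total += requirement_cdf(g_inverse(z_mid)) * dist.pdf(z_mid) * step
    return total

def delta_method_ci(mu_hat, sigma2_a_hat, ms_a, ms_e, n_ind, n_rep,
                    requirement_cdf, g_inverse, h=1e-4):
    """Approximate 95% interval K_hat +/- 1.96 * S_K (balanced design)."""
    var_mu = ms_a / (n_ind * n_rep)
    var_s2 = (2.0 / n_rep ** 2) * (ms_a ** 2 / (n_ind - 1)
                                   + ms_e ** 2 / (n_ind * (n_rep - 1)))

    def k(m, s2):
        return k_value(m, s2, requirement_cdf, g_inverse)

    # Central differences; h must be smaller than sigma2_a_hat.
    dk_dmu = (k(mu_hat + h, sigma2_a_hat) - k(mu_hat - h, sigma2_a_hat)) / (2 * h)
    dk_ds2 = (k(mu_hat, sigma2_a_hat + h) - k(mu_hat, sigma2_a_hat - h)) / (2 * h)
    s_k = math.sqrt(dk_dmu ** 2 * var_mu + dk_ds2 ** 2 * var_s2)
    k_hat = k(mu_hat, sigma2_a_hat)
    return k_hat, k_hat - 1.96 * s_k, k_hat + 1.96 * s_k

# Hypothetical inputs on the log scale; requirement assumed N(50, 10^2).
requirement = NormalDist(50.0, 10.0)
k_hat, lower, upper = delta_method_ci(
    mu_hat=4.0, sigma2_a_hat=0.09, ms_a=0.52, ms_e=0.25,
    n_ind=200, n_rep=3,
    requirement_cdf=requirement.cdf, g_inverse=math.exp)
print(k_hat, lower, upper)
```

Because the proportion with inadequate intake is 1 - K, the
corresponding interval for that proportion is (1 - upper, 1 - lower).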
The parametric approach can be used as long as the distribution
of the transformed data is approximately normal.
If not, a larger class of transformations, such as the
Box-Cox (power) transformation, should be considered (Box and
Cox, 1964).
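For illustration, the Box-Cox family can be written in a few lines of
Python. In the sketch below, the power parameter is chosen from a
coarse grid by minimizing the absolute sample skewness of the
transformed data. That selection rule is a deliberate simplification
assumed for this example; it is not the likelihood-based procedure of
Box and Cox (1964). The simulated lognormal intakes and the function
names are likewise hypothetical.

```python
import math
import random

def box_cox(y, lam):
    """Box-Cox power transformation of positive values;
    lam = 0 corresponds to the log transformation."""
    if lam == 0.0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def skewness(values):
    """Sample skewness (third central moment over variance^1.5)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def least_skewed_lambda(y, grid):
    """Grid value of lambda whose transformed data are least skewed
    (a simplified stand-in for maximum likelihood selection)."""
    return min(grid, key=lambda lam: abs(skewness(box_cox(y, lam))))

# Hypothetical intakes that are lognormal, so lambda near 0 is expected.
rng = random.Random(2)
intakes = [math.exp(rng.gauss(4.0, 0.5)) for _ in range(5000)]
grid = [k / 10.0 for k in range(-10, 11)]
best_lambda = least_skewed_lambda(intakes, grid)
print(best_lambda)
```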
The data can be plotted on normal probability graph
paper, and formal goodness-of-fit tests can be performed
to determine if the transformed data are sufficiently
close to normality to make the method valid (Hoaglin and
Mosteller, 1982).
The Nonparametric Approach
To implement the probability approach, we must have an
estimate of the distribution of actual intake in a
specified population, F*(x). Since nutritional intake of
individuals varies from day to day, it is assumed that a
person's intake corresponds to an idealized average
intake over a long period. Let $Y_{ij}$ denote the amount
of nutrient ingested by individual i on day j. We assume
that some transformation of $Y_{ij}$, say $Z_{ij}$, follows a
random components model. That is, $Z_{ij} = d_i + e_{ij}$ and
$Z_{ij} = g(Y_{ij})$, where the $d_i$ are iid with distribution
function F*, and the $e_{ij}$ are assumed to be iid with distribution
function G and are independent of the $d_i$. The distribution
function G is assumed to have a mean equal to zero and
a variance of $\sigma_e^2$ (intraindividual variability), whereas
the distribution F* (distribution of idealized average of
transformed intakes) has a mean of $\mu$ and a variance of
$\sigma_A^2$ (interindividual variation).
The sample averages $\bar{Z}_{i.} = \sum_{j=1}^{J} Z_{ij}/J$ have
distribution function H, a mean of $\mu$, and a variance of
$\sigma_A^2 + \sigma_e^2/J = \sigma_{obs}^2$, where $\sigma_{obs}$ is the SD for
the observed sample averages. In the above parametric
approach, the underlying distributions F* and G were
assumed to follow a normal density. The subcommittee
believes that without making any specific parametric
assumptions about the underlying distribution of F*, a
reasonable estimate of F* can be obtained by assuming
that the shape of the distribution function of F* should
be similar to that of H. This motivated the heuristic
estimate of F*, which takes the shape of the empirical
cdf,
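$$\hat{H}(x) = \sum_{i=1}^{n} I(\bar{Z}_{i.} < x)/n,$$
where $I(\bar{Z}_{i.} < x) = 1$ if $\bar{Z}_{i.} < x$ and $0$ if $\bar{Z}_{i.} \ge x$,
and shrinks it toward the mean. The scaled estimate
$\hat{F}^*(x) = \hat{H}[\hat{\mu} + (x - \hat{\mu})\hat{\sigma}_{obs}/\hat{\sigma}_A]$ has the following properties:
· The mean is equal to $\hat{\mu}$.
· The variance is equal to $\hat{\sigma}_A^2$ (estimate of interindividual
variability).
· The shape of $\hat{F}^*$ resembles that of $\hat{H}$.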
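The first two properties can be checked numerically. The Python sketch
below rescales each observed mean toward the overall mean by the ratio
of the interindividual SD to the observed SD, which is equivalent to
evaluating the empirical cdf at the stretched argument in the formula
above. The simulated means, the value 0.09 used for the interindividual
variance estimate, and the function names shrink_means and f_star_hat
are assumptions made for this illustration.

```python
import random
from statistics import mean, variance

def shrink_means(row_means, sigma2_a_hat):
    """Rescale observed means so their sample variance equals sigma2_a_hat.
    Assumes sigma2_a_hat > 0."""
    mu_hat = mean(row_means)
    sigma2_obs = variance(row_means)      # equals MS_A / J in a balanced design
    ratio = (sigma2_a_hat / sigma2_obs) ** 0.5
    return [mu_hat + (m - mu_hat) * ratio for m in row_means]

def f_star_hat(x, row_means, sigma2_a_hat):
    """Scaled estimate: empirical cdf of the observed means evaluated at
    mu_hat + (x - mu_hat) * sigma_obs / sigma_A."""
    mu_hat = mean(row_means)
    stretch = (variance(row_means) / sigma2_a_hat) ** 0.5
    cutoff = mu_hat + (x - mu_hat) * stretch
    return sum(1 for m in row_means if m < cutoff) / len(row_means)

# Hypothetical observed means: true mean 4.0, interindividual SD 0.3,
# intraindividual SD 0.5 averaged over J = 3 days.
rng = random.Random(4)
row_means = [4.0 + rng.gauss(0.0, 0.3) + rng.gauss(0.0, 0.5 / 3 ** 0.5)
             for _ in range(500)]
SIGMA2_A_HAT = 0.09   # assumed ANOVA estimate of interindividual variance

shrunk = shrink_means(row_means, SIGMA2_A_HAT)
print(mean(shrunk), variance(shrunk), f_star_hat(4.1, row_means, SIGMA2_A_HAT))
```

The mean of the rescaled values matches the mean of the observed
averages, and their sample variance matches the assumed
interindividual variance, while the ordering of individuals (and
hence the shape) is unchanged.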
This estimate was chosen on a heuristic basis and
should be a reasonable approximation to F*. Strictly
speaking, it is really not a nonparametric estimate in
that $\hat{F}^*$ will be a consistent estimate of F* for only
restricted cases (i.e., if F* were normal); however, we
believe it will serve as a reasonable approximation for
skewed distributions as well.
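Therefore, an estimate for F(x) = P[Y < x] can be
taken as $\hat{F}(x) = \hat{P}[g(Y) < g(x)] = \hat{H}[\hat{\mu} + \{g(x) - \hat{\mu}\}\hat{\sigma}_{obs}/\hat{\sigma}_A]$,
and the estimate for the proportion of individuals with
inadequate intake would be:
$$1 - \int G(x)\, d\hat{H}[\hat{\mu} + \{g(x) - \hat{\mu}\}\hat{\sigma}_{obs}/\hat{\sigma}_A],$$
where G(x) here denotes the distribution function of required intake.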
The distributional properties for the nonparametric
method are more complicated than the parametric approach
because no implicit assumption of normality can be made.
For this reason, a bootstrap distribution is used to
calculate the confidence interval. This is performed
as follows:
1. I individuals are sampled at random with
replacement from the I individuals in the original data set to
create a simulated data set:
$$(Z_{R_1,j},\ j = 1, \ldots, J),\ (Z_{R_2,j},\ j = 1, \ldots, J),\ \ldots,\ (Z_{R_I,j},\ j = 1, \ldots, J),$$
where $R_1, \ldots, R_I$ are random indices from 1 to I chosen
with equal probability.
2. With this simulated data set, an empirical cdf
$\hat{H}_B(x)$ is computed and an ANOVA is performed on the
simulated data to compute $\hat{\mu}_B$, $\hat{\sigma}_{obs,B}$, and $\hat{\sigma}_{A,B}$.
3. Then an estimate for the proportion with
inadequate intake is computed in the following manner:
$$1 - \int G(x)\, d\hat{H}_B[\hat{\mu}_B + \{g(x) - \hat{\mu}_B\}\hat{\sigma}_{obs,B}/\hat{\sigma}_{A,B}].$$
4. Steps 1 through 3 are repeated with random sets of
simulated data to generate a distribution of the prev-
alence of inadequate intake. The confidence interval can
now be obtained by picking the appropriate percentiles
from this distribution.
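Steps 1 through 4 can be sketched in Python as follows. This is a
minimal illustration, not the program used by the subcommittee. It
assumes a balanced design, a log transformation (so the inverse
transformation is the exponential), a requirement distribution of
N(50, 10^2), 500 bootstrap replicates, and simulated intake data; the
function names and the clamping of a negative variance estimate at
zero are also choices made only for this example.

```python
import math
import random
from statistics import NormalDist

def prevalence_inadequate(z, requirement_cdf, g_inverse):
    """Shrinkage-based estimate of the proportion with inadequate intake.
    z holds I rows of J transformed intakes (balanced design)."""
    n_ind, n_rep = len(z), len(z[0])
    row_means = [sum(row) / n_rep for row in z]
    mu_hat = sum(row_means) / n_ind
    # Variance of the observed means (equals MS_A / J).
    sigma2_obs = sum((m - mu_hat) ** 2 for m in row_means) / (n_ind - 1)
    ms_e = sum((v - m) ** 2 for row, m in zip(z, row_means) for v in row)
    ms_e /= n_ind * (n_rep - 1)
    # Clamping a negative estimate at zero is an assumption of this sketch.
    sigma2_a = max(sigma2_obs - ms_e / n_rep, 0.0)
    ratio = math.sqrt(sigma2_a / sigma2_obs)
    # Shrink each mean toward mu_hat, then map back to the intake scale.
    usual = [g_inverse(mu_hat + (m - mu_hat) * ratio) for m in row_means]
    return 1.0 - sum(requirement_cdf(y) for y in usual) / n_ind

def bootstrap_ci(z, requirement_cdf, g_inverse, n_boot=500, level=0.95, seed=0):
    """Percentile interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n_ind = len(z)
    estimates = []
    for _ in range(n_boot):
        resampled = [z[rng.randrange(n_ind)] for _ in range(n_ind)]   # step 1
        estimates.append(                                             # steps 2-3
            prevalence_inadequate(resampled, requirement_cdf, g_inverse))
    estimates.sort()                                                  # step 4
    tail = (1.0 - level) / 2.0
    lower = estimates[math.floor(tail * (n_boot - 1))]
    upper = estimates[math.ceil((1.0 - tail) * (n_boot - 1))]
    return lower, upper

# Hypothetical log-scale data: 300 individuals, 3 days each.
rng = random.Random(3)
z = []
for _ in range(300):
    d_i = 4.0 + rng.gauss(0.0, 0.3)
    z.append([d_i + rng.gauss(0.0, 0.5) for _ in range(3)])

requirement = NormalDist(50.0, 10.0)
point = prevalence_inadequate(z, requirement.cdf, math.exp)
lower, upper = bootstrap_ci(z, requirement.cdf, math.exp)
print(point, lower, upper)
```

Resampling whole individuals, rather than single days, keeps each
person's replicate observations together, so the ANOVA in step 2 is
applied to data with the same structure as the original survey.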
Assumptions of the 95% Confidence Interval
The methods of computing 95% confidence intervals
assume that the measurements taken from each person are
independent of each other and that there are no sys-
tematic biases. These assumptions are subject to some
criticism because measurement of nutrients is based not
only on the amount of foods eaten, as given in the
dietary survey, but also on food composition tables. The
tables are themselves subject to variation, which is not
taken into account in the estimate of the 95% confidence
interval. The magnitude of this problem should be investigated
through sensitivity analyses.
REFERENCES
Bickel, P. J., and K. A. Doksum. 1977. Mathematical
Statistics: Basic Ideas and Selected Topics.
Holden-Day, San Francisco.
Box, G. E. P., and D. R. Cox. 1964. An analysis of
transformations. J. R. Stat. Soc. B 26:211-252.
Hoaglin, D. C., and F. Mosteller, eds. 1982. Understanding
Robust and Exploratory Data Analysis. John
Wiley & Sons, New York.