
Currently Skimming:

7. Some Methodological Issues in Making Predictions
Pages 291-313

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 291...
... At its most general level, a prediction study investigates the extent to which criterion measures (the dependent variables) can be predicted by one or more measures of other factors (the predictor or independent variables)
From page 292...
... in that, instead of contributing a score of 0 or 1, each category of each predictor is weighted according to the percentage of subjects in that category who are successes. The Glueck method can be applied to polychotomous independent variables, but in practice it has only been used for binary predictors.
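The contrast between a 0-or-1 points score and category weighting can be sketched as follows. This is a minimal illustration under stated assumptions: the toy data, the function names, and the restriction to two binary predictors are all hypothetical and are not taken from the chapter.

```python
from collections import defaultdict

def glueck_weights(rows, outcomes):
    """Weight each (predictor, category) by the percentage of its subjects who are successes."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, subjects]
    for row, success in zip(rows, outcomes):
        for j, category in enumerate(row):
            counts[(j, category)][0] += success
            counts[(j, category)][1] += 1
    return {key: 100.0 * s / n for key, (s, n) in counts.items()}

def glueck_score(row, weights):
    """Sum the category weights that apply to one subject."""
    return sum(weights[(j, category)] for j, category in enumerate(row))

def points_score(row):
    """Unweighted score: each binary predictor contributes 0 or 1."""
    return sum(row)

# Hypothetical toy data: two binary predictors, outcome 1 = success.
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
outcomes = [1, 1, 1, 0, 1, 0]
weights = glueck_weights(rows, outcomes)

for row in [(1, 1), (1, 0), (0, 0)]:
    print(row, points_score(row), round(glueck_score(row, weights), 1))
```

In this toy sample, the subject with pattern (1, 0) receives one point under the unweighted score but a weighted score that reflects the success percentages of both of its categories.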
From page 293...
... We would advocate the use of the formal independence Bayes method in preference to the more ad hoc Burgess and Glueck approaches because it has several important advantages: 1. It is equally simple yet is based on a coherent theory and is optimum within the framework of that theory.
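A minimal sketch of an independence Bayes score may help fix ideas. It assumes a binary outcome and categorical predictors treated as independent within each outcome group. The add-one smoothing, the toy data, and the function name are assumptions of this sketch, not details from the chapter.

```python
import math
from collections import Counter

def fit_independence_bayes(rows, outcomes, levels=2):
    """Return a function giving the log odds of success for a predictor pattern."""
    n1 = sum(outcomes)          # number of successes
    n0 = len(outcomes) - n1     # number of failures
    counts = {0: Counter(), 1: Counter()}
    for row, y in zip(rows, outcomes):
        for j, category in enumerate(row):
            counts[y][(j, category)] += 1

    def log_odds(row):
        total = math.log(n1 / n0)  # prior log odds
        for j, category in enumerate(row):
            # Add-one smoothing (an assumption here) avoids zero probabilities.
            p1 = (counts[1][(j, category)] + 1) / (n1 + levels)
            p0 = (counts[0][(j, category)] + 1) / (n0 + levels)
            total += math.log(p1 / p0)  # independence: log likelihood ratios add
        return total

    return log_odds

# Hypothetical toy data: two binary predictors, outcome 1 = success.
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
outcomes = [1, 1, 1, 0, 1, 0]
log_odds = fit_independence_bayes(rows, outcomes)

for row in [(1, 1), (0, 0)]:
    probability = 1 / (1 + math.exp(-log_odds(row)))
    print(row, round(log_odds(row), 3), round(probability, 3))
```

Because the log likelihood ratios simply add, the score is as easy to compute by hand as a points score, while each term still has a probabilistic interpretation.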
From page 294...
... If the xi's are correlated, but not all to the same degree, the so-called "Lancaster models" can be used, which are based on a second-order approximation to the joint distribution of the xi's. These models have been found useful in medical diagnosis applications; see review in Titterington et al. [Running head: CRIMINAL CAREERS AND CAREER CRIMINALS]
From page 295...
... We are concerned here with the underlying methodology of the assessment of prediction equations, rather than with details of prediction equations in specific applications. There are two contrasting, and yet complementary, approaches to the discussion of this question, corresponding roughly to the two philosophies of statistical inference and decision theory as understood in the statistical literature.
From page 296...
... , calculated using the original prediction equation f but using the new values of x. The difference between the sets of data is emphasized by the terms "construction data" and "validation data." Shrinkage implies that validation fit is worse than retrospective fit.
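The distinction between retrospective fit and validation fit can be sketched with a one-predictor least squares line. Everything here is an assumption for illustration: the simulated data, the true coefficients, the sample sizes, and the function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mse(xs, ys, intercept, slope):
    """Mean squared prediction error of a fixed equation on a data set."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(n, rng, alpha=1.0, beta=0.5, sigma=1.0):
    """Draw hypothetical data from y = alpha + beta * x + noise."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

rng = random.Random(0)
cx, cy = simulate(20, rng)   # construction data
vx, vy = simulate(20, rng)   # validation data from the same population

a, b = fit_line(cx, cy)                # equation fitted once, on construction data
retrospective = mse(cx, cy, a, b)      # fit on the data used to build the equation
validation = mse(vx, vy, a, b)         # same equation applied to new values of x
print(round(retrospective, 3), round(validation, 3))
```

A single pair of samples need not show a worse validation fit. The shrinkage effect is a statement about what happens on average, so repeating the simulation many times and averaging the two errors gives a fairer picture.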
From page 297...
... This, therefore, represents the ideal situation as far as fitting and validating a prediction equation is concerned. If $\hat{\alpha}$ and $\hat{\beta}$ are least squares estimates in the construction data, the prediction ...
From page 298...
... , in that $\tilde{y}$ corresponds exactly to a "shrinkage estimator" in the sense of the term used in the statistical literature. It is proved that, within the assumptions outlined above, $\tilde{y}$ is uniformly better than $\hat{y}$ in the mean squared error sense, i.e., $E(\tilde{y} - y)$
From page 299...
... In practice, prediction equations are often simplified by using stepwise regression or some other procedure for subset selection; the variables in x are then selected using the data, and only those x's showing reasonably strong correlation with y are retained. The usual theory of least squares is, of course, completely upset by such selection.
From page 300...
... (n - 1 - m) R ... with K = 0.602 and much greater than that implied by the value K = 0.931.
From page 301...
... in the validation sample, over the distribution of regression parameters, as well as over sampling variation in the m's and V's. The only requirement is, as before, that m ≥ 3.
From page 302...
... have formulated a precise specification of RR and, further, have developed a significance test by which the hypothesis β ∈ RR can be assessed in the light of the estimated regression coefficient vector. Thus, when constructing a prediction equation, m and V are taken directly from the construction sample; the likely superiority of the shrinkage correction can then be checked using the robustness region test against a variety of changes in population that might be contemplated.
From page 303...
... First, a simulation study can be undertaken in which the prediction equation is fitted to a random subset of the data, and the remaining cases are screened in the appropriate way to form the validation sample. The random sampling of the construction data is repeated a large number of times to obtain expected values of prediction mean squared error or other measures of predictive performance.
From page 304...
... For example, in the absconding study mentioned above, there is little basis for choosing on statistical grounds between the fits with the total of 22 x's and with a subset of just 4 x's (Figures 1 and 2). Third, caution is needed if a prediction equation is to be applied outside the range of the construction data.
From page 305...
... study of criminal careers. The example adopted here to illustrate and develop the discussion of predictive power is taken from Copas and Whiteley (1976)
From page 306...
... On the other hand, the exact value of the variance of By is available (Goodman and Kruskal, 1963), which permits a more powerful test.
From page 307...
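... [FIGURE 3 Correct predictions and errors for each cutoff point. Each panel cross-classifies actual outcome (success, failure) against predicted outcome, with Ns = 41, Nf = 46, and base rate = .471. Surviving cell counts: for the panel with NPs = 33 and NPf = 54, FN = 21 and TN = 33; for the panel with NPs = 62 and NPf = 25, TP = 34, FP = 28, FN = 7, and TN = 18. A further panel has NPs = 12 and NPf = 75.]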
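The summary rates for one cutoff point can be sketched from the single panel whose four cell counts survive in full (TP = 34, FP = 28, FN = 7, TN = 18). The definitions used for the base rate, selection ratio, and total error rate are the conventional ones and are an assumption of this sketch; the variable names are hypothetical.

```python
# Cell counts from the one Figure 3 panel that survives in full,
# treating "success" as the positive outcome.
tp, fp, fn, tn = 34, 28, 7, 18

n = tp + fp + fn + tn                  # all subjects
base_rate = (tp + fn) / n              # share of actual successes
selection_ratio = (tp + fp) / n        # share predicted to succeed
total_error_rate = (fp + fn) / n       # share of wrong predictions

print(n, round(base_rate, 3), round(selection_ratio, 3), round(total_error_rate, 3))
```

The base rate computed this way agrees with the .471 printed in the figure, which is a useful check that the cells have been read correctly.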
From page 308...
... is not .5, as is often the case in practice, total errors are minimized when the selection ratio is set to equal the base rate.
From page 309...
... The standard deviation is given by: $\mathrm{S.D.} = \left[ n(N - n)/N^{3} \right]^{1/2}$, where n is the numerator and N the denominator of the rate. In this example the 95 percent confidence limits for the total error rate are .289 and .493.
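As a rough check on the quoted limits, the sketch below uses the usual binomial standard deviation of a rate n/N, namely the square root of n(N - n)/N^3. The values n = 34 and N = 87 are assumptions, not figures stated in this excerpt: N = 41 + 46 matches the Figure 3 sample sizes, and 34 errors is chosen because it gives limits close to those quoted.

```python
import math

def rate_confidence_limits(n, N, z=1.96):
    """Approximate confidence limits for a rate n/N using the binomial standard deviation."""
    rate = n / N
    sd = math.sqrt(n * (N - n) / N ** 3)   # equals sqrt(rate * (1 - rate) / N)
    return rate - z * sd, rate + z * sd

# Assumed counts (see the note above): 34 total errors among 87 subjects.
lower, upper = rate_confidence_limits(34, 87)
print(round(lower, 3), round(upper, 3))
```

The lower limit from this calculation differs from the quoted .289 only in the third decimal place, which is consistent with rounding of the intermediate values.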
From page 310...
... . Although unbiased estimates of shrinkage and error rates result from this method, there are two obvious disadvantages ... The first, simple extension of the principle is cross-validation, of which the split-half method is merely a special case.
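A minimal sketch of cross-validation follows, with the data divided into k parts and each part held out in turn. The function names, the toy prediction rule (predict the mean of the construction cases), and the data are hypothetical. Setting k = 2 gives a split-half arrangement in which each half takes a turn as the validation sample, which is an assumption of this sketch rather than a detail from the chapter.

```python
def cross_validate(data, k, fit, loss):
    """Average held-out loss when each of k parts serves once as the validation sample."""
    folds = [data[i::k] for i in range(k)]   # deterministic interleaved split
    total, count = 0.0, 0
    for i, held_out in enumerate(folds):
        construction = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = fit(construction)            # rule built without the held-out cases
        total += sum(loss(model, d) for d in held_out)
        count += len(held_out)
    return total / count

def fit_mean(values):
    """Toy prediction rule: always predict the construction-sample mean."""
    return sum(values) / len(values)

def squared_error(prediction, value):
    return (value - prediction) ** 2

data = [1.0, 2.0, 3.0, 4.0]                  # hypothetical outcomes
split_half = cross_validate(data, 2, fit_mean, squared_error)
leave_one_out = cross_validate(data, len(data), fit_mean, squared_error)
print(split_half, round(leave_one_out, 3))
```

Larger k leaves more cases for construction in each round, at the price of refitting the rule more often.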
From page 311...
... bootstrap samples were drawn and the same procedure was used to construct a prediction instrument. On each occasion the "overoptimism random variable," R', was calculated, which is merely "the error rate for the bootstrap replication minus .158." The 500 values of R' were plotted and the mean of R' was found to be .045, which suggests that the expected overoptimism is about one-third as large as the apparent error rate .158.
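The bootstrap idea can be sketched as below. This uses a common form of the optimism estimate (error of the bootstrap-built rule on the original data minus its error on the bootstrap sample), which may differ in detail from the calculation summarized above. The single-threshold prediction rule, the simulated data, and all names are hypothetical.

```python
import random

def error_rate(threshold, data):
    """Share misclassified by the rule 'predict failure when x exceeds the threshold'."""
    return sum((x > threshold) != bool(y) for x, y in data) / len(data)

def fit_threshold(data):
    """Choose the cutoff with the smallest error rate on the data supplied."""
    candidates = [float("-inf")] + sorted({x for x, _ in data})
    return min(candidates, key=lambda t: error_rate(t, data))

def bootstrap_optimism(data, replications, rng):
    """Mean overoptimism of the apparent error rate across bootstrap replications."""
    values = []
    for _ in range(replications):
        sample = [rng.choice(data) for _ in data]   # resample with replacement
        threshold = fit_threshold(sample)           # rebuild the rule each time
        values.append(error_rate(threshold, data) - error_rate(threshold, sample))
    return sum(values) / len(values)

# Hypothetical data: y = 1 marks a failure, x is a noisy risk score.
rng = random.Random(0)
data = []
for _ in range(60):
    y = rng.randint(0, 1)
    data.append((rng.gauss(y, 1.0), y))

apparent = error_rate(fit_threshold(data), data)
optimism = bootstrap_optimism(data, 200, random.Random(1))
print(round(apparent, 3), round(optimism, 3), round(apparent + optimism, 3))
```

Adding the mean overoptimism to the apparent error rate gives a corrected estimate, in the same spirit as comparing the mean of R' with the apparent rate of .158.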
From page 312...
... An analogous situation occurs in medical science, where mass-screening programs are costly and may result in large false-positive errors, causing considerable stress, but where they are nevertheless considered to be worthwhile to detect the small number of true positives who actually have the rare disease. Therefore, the worth of any prediction instrument depends on the values to be attached to the various outcomes emanating from its application, not simply on the total number of errors that may accrue.
From page 313...
... In press. On the Robustness of Shrinkage Predictors in Regression: Some Theoretical Considerations. Journal of the Royal Statistical Society, Series B 48.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.