Murray M. Pollack, M.D., M.B.A.*
Decisions about patient care are often based on the prognostic likelihood for outcomes such as death, disability, or toxicity for each potential therapy. For some decisions, the experience and knowledge base of individual clinicians allow good estimates of the outcomes for each therapeutic option. For other decisions, however, the experience and knowledge base of the individual clinician may be very limited. Decisions involving the likelihood of major disabilities or death, especially for infants and children, are often those for which both the experience and the knowledge base of the individual clinician are inadequate. Even more important, the scientific literature is often not helpful in assigning a prognosis to individual patients, especially when the outcome is death or disability in the short-term future.
For these reasons, prognostication, the prediction of disease progression and outcome with and without therapies, remains a critical aspect of clinical medicine. The interest in prognostication tools as potential decision making aids stems from the need to improve prognostic estimates under the conditions of limited clinician experience and limited scientific evidence.
Contemporary developments mainly driven by concerns for adult patients are increasing our interest in predicting death and its timing. Patients and families have become more involved in making medical decisions and often need better information about their medical situation and the likely consequences of different treatment options. Meaningful involvement in medical decision making for patients and families implies access to meaningful information about expected outcomes. Meaningful involvement is especially important for those with complex chronic medical problems. Another reason for the interest in prognosis is that laws or regulations make access to some kinds of care contingent on estimates of life expectancy. For example, to qualify for hospice benefits, a Medicare or Medicaid beneficiary must be certified as having less than six months to live. Similarly, Oregon’s assisted suicide statute links legal access to prescriptions of lethal drugs to a six-month life expectancy (among other criteria).1 Statements in “living wills” or advance care directives about preferences for CPR or other life-sustaining interventions may be phrased in terms of expectations about survival or other outcomes (e.g., cognitive capacity). In addition, the development and wide deployment of advanced medical technologies in recent years have prompted concern about possibly futile uses of these technologies for those who die.2,3
Pediatric decision making involves special circumstances. The child patient has limited legal autonomy and a capacity to make reasoned decisions that varies with developmental level. Nonetheless, many pediatricians believe that children’s views ought to be taken into account, which can present problems, especially in the case of older adolescents. Family and societal values are complex and sometimes incline toward exhausting every life-prolonging option, perhaps without full appreciation of the burdens imposed on the child. A decision to continue life-prolonging treatment far beyond the recommendations of health care professionals is also a severe stress to the health care institution and staff. Children cannot be left out of the current efforts to ensure that people die with as much dignity as possible and that decisions about care are based on appropriate scientific evidence and clinical experience.
Could prognostication tools and quantitative estimates or scores help clinicians? The following discussion considers the preparedness of physicians to undertake complex life and death decisions, the elements of quantitative prognostic tools and scores, the accuracy of clinicians’ estimates, and issues in using prognostic scores to guide decisions about individual patients.
THE LACK OF PREPAREDNESS
Despite the interest in prognostication and its application to end-of-life decisions, educational experiences and resources for health care professionals still provide little preparation in prognostic concepts and methods. Carron et al. surveyed the treatment of 12 diseases in four major textbooks. They found, in general, that helpful prognostic information was rare and that natural disease progression was described so generally as to lack practical utility.4 Usually, the course of a disease was linked only to therapeutic options, not to futility. Texts infrequently mentioned death, the dying process, or the effects of end-of-life changes on patients and families.
Compared to internists, the experience and training of pediatric clinicians in prognostication sciences and end-of-life rituals is likely to be even more deficient because fewer children than adults die or have life threatening conditions. The effects on families of a child’s life-threatening medical problem or a child’s death are, however, likely to be even more complex and difficult than the effects of an adult’s death.
Similarly, physicians’ training in decision-making sciences is limited, while the difficulty of making the most informed decision may even increase as information increases. The amount of data obtained from electronic, laboratory, and clinical methods has grown exponentially during the past few decades. As the amount of data grows, clinicians will have increasing difficulty integrating it to arrive at reliable prognostic likelihood estimates.5 Human short-term memory can hold only four to seven data points or data constructs at one time.6,7 In data-intensive environments such as ICUs, the cognitive limitations of clinicians may be especially restrictive.8 Attempts to use larger amounts of information simultaneously may lead to ineffective decision making, unjustified variability in clinical practice, and even clinical errors.
There is little information regarding how physicians learn and practice prognostication, how their experiences affect their practice, or even how frequently they encounter prognostication issues in their practice. In one provocative study, Christakis et al. found that while internists frequently encounter these issues, about 60% find it stressful to assess prognosis explicitly, about 45% wait to discuss prognosis until the patient brings it up, 90% avoid being specific, and almost 60% report inadequate training.9 Similar data are not available for pediatricians, but it is likely that general pediatricians would demonstrate even less familiarity and comfort with end-of-life prognostication, since relatively fewer children die, those who do die are generally cared for by specialists, and childhood death may be difficult for many pediatricians to face.
What little information we have about pediatric end-of-life decision making suggests that there is substantial variability in how pediatricians estimate prognosis in similar situations. Randolph et al., using a cross-sectional design and several clinical scenarios, surveyed pediatric oncologists and pediatric intensivists. They found substantial disparity, both between and within these groups, in prognostic estimates, in the weight given to quality-of-life values as a factor in decisions, and in recommended clinical actions.10 When given the same case scenarios, some physicians chose a full aggressive level of care while others chose only comfort measures. The respondent factors most affecting choices were level of training and experience and the self-rated importance of functional neurological status.
ELEMENTS OF PROGNOSTICATION
Prognostication involves the identification of relevant outcomes and of the clinical data elements associated with those outcomes. Statistical methods link the data elements to the outcomes. Choosing the right outcomes and data elements, as well as the correct statistical processes, is crucial to developing a reliable score suitable for clinical use. Clinicians need to understand the strengths and limitations of these processes to apply these tools.
Selection of the relevant outcome(s) is central to prognostication methods. Each medical condition has its own relevant outcomes. For example, hearing loss and pain are two serious outcomes for otitis media. For the purposes of this section, survival/death, serious disabilities, and costs will be considered as the most relevant to issues of death and dying. Survival and death are the main outcomes used in prognostication scores because they occur with sufficient frequency, the outcome states are well defined, and they are clearly important. Some have worried that the focus on death in prognostication may cause clinicians to preserve life at the cost of shifting outcomes from death to severe disability. After decades of worrying over this prospect, critics have produced no examples to substantiate a problem. However, other outcomes such as pain and suffering remain serious and important outcomes. Unfortunately, these outcomes cannot be quantified with sufficient accuracy and reproducibility, making objective prognostication very difficult.
Serious physical disability and/or cognitive impairment are also very important in prognostication and in many situations may matter more to patients and families than death. Unfortunately, the use of physical disability and/or cognitive impairment as outcomes in pediatrics has been severely hindered by the lack of summary measures describing disability states in a manner that can be used in formal prognostication methods. While extensive neuropsychological testing can define functional states, it is time consuming, expensive, and specialized, limiting its use in developing prognostication scores that may require thousands of patients. One important effort to measure and quantify disability has been the Pediatric Overall Performance Category (POPC) and Pediatric Cerebral Performance Category (PCPC) scores.11 These scores are general assessments analogous to the Glasgow Outcome Scale scores, with operational definitions adapted for children. The POPC and PCPC scales quantify overall morbidity and cognitive impairment, respectively. Each is a six-point graded scale of increasing disability from normal function (score = 1) to death (score = 6). They are relatively simple and quick to use in clinical settings. Qualitative assessments are made by care providers based on very general descriptions of the POPC and PCPC categories. Face and content validities for the POPC and PCPC scales have been evaluated. Differences between baseline and discharge POPC and PCPC scores have been associated with several other indicators of morbidity, including length of PICU stay, total hospital charges, discharge care needs, and summary measures of severity of illness.12
Unfortunately, even though the POPC and PCPC are statistically correlated with Stanford-Binet Intelligence Quotients, Bayley Mental Developmental Index scores, Bayley Psychomotor Developmental Index scores, and Vineland Adaptive Behavior Scales scores,13 the correlation of the POPC and PCPC with these tests is not sufficient for use in individual patients. Individuals with the same neuropsychological measure may fall into very different POPC and PCPC categories based on the caregiver’s qualitative assessment of functioning. Therefore, they are unsuitable for use in decision making or prognostication in individual patients. Their current use is primarily limited to large group studies where the number of patients makes sophisticated neuropsychological testing impractical due to time and cost.
Economic outcome indicators have been popular outcomes in the medical literature because they are easily measured and appeal to those concerned about health care costs, especially costs for care that is futile or of very low probable benefit. While this emphasis has immediate appeal, efforts to define futility and to link it to high medical costs or the need for rationing have not produced acceptance or consensus among clinicians and policymakers.14 For example, one pediatric ICU study using very broad definitions of medical futility found that relatively small amounts of resources were used for “futile” PICU care.15
Development and Validation of Prognostication
Health professionals, administrators, and policy makers interested in the credible and appropriate use of prognostication scores and methods need to understand, in at least a general fashion, several important issues in the development and validation of scores. Many excellent articles and texts have developed the intricacies of the different theoretical and statistical approaches.
First, data elements must be easily defined, easily collected, collected with common frequencies, and relevant to the outcome. This is necessary because data obtained in one institution must have the same definitions and be sampled with similar frequencies in another institution if results are to generalize across institutions. In particular, it is best not to include variables, such as therapies, that can be easily influenced by physician behaviors and could be used to “game” the prognostication method. For example, inclusion of FiO2 in severity scores enables the physician to alter the score by increasing the FiO2 beyond what might be needed. Issues of pain and suffering, which may be very relevant, are similarly difficult to quantify. Second, to minimize observation bias, data elements used to create a score should be selected a priori and collected independent of knowledge of the outcome (e.g., prospectively). Third, data should be tested for reliability, usually with one of two methods: intra-observer reliability (consistency of data re-measured by the same person or clinician) or (preferably) inter-observer reliability (consistency of data measured by different people). The kappa statistic is a measure of agreement scaled so that 0 represents chance agreement and 1 represents perfect agreement. Several large multi-institutional studies have collected important and useful data but have neglected to ensure reliable data collection, severely limiting the utility of the studies’ observations and conclusions, and certainly severely limiting any direct application of the results to individuals.
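As an illustration of inter-observer reliability, the kappa statistic can be computed directly from two observers' ratings. The following Python sketch is a standard implementation of Cohen's kappa; the two clinicians' ratings are entirely hypothetical and are not drawn from any study cited here.

```python
# Illustrative computation of Cohen's kappa for inter-observer reliability.
# The rating data below are hypothetical.

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: 0 = chance agreement, 1 = perfect agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = sorted(set(rater_a) | set(rater_b))
    # Observed proportion of agreement
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Two clinicians independently classify 10 patients as "low" or "high" risk.
obs1 = ["low", "low", "high", "high", "low", "low", "high", "low", "low", "high"]
obs2 = ["low", "low", "high", "low",  "low", "low", "high", "low", "high", "high"]
print(round(cohens_kappa(obs1, obs2), 2))  # 0.58: moderate agreement
```

Here the raters agree on 8 of 10 patients, but because much of that agreement would be expected by chance, kappa is only about 0.58.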
Individual variables that are candidates for a prognostic model are usually tested separately for statistical association with the outcome (univariate analysis). The set of statistically significant variables from the univariate analyses is then subjected to multivariate analysis to assess their relative predictive performance; this is the standard methodology for developing a multi-element index or score. Multivariate logistic regression is most often used when the outcome is dichotomous (e.g., survival/death). Polychotomous logistic regression enables prediction of more than two outcome states. For example, one pediatric ICU study used this technique to predict the discharge states of being dead, comatose, or not comatose.19 While appealing, this approach requires very large samples when more than two outcomes are predicted. Multivariate linear regression is most often utilized for continuous outcomes (e.g., length of stay), and multivariate linear or quadratic discriminant function analysis is most often used to predict categorical outcomes such as diagnosis.
Care must be taken when developing a score or risk prediction model using multivariate analyses to avoid “overfitting,” the creation of a model that is fitted to idiosyncrasies (noise) of the data rather than to their relevant features. Overfitting is most likely to occur when the number of
variables included in the score is relatively large compared to the number of study subjects or events. A common rule suggests that there be at least 10 outcome events (e.g., deaths) per independent variable.20 Single institution studies with small sample sizes often develop very impressive prediction models that fail to perform well when applied to other institutions or new data sets, in large part because the models are fit to the “noise” of the single institution.
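The events-per-variable rule of thumb is simple arithmetic; a minimal sketch, using a hypothetical cohort size:

```python
def max_predictors(n_events, events_per_variable=10):
    """Apply the rule of thumb of at least 10 outcome events
    (e.g., deaths) per independent variable in the model."""
    return n_events // events_per_variable

# A hypothetical single-center cohort with 36 deaths supports a model
# with at most 3 candidate predictor variables; adding more invites
# overfitting to the "noise" of that sample.
print(max_predictors(36))  # 3
```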
All scoring systems should be validated prior to use. Validation of the score in the population from which the score was derived (internal validation) includes data-splitting, cross-validation, or bootstrapping. In data-splitting, a random portion of the sample is used for the model development (training set) and the remainder is used for the model validation (validation set). Cross-validation is repeated data-splitting, generating many training and validation sets. Bootstrapping involves testing the performance of the model on a large number of sub-samples randomly drawn from the original sample. Validation in an external sample is the most stringent test and should be performed before any prognostic tool is used to guide decisions about individual patients.
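Data-splitting and bootstrapping can be sketched as follows (cross-validation simply repeats the split); the patient identifiers, split fraction, and sample counts are hypothetical choices for illustration:

```python
import random

def data_split(indices, train_fraction=0.7, seed=0):
    """Randomly split patient indices into a training set (used to fit
    the model) and a validation set (used to test its performance)."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def bootstrap_samples(indices, n_samples, seed=0):
    """Draw sub-samples of the same size, with replacement, from the
    original sample; model performance is re-tested on each."""
    rng = random.Random(seed)
    return [[rng.choice(indices) for _ in indices] for _ in range(n_samples)]

patients = list(range(100))          # hypothetical patient identifiers
train, validation = data_split(patients)
boots = bootstrap_samples(patients, n_samples=200)
print(len(train), len(validation), len(boots))  # 70 30 200
```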
A variety of other less traditional methods have been or could be used for score development. For example, neural networks are designed to mimic the performance of the human brain. While many involve aspects of artificial intelligence, none has performed better than standard statistical methods when applied to prognostication.21
Two essential and objective aspects of the validation process are testing for discrimination and calibration.22 Discrimination, the ability of a model to distinguish between outcome groups, is most often assessed by the area under the receiver operating characteristic (ROC) curve. The area under the curve (AUC) is an expression of the overall discrimination across the range of risk levels and is a good summary measure of predictive ability. The ROC and AUC are difficult concepts to grasp; the AUC is most easily explained with an example. Imagine that all the patients who survived were in one group and all who died were in another group. If a patient is randomly selected from each outcome group and their prognostic scores compared, an AUC of 0.90 would indicate that 90% of the time, the prognostic score of the patient who died would be higher than that of the patient who lived. A prognostic score with perfect discrimination would always rank the survivor’s score lower than the score of the patient who died; thus, perfect discrimination would be indicated by an AUC of 1.0. As a prognostication method’s AUC approaches 1.0, it becomes more and more relevant to use in individual patients. Unfortunately, there are no generally accepted discrimination criteria for test use in individual patients, especially if the use involves decisions about limitations and withdrawals of care. Prognostication methods such as APACHE and PRISM, with AUCs between 0.88 and 0.96, have not been accepted into general clinical use for decision making in individual patients, indicating the need for “perfection” if a score is to be generally accepted.
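The pairwise interpretation of the AUC described above can be demonstrated directly: count, over all (died, survived) pairs, how often the patient who died has the higher score. The prognostic scores below are hypothetical.

```python
def auc_by_pairs(scores_died, scores_survived):
    """AUC as the probability that a randomly chosen patient who died
    has a higher prognostic score than a randomly chosen survivor
    (ties count as half)."""
    wins = 0.0
    for d in scores_died:
        for s in scores_survived:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))

# Hypothetical prognostic scores; higher = sicker.
died = [9, 8, 7]
survived = [2, 3, 5, 8]
print(auc_by_pairs(died, survived))  # 0.875
```

With these scores, the patient who died "wins" 10.5 of the 12 possible pairings, so the AUC is 0.875; an AUC of 1.0 would mean every pairing is won.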
ROC analysis does not evaluate how the predictor performs across the whole range of risk levels, or how well the predictor is calibrated. Calibration assesses the agreement between the predicted outcomes and actual outcomes over the entire range of risk. That is, a valid score or method should be able to reliably conclude that someone with a higher score is relatively sicker than someone with a lower score. To test the calibration of a scoring system, patients are divided into risk groups, usually deciles from 0% to 10% risk, 11% to 20% risk, . . . , 91% to 100% risk. Equivalently, a priori designation of patient risk categories can be used. For example, risk categories of low (e.g., 0% to 10%), intermediate (11% to 30%), and high (>30%) can be defined a priori and used for the calibration categories. The most accepted method for measuring calibration is the goodness-of-fit statistic proposed by Lemeshow and Hosmer.23
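A minimal sketch of the goodness-of-fit computation over a priori risk categories follows. The predicted risks and outcomes are hypothetical, and this simplified statistic only illustrates the group-wise comparison of observed and expected deaths; real applications use the Hosmer-Lemeshow formulation with a chi-square test.

```python
def hosmer_lemeshow(predicted, observed, cutoffs=(0.10, 0.30)):
    """Goodness-of-fit statistic over a priori risk categories, here
    low (<= 10%), intermediate (11%-30%), and high (> 30%).
    For each group g: (O_g - E_g)^2 / (n_g * p_bar_g * (1 - p_bar_g)),
    where O_g = observed deaths, E_g = sum of predicted risks, and
    p_bar_g = mean predicted risk (assumed strictly between 0 and 1)."""
    edges = [-1.0] + list(cutoffs) + [1.0]
    statistic = 0.0
    for lo, hi in zip(edges, edges[1:]):
        group = [(p, o) for p, o in zip(predicted, observed) if lo < p <= hi]
        if not group:
            continue
        n = len(group)
        expected = sum(p for p, _ in group)      # expected deaths
        deaths = sum(o for _, o in group)        # observed deaths
        p_bar = expected / n
        statistic += (deaths - expected) ** 2 / (n * p_bar * (1 - p_bar))
    return statistic

# Hypothetical predicted mortality risks and observed outcomes (1 = died).
predicted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.90]
observed = [0, 0, 1, 0, 1, 1]
print(round(hosmer_lemeshow(predicted, observed), 2))
```

A well-calibrated model yields a small statistic (observed deaths close to expected deaths in every risk group); large values indicate poor calibration.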
Scoring systems or prediction rules should also pass a standard of clinical reasonableness or sensibility.24 Assessments of sensibility will depend upon the context in which the score is used; for example, a triage score intended for use by emergency medical personnel must be simple, easy to calculate, and free of unstable or irreproducible elements. A mortality risk score such as PRISM is composed of vital signs and laboratory values generally understood to be related to dying. Thus, in the context of critical care, where therapies focus on the life support required to treat physiologic dysfunction, a score that uses such variables will more likely be seen as clinically reasonable and sensible to the user.
A mortality prediction model developed to guide decisions about individual patients needs to meet the most rigorous standards of thorough investigation, which have been noted. Another very important issue in clinical use of prediction methods is continuous updating. As medicine advances, the relationships among clinical variables, therapies, and outcomes will change, requiring updating of the prognostication methods.
One notable potential problem with current mortality prediction models is that they do not allow for the health professional’s input into the prediction process. It seems plausible that a health care provider’s assessment could improve the model’s performance. In particular, health professionals may be better than statistical methods at accounting for uncommon or rare patient characteristics, unique conditions, or unique combinations of events not contained in sufficient numbers in the data set used to develop the prediction method. Incorporating the health professional’s estimates into a prediction model could also increase the acceptability of prognostic scores in clinical situations.
Bayesian statistical algorithms have been proposed as a means of combining subjective and objective probabilities, possibly improving both prognostic performance and clinician acceptance.25 Briefly, Bayesian statistics adjust a predetermined probability estimate (the prior probability) with new data, thereby creating an updated probability (the posterior probability). For example, the analysis might combine a clinician’s estimate of a patient’s likelihood of survival with the patient’s score from a prognostic model. These methods have been applied to current prediction methods but have not had substantial impact on either performance or user acceptance. Apparently, the addition of subjective elements to prediction instruments has not alleviated clinicians’ inherent mistrust of applying scoring systems to individual patients.
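The Bayesian updating step can be illustrated with odds. This is a minimal sketch, assuming the prognostic score's evidence can be expressed as a likelihood ratio; the prior probability and likelihood ratio below are hypothetical values.

```python
def bayes_update(prior_prob, likelihood_ratio):
    """Combine a clinician's subjective prior probability with the
    evidence from a prognostic score, expressed as a likelihood ratio,
    to produce an updated (posterior) probability."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Clinician's subjective estimate of mortality risk: 30%.
# The prognostic score result carries a hypothetical likelihood ratio of 4.
print(round(bayes_update(0.30, 4.0), 2))  # 0.63
```

The clinician's 30% prior, combined with score evidence four times more likely in patients who die, yields an updated mortality estimate of about 63%; a likelihood ratio of 1 would leave the prior unchanged.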
Computer-based clinical decision support systems (CDSSs) have been proposed to help clinicians in a variety of settings. For example, such systems have been used to calculate cure rates for oncology patients and heparin doses for patients needing anticoagulation. Unfortunately, their application to clinical medicine has been disappointing. A recent systematic review found that CDSSs can enhance clinical performance in activities such as drug dosing and preventive care, but positive impacts on clinical outcomes have not been demonstrated.26
However, the potential of CDSSs is tremendous, especially if appropriately large databases were developed and maintained. With such databases, analyses could be tailored to the individual patient, focusing on the outcomes relevant to that patient and constructed to maximize accuracy and reliability for a specific person.
EXAMPLES OF PROGNOSTICATION
Scoring systems are a means of quantifying clinical states that are difficult to summarize by other subjective or objective means. These systems are especially valuable in the ICU, where subjective impressions of clinical states, severity of illness, and risk of mortality are highly variable.27-34 In pediatrics, there have been major advances from the Apgar score,35 devised and still used to quantify the condition of newborns, to current-day scoring systems that provide mortality risk scores for all ages, as well as scores useful for defining severity of illness (e.g., severity of respiratory distress) in diseases such as croup or asthma.
NICU Mortality Scores
For many years, the Apgar score, birth weight, and gestational age served as standard severity of illness measures for newborns, with fair success. Unfortunately, the Apgar score may be influenced by region and hospital and has limited use in predicting a newborn’s future health. Birth weight and gestational age have also lost favor because the outcomes for infants with similar birth weights and gestational ages may not be consistent across institutions. Scoring systems attempt to control for biologic heterogeneity, but they cannot accommodate unreliable variable measurement. Even more important, the rapid advances in neonatal care have made prognostication methods rapidly outdated as the relationships among biologic factors and outcomes change with each new advance.
There are currently two established scoring systems for assessing severity of illness and mortality risk in neonates: the Clinical Risk Index for Babies (CRIB), and the Score for Neonatal Acute Physiology (SNAP). CRIB was developed in the United Kingdom for infants with birth weights less than 1500 grams.36 It is composed of six commonly measured variables collected in the first 12 hours after birth, and the outcome is survival or death. CRIB, however, was developed using data from the pre-surfactant era and in a time when antenatal corticosteroids were not widely used. Both interventions have improved infant survival, altering the relationship of the variables and the CRIB score’s outcome. Therefore, the CRIB mortality prediction model should not be used to provide prognostic probabilities for individual patients.
SNAP II is a second-generation, physiology-based score for neonatal severity of illness developed from large samples from the United States and Canada.37 SNAP II has also been modified for use as a mortality prediction model (SNAPPE II) by adding variables including birth weight, small-for-gestational-age status, and low Apgar scores.38 Unfortunately, while SNAP scores have very good discrimination, they have been difficult to calibrate, severely limiting their utility, because excellent calibration is a primary validity component that must be present before widespread clinical use.
A recent analysis compared multiple neonatal severity scores in a low birth weight infant cohort from 1994-1997.39 None of the neonatal severity scores performed well, implying either deficiencies in their development, or advances in neonatology that have made them out of date. Most important, however, the analyses demonstrated that birth weight was still a very powerful outcome predictor if its predictive potential was accounted for with modern statistical techniques.
Pediatric ICU Mortality Prognostic Scores
Because fewer infants and children die than neonates, it has been more challenging to develop prognostic scores for older children. The two most commonly used severity of illness scores in pediatrics, the Pediatric Risk of Mortality (PRISM) and the Pediatric Index of Mortality (PIM), apply to
critically ill and injured children from full-term newborns through adolescents. PRISM is now a third-generation score (PRISM III) developed from over 11,000 patients in 32 PICUs.40,41 It has recently been re-calibrated on over 20,000 patients.42 Mortality predictions can be made using either the first 12 hours (PRISM III-12) or 24 hours (PRISM III-24) of physiologic variables, together with descriptive and diagnostic data such as CPR status, operative status, and the presence of important diagnoses such as cancer. PRISM III has been used in 4 national studies and is routinely used in over 50 PICUs nationally and internationally to evaluate quality of care as well as to case-mix adjust administrative data.
The Pediatric Index of Mortality (PIM) was developed on 5,695 patients from only 7 Australian PICUs and one British PICU.43 PIM was developed in response to theoretical concerns about lead-time bias, the concept that what precedes the time period of the severity score may create a bias not sufficiently accounted for in the severity models. Thus, models such as PRISM III, which use data collected over the first 12 or 24 hours after admission, might be affected by pre-PICU management that influences the amount of physiologic dysfunction. While this concern has no supporting data, it theoretically applies to all scoring systems. The PIM score, for example, uses data collected in the emergency department and the first hours of PICU care; however, there are at least 3-fold differences in the time patients spend in emergency departments, a potential lead-time bias because the time for stabilization may differ among health care facilities. Another potential lead-time bias for the PIM score involves the competency of the transport team. A recent study of PIM in the United Kingdom claimed adequate calibration, although statistical analysis was not reported.44 Post hoc statistical analysis demonstrated poor calibration, contrary to the authors’ claim.
Other Pediatric Scores
In addition to ICU mortality scores, there are a plethora of scoring systems for specific diagnoses. The Glasgow Coma Scale (GCS) score45 is an example of a score derived from expert opinion as opposed to statistical algorithms. Scores such as the GCS have maintained popularity, despite their only fair statistical performance (construct validity), because their simplicity appeals to users. One respiratory score may be inching its way into clinical use: a clinical croup severity score, after being validated for triage decision making and for measurement of clinical severity, was successfully used to evaluate and implement a critical pathway and to compare outcomes for different croup therapies.46 Other scoring systems for pediatric asthma, pediatric respiratory failure, and meningococcemia have been described but have not been consistently valid in external samples.47-49 The Pediatric Trauma Score is also widely used, but primarily for pre-hospital care and quality assessment.50
CLINICIANS AS PROGNOSTICATORS
In general, most clinical predictions by physicians are based on clinical intuition or subjective assessments of complex situations. Unfortunately, clinical intuition is fallible. Several studies have cast doubt on the ability of physicians to judge accurately the probability of a variety of clinically important outcomes based on subjective assessments.51,52 When clinicians make predictions about complex situations on a subjective or intuitive basis, they are prone to personal biases and other problems associated with heuristics or “rules of thumb.” For example, a “value” bias reflects influence by a patient’s values (e.g., social or political similarities), and a “reverse ego bias” reflects a physician’s belief that his or her patients will do better than average.53,54 Physicians often make intuitive judgments using the “availability heuristic” or the “representativeness heuristic.” The “availability heuristic” judges the probability of an outcome according to how easily one remembers patients who had this outcome, a judgment process biased by both the physician’s experience and the physician’s recall. The “representativeness heuristic” refers to judging the probability of an outcome according to a patient’s resemblance to a previous patient; it neither accounts for a suitable number of variables nor incorporates the spectrum of possible clinical outcomes.
Physicians’ accuracy in estimating mortality risk for patients admitted to ICUs has been variable.55-58 Generally, clinically experienced physicians perform better than less experienced physicians, but there are discrepancies even between physicians of equal clinical experience.59 Physician prediction performance may also depend upon the patient’s disease or severity of illness.60 Importantly, there are no data about how well physicians can predict the probability of survival for seriously ill patients at the time that triage decisions must be made, for example, when a patient presents to the emergency department. Certainly, the percentage of patients in pediatric and neonatal ICUs who receive only monitoring services is high, implying a lack of precision in estimating severity of illness and therapeutic needs.61,62
Only single studies of the prediction performance of nurses and physicians are available for the neonatal and pediatric ICU settings. Stevens et al. evaluated the performance of physicians and nurses in 544 patients (21 deaths) in two neonatal ICUs.63 Clinical estimates of severity of illness were made with a 5-point Likert scale ranging from low risk to “virtually certain death, now or delayed.” In general, mortality risk increased with increasing clinical estimates of severity of illness. While there was no statistical analysis of calibration, 40% of the physician predictions of “virtually certain death, now or delayed” and 33% of the RN predictions for the same outcome were in error. This tendency of health professionals to overestimate severity of illness was confirmed by comparing a sub-sample of clinical mortality risk estimates to those obtained from the SNAP score; compared to the SNAP score, the clinicians overestimated mortality risk by 1.5-fold. Discrimination performance of both neonatal physicians and nurses was very good: the area under the ROC curve was 0.85 for physicians and 0.93 for nurses, comparable to 0.94 for the SNAP score.
Only one study has evaluated the prediction performance of physicians at different experience levels and nurses in a PICU and compared the performance of these health care professionals with the same statistical methods used to evaluate prognostication scores.64 In this study of 642 patients of whom 36 patients died, predictions were made after the first 10 hours and 24 hours of care by bedside nurses, residents, critical care fellows, and critical care attendings. Because predictions were made after 10 hours, “simple” patients with very short stays either because they were very healthy or obviously dying were excluded. Each health care professional provided estimated mortality risks from 0% to 100%. Agreement among care provider groups was measured with the kappa statistic. Prediction performance for each provider group was evaluated using similar performance criteria as described for quantitative methods. Discrimination was analyzed by the area under the receiver operating characteristic (ROC) curve. Calibration over the entire range of mortality risks was analyzed by the Hosmer-Lemeshow goodness of fit.
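The two performance statistics named above can be sketched concretely. The following is an illustrative Python implementation with invented predictions, not data or code from the study; `roc_auc` and `hosmer_lemeshow` are hypothetical helper names.

```python
import math

def roc_auc(risks, died):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the probability that a randomly chosen death was assigned a higher
    predicted risk than a randomly chosen survivor (ties count 1/2)."""
    deaths = [r for r, d in zip(risks, died) if d]
    survivors = [r for r, d in zip(risks, died) if not d]
    wins = sum(1.0 if rd > rs else 0.5 if rd == rs else 0.0
               for rd in deaths for rs in survivors)
    return wins / (len(deaths) * len(survivors))

def hosmer_lemeshow(risks, died, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic: sort patients into risk
    groups and compare observed with expected deaths in each group.
    (Compare the result with a chi-square distribution, groups - 2 df.)"""
    paired = sorted(zip(risks, died))
    size = math.ceil(len(paired) / groups)
    stat = 0.0
    for i in range(0, len(paired), size):
        chunk = paired[i:i + size]
        n = len(chunk)
        observed = sum(d for _, d in chunk)
        expected = sum(r for r, _ in chunk)
        pbar = expected / n
        if 0 < pbar < 1:
            stat += (observed - expected) ** 2 / (n * pbar * (1 - pbar))
    return stat

# Invented predicted mortality risks and outcomes (1 = died), for illustration
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
died  = [0,    0,    0,    1,    0,    0,    1,    1,    1,    1]
print(round(roc_auc(risks, died), 2))  # → 0.92
```

An area of 0.5 corresponds to chance discrimination and 1.0 to perfect discrimination; a large Hosmer-Lemeshow statistic signals poor calibration.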
The results for the predictions after the first 10 hours are shown in Table B.1 and illustrate a surprisingly good comparison of clinical outcome prediction to quantitative outcome prediction. Discrimination was excellent for all groups as judged by the area under the ROC curve; it was best for attendings and worst for critical care fellows. The discrimination of attendings, residents, and nurses approximated the discrimination of PRISM III and was superior to PIM. These results differ sharply from the neonatal ICU study of Stevens et al. and probably indicate that there is wide variability in the prediction performance of health care professionals among different institutions. It is also likely that in the study PICU, the same PICU where the PRISM III score was developed, more education and feedback in outcome prediction has taken place than in most ICUs, and this is reflected in the excellent prediction performance of the health care providers.
These data suggest that there is wide variability in the prognostication performance of physicians but that excellent performance is possible. Unfortunately, there are no training programs designed specifically to improve physicians’ prognostication performance, and the personal habits of physicians do not generally emphasize prognostication “learning.” For example, most physicians do not routinely record their prognostic impressions and then review them after the outcome is known; they therefore miss the opportunity to learn from their successes and their mistakes.

TABLE B.1 Discrimination and Agreement of Health Care Providers
Calibration of predictions was good for attendings and fellows but differed significantly from observed outcomes for residents and nurses. Agreement as measured by the kappa statistic was statistically “good” for all groups (Table B.1); however, the measured kappa statistics indicated a level of disagreement that, if present in a high-risk group, would be sufficiently large to cast doubt on the reliability of the predictions to guide decisions about withdrawals and limitations of life-sustaining medical interventions.
Table B.2 shows the sensitivity (correctly predicted deaths/total deaths), specificity (correctly predicted survivors/total survivors), false positives (incorrectly predicted deaths/total deaths), false negatives (incorrectly predicted survivors/total survivors), positive predictive value (correctly predicted deaths/total predicted deaths) and negative predictive value (correctly predicted survivors/total predicted survivors). In clinical scenarios of withdrawals and limitations of life-sustaining medical interventions, the false positives (incorrectly predicted deaths) are the most important error.
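These quantities can be made concrete with a small sketch. The counts below are invented for illustration; the standard two-by-two-table denominators are used, and the “false positive error” is computed here as incorrect death predictions among all predicted deaths (the complement of PPV), which is an assumption, since denominator conventions for error rates vary.

```python
def prediction_metrics(pred_death, actual_death):
    """Two-by-two-table statistics from paired predicted/actual outcomes
    (True = death)."""
    pairs = list(zip(pred_death, actual_death))
    tp = sum(1 for p, a in pairs if p and a)          # correctly predicted deaths
    fp = sum(1 for p, a in pairs if p and not a)      # survivors predicted to die
    fn = sum(1 for p, a in pairs if not p and a)      # deaths predicted to survive
    tn = sum(1 for p, a in pairs if not p and not a)  # correctly predicted survivors
    return {
        "sensitivity": tp / (tp + fn),  # correctly predicted deaths / total deaths
        "specificity": tn / (tn + fp),  # correctly predicted survivors / total survivors
        "ppv": tp / (tp + fp),          # correctly predicted deaths / total predicted deaths
        "npv": tn / (tn + fn),          # correctly predicted survivors / total predicted survivors
        # Assumed convention: "false positive error" = incorrect death
        # predictions among all death predictions (1 - PPV)
        "fp_error": fp / (tp + fp),
    }

# Invented example: 4 predicted deaths, of which 1 patient actually survived
pred   = [True, True, True, True, False, False, False, False]
actual = [True, True, True, False, False, False, False, True]
m = prediction_metrics(pred, actual)
print(m["ppv"], m["fp_error"])  # → 0.75 0.25
```

Under this convention, a 25% false positive error means that one in four patients predicted to die actually survives, which is why this error matters most for withdrawal decisions.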
False positive error rates were very high. Even attendings, who were the best performing group, had a false positive error rate of over 25% compared to actual outcomes. Residents and nurses had false positive error rates over 45%. The best performance of health care professionals occurred at the highest and lowest mortality risk ranges while the worst prediction performance was in the intermediate ranges. That is, very healthy patients were very reliably predicted and patients easily perceived to be terminal were also reliably predicted. Weighting predictions by the certainty of the prediction (data not shown) improved the prediction performance slightly but not substantially enough to change the performance assessment.
Compared to the first 10 hours, predictions after the first 24 hours were not improved. This is consistent with the findings of others who have also questioned the ability of clinicians to integrate dynamic, time-related data.65 That is, physicians do not respond quickly to changes in clinical events and do not incorporate them into revised severity of illness assessments.

TABLE B.2 Sensitivity, Specificity, False Positives, False Negatives, Positive Predictive Value (PPV) and Negative Predictive Value (NPV) of PICU Healthcare Providers
As noted earlier, the available studies indicate that physicians find giving prognostic information stressful and often postpone prognostication until questioned by a patient or family member. One survey of internists found that over 50% reported receiving inadequate training in prognostication, while fewer than 10% reported inadequate training in diagnosis and therapy.66 The stressfulness of making prognostic declarations was related to many factors. Physicians who reported inadequate training in prognosis were 60% more likely to find giving prognostic information “stressful” than physicians who rated their prognostication training as adequate. Prognostic certainty did not necessarily alleviate the stressfulness of the process; the sense that colleagues would judge them harshly if they made an incorrect prediction was the most important factor in this group. Stress with prognostication was not related to personal characteristics such as sex or years in practice.
Thus, there are important limitations in clinicians’ abilities to make accurate and reliable mortality predictions based on either clinical judgment or statistical algorithms. This is especially true under triage situations early in the course of an illness when the clinical course and outcome are not sufficiently defined. Under these circumstances, clinicians, administrators, and policymakers should be very cautious about using mortality predictions to determine whether interventions such as CPR or mechanical ventilation are used or withheld.67
Can providers be educated to provide better predictions? Can individual physicians be identified who are better able to evaluate multi-dimensional, data intensive medical situations? This would be analogous to the way pilots are selected in part on their ability to make evaluations in similarly complex situations. No studies have investigated this potential.
At present, the odds are stacked against physicians performing as excellent prognosticators in situations of life and death. Medical schools and textbooks do not provide the appropriate training and information, for example, a grounding in relevant concepts and statistical tools. Physicians are not taught the skills needed to routinely record and evaluate their own performance; they do not record their prognostications and review them later, thereby forgoing immediate and valuable feedback that could improve their abilities. They are sometimes rewarded for using heuristics when they are correct, but a correct guess is a matter of statistical chance, not necessarily a validation of the heuristic. Remarkably, there are few if any reports of educational efforts to improve physicians’ prognostication skills, even though prognostication is central to physicians’ ability to choose appropriate therapies for physiologic dysfunction as well as to counsel patients and families.
Physicians often leave families with firm opinions without the families knowing how much faith to place in these predictions. Families may give more credibility to the physician’s prediction than is warranted, and if the prediction proves wrong, their trust in their physician may be reduced. More importantly, when an active decision concerning withdrawals and limitations of care is needed, their trust (or lack of trust) in their physician’s prognostic abilities may be misplaced.
UTILITY OF PROGNOSTICATION SCORES FOR INDIVIDUAL PATIENTS
There has been an increased interest in the use of outcome prediction models in providing decision support for individual patients.68-75 In general, these applications have focused on mortality prediction and on resource use for individual patients too healthy to benefit76 or too sick to benefit77-80 from intensive care services. Early identification of patients in the ICU for whom further curative, life-prolonging, or life-sustaining therapies are futile or very unlikely to be beneficial could help with difficult decisions, obviate undue patient suffering, and help to direct scarce resources to more cost-effective uses.81,82
Most individuals and societies have cautiously approached the issue of quantitative prognostication for individual patients with such scores as
PRISM. Most recognize that individual clinicians have the responsibility and accountability for decision making. Prognostication scores can only add information, not dictate decisions, especially decisions regarding withdrawal or limitation of life-sustaining medical interventions. The mystique of a mathematical model developed from tens of thousands of patients cannot substitute for the intricacies of knowing an individual patient. And perhaps most important, the most relevant data, such as cognitive impairment or physical disability, may not be accurately predicted by such a model. It may be seductive to use the information that seems objective and is readily available, even if it is not quite the relevant information. For example, the Society of Critical Care Medicine’s Ethics Committee has clearly warned that prognostication scores should be used with caution in individuals, emphasizing that probability of death is only one of the factors pertinent to decision making.83
In actuality, we know very little about how decisions are made in ICUs and how objective risk assessments using prognostication scores would serve decision making. Emanuel and Emanuel categorized four models of decision making for adult patients that emphasize the roles and relationships among physician and patient dependency, expertise, and power: the parental model, informative model, interpretive model, and deliberative model.84 We know little about the use and effectiveness of prognostication scores and methods in adult ICUs,85 and we know even less about their utility in the context of decision making models such as those delineated by Emanuel and Emanuel. Our understanding of decision making models in pediatric ICUs is even more limited than in adult ICUs. The value of prognostication scores may range from very useful to destructive, and their use will require a much better understanding of the kinds of decisions and the decision making process.
Studies of Utility in Individuals
Scoring systems may help identify “potentially ineffective care,” or isolate patients admitted to the ICU with a negligible chance of survival in whom further care would not be beneficial.86,87 The utility of objective prognostication scores will depend in large part on the size of the database, the number of patients in relevant sub-samples, the confidence level or clinical certainty required by physicians for decision making, and the predicted clinical outcome range given the required certainty level, as well as the intended application.
Among pediatric measures, only the PRISM score has received in-depth evaluation for use in individuals. One such evaluation demonstrated the problems in using outcome prediction scores to guide the care of individual children.88 First, the numbers of patients isolated by modern prognostication scores such as PRISM III for whom life-sustaining medical interventions are “futile” are very small. In a sample of 10,608 patients from 32 pediatric ICUs, of whom 571 died, the observed survivors and sample sizes for the three highest PRISM III strata (>28, >35, and >42) were 10/158, 3/57, and 0/21, respectively. Thus, in a multi-institutional sample of over 10,000 patients, only one small group of 21 patients (0.19% of the sample) could be detected with zero survivors. Even though the discrimination power of PRISM III-24 is equal to or better than that of any other prognostication score, it did not isolate a large proportion of deaths in a patient group where all patients died (no false positives).
Second, there are significant unresolved issues of clinical certainty and outcome ranges when predictions for individual patients are based on statistical data. The statistical concepts of confidence intervals and confidence levels must be thoroughly understood in their relationship to the clinical concepts of survival ranges and clinical certainty if they are to help guide decisions for individuals. In a statistical sense, if a model produces a survival prediction with an associated confidence interval calculated at a 95% confidence level, the accepted interpretation is that upon repeated sampling with the same sample size, 95% of the samples will contain the true survival mean within the samples’ confidence limits. For most clinicians, this interpretation does not make practical sense. A typical clinician’s interpretation of a 95% confidence interval is that there is a 95% chance that the true survival rate is within the stated confidence interval. Consequently, the confidence interval is interpreted as the estimated clinical outcome range, the range of actual outcomes estimated by past experience that carries a clinical certainty of 95%. Thus, both the clinical survival range and the clinical certainty can be evaluated in a statistical sense by the confidence interval and the confidence level. For example, the statistical conclusion that the mortality rate and 95% confidence interval is 80% ± 5% has the clinical interpretation of a clinical outcome range of 75% to 85% with 95% clinical certainty.
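The repeated-sampling interpretation can be demonstrated with a short simulation: draw many samples from a known mortality rate, build a 95% interval from each, and count how often the interval contains the truth. The 80% rate, the sample size, and the Wald-style interval below are arbitrary illustrative choices, not values from the chapter.

```python
import math
import random

def wald_ci(deaths, n, z=1.96):
    """Approximate 95% confidence interval for an observed mortality rate."""
    p = deaths / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

random.seed(1)
TRUE_RATE, N, REPS = 0.80, 200, 2000
covered = 0
for _ in range(REPS):
    deaths = sum(random.random() < TRUE_RATE for _ in range(N))  # one sampled ICU cohort
    lo, hi = wald_ci(deaths, N)
    covered += lo <= TRUE_RATE <= hi
print(covered / REPS)  # close to 0.95: about 95% of intervals contain the true rate
```

Note that the statement is about the long-run behavior of the interval-building procedure, not about any single interval, which is exactly the distinction drawn in the text.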
For individual patients, Marcin et al. asked the question, “What are the survival rates and associated survival ranges (confidence intervals) at different clinical certainty levels (confidence levels)?” for patients in very poor outcome groups noted above.89 For any given sample size, the analytic preference for increased clinical certainty (higher confidence levels) will be traded off against wider clinical outcome ranges (wider confidence intervals). Conversely, a narrower outcome range (narrower confidence interval) potentially needed for decision making can only be obtained by decreasing the clinical certainty (decreasing the confidence level) of the prediction.
This issue was evaluated by calculating the confidence intervals (outcome ranges) at confidence levels (clinical certainty) of 70%, 80%, 90%, 95%, and 99%. Table B.3 illustrates the trade-off between clinical certainty (confidence level) and survival ranges (confidence intervals). For example, although there were no survivors with a PRISM score >42, predicting death with a clinical certainty of 99% (99% confidence interval) would result in a wide survival range of 0% to 19.7%, a very wide potential range for a sample with a measured survival of 0% and one with an upper limit of survival that would make clinicians very uncomfortable in withdrawing care. However, if the physician is willing to accept a clinical certainty (confidence level) of only 70%, then the survival probability range is much narrower and lower at 0% to 5.6%. It may seem paradoxical that the table shows a narrower outcome range at 99% certainty for PRISM >28 (maximum survival rate of 12.3% versus 19.7%); this occurs because the sample with PRISM III >28 (n = 158) is larger than the sample with PRISM III >42 (n = 21).

TABLE B.3 Maximum Survival Rates at Different Certainty Levels (Maximum survival rate is equal to the upper bound of the exact one-tailed confidence interval based on a binomial probability distribution.) The observed survival rates were 10/158 (6.3%), 3/57 (5.3%), and 0/21 (0%) for PRISM III-24 >28, >35, and >42, respectively.
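These maximum survival rates can be reproduced directly. For zero observed survivors, the exact one-tailed upper bound has a closed form, p = 1 − α^(1/n); for nonzero counts it can be found by bisection on the binomial distribution. This is a sketch of the calculation, not the authors’ code:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound(survivors, n, certainty):
    """Exact one-tailed (Clopper-Pearson) upper confidence bound on the
    survival rate: the largest rate still consistent with observing this
    few survivors at the given certainty (confidence) level."""
    alpha = 1 - certainty
    if survivors == 0:
        return 1 - alpha ** (1 / n)   # closed form for zero events
    lo, hi = survivors / n, 1.0
    for _ in range(60):               # bisection on the binomial CDF
        mid = (lo + hi) / 2
        if binom_cdf(survivors, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

# PRISM III-24 > 42 group: 0 survivors among 21 patients (Table B.3)
print(round(100 * upper_bound(0, 21, 0.99), 1))   # → 19.7 (% maximum survival)
print(round(100 * upper_bound(0, 21, 0.70), 1))   # → 5.6
print(round(100 * upper_bound(10, 158, 0.99), 1))  # PRISM III-24 > 28 group
```

The two printed values for the 0/21 group match the 19.7% and 5.6% bounds quoted in the text, illustrating the trade: demanding more certainty widens the survival range.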
Clinical futility, one of the situations in which prognostication scores have been envisioned for use, often requires finding sample or prediction groups with zero survivors. Yet this example illustrates the difficulties in obtaining and evaluating such samples or prediction groups. In the examples above, most parents and physicians would continue medical care despite the observation that no one had ever survived, because there remains a substantial statistical probability of survival. That is, in the example given above, there is almost a 1 in 5 chance that the true outcome will be survival if we require a 99% certainty of making that judgment.
Of course, there is a limit to the application of statistical logic to clinical decision making. In statistical analyses, the trade-off between confidence interval width and confidence level is clear. The analogy to the outcome ranges and clinical certainty, while close, is not always logical. For
example, imagine trying to start cars that do not have batteries. If one turns the ignition in 21 such cars and none of the engines starts, what are the outcome range and certainty level for whether the next car without a battery will start? Statistically, this is identical to the outcome problem illustrated above: with 99% clinical certainty, the chance of the next car starting could be as high as 19.7%. This, of course, is not the reality. The certainty of the outcome of the next attempt to start a battery-less car is quite clear, and the outcome range is very narrow. This example, which exploits substantial real-world experience, helps define the limits of clinical decision making with quantitative methods.
Some of the confusion concerning the use of prognostication scores for individuals comes from the potential application of prognostication scores to health policy or societal decisions. For example, should insurance coverage be refused for a particular therapy for patients with medical characteristics associated with a very low likelihood of benefit? This question is very similar to the problem above, but the results are very different. From this perspective, the authors queried the PRISM database with the “same” question, but framed in health policy terms: “What is the maximum error rate of a health care policy which limits therapies for patients with PRISM III scores exceeding a very high threshold, and how do these maximums (based on confidence intervals or estimates of survival ranges) change as the confidence levels (as estimates of clinical certainty) change?”89 In this case, the use of a score to guide rationing decisions at the collective level may have more utility because the perspective changes from an error rate for an individual patient to an error rate for all survivors. Since the number of all survivors is much larger, the error rate is much smaller. This is seen in Table B.4.
At a 95% confidence level (clinical certainty), the maximum error rate was 29 per 100,000 PICU survivors, an error rate of only 0.029% (Table B.4). Changes in the certainty level in this example produce relatively minor changes in the outcome range. Even for a 99% certainty level, the error rate rises only to 46 per 100,000, an error rate of 0.049%. From a health policy perspective, limiting certain interventions for all patients with a PRISM III-24 score of >42 might be viewed as having an acceptable error rate.
From the health policy perspective, the risks of making an error by limiting resources to all patients with a very high PRISM III score are relatively similar to many of the risks of daily living. For example, the risk of dying in an accident in the next year is 36 per 100,000, the risk of being murdered in the next 2 years is 20 per 100,000, and the risk of dying in a work-related accident in the next year is 9 per 100,000.90
TABLE B.4 Maximum Error Rates at Different Certainty Levels if All Patients with a PRISM III-24 Score of >42 Were Discharged on Day 2 of PICU Care (Maximum error rate is equal to the upper bound of the exact one-tailed confidence interval based on a binomial probability distribution.)
Certainty level (%)    Maximum error rate
99                     46 per 100,000
95                     29 per 100,000
90                     23 per 100,000
80                     16 per 100,000
70                     12 per 100,000
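The figures in Table B.4 follow the same one-tailed binomial logic applied to the whole survivor pool. The sketch below assumes the denominator is the roughly 10,037 survivors implied by the 10,608-patient sample with 571 deaths; the chapter does not state the exact denominator, so the computed values match the published table only to within about one unit per 100,000.

```python
# Upper bound on the error rate when 0 errors (survivors wrongly denied care)
# were observed among N survivors, expressed per 100,000 survivors.
# N is an inference from the sample (10,608 patients, 571 deaths), not a
# value stated in the chapter.
N = 10_608 - 571  # = 10,037 survivors

def max_error_per_100k(certainty, n=N):
    """Zero-event exact one-tailed binomial upper bound, scaled per 100,000."""
    alpha = 1 - certainty
    return 100_000 * (1 - alpha ** (1 / n))

for level in (0.99, 0.95, 0.90, 0.80, 0.70):
    print(f"{level:.0%}: {max_error_per_100k(level):.0f} per 100,000")
```

Because the denominator is so large, moving from 70% to 99% certainty raises the bound only from about 12 to about 46 per 100,000, which is why the collective-level error rates stay small across certainty levels.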
CONCLUSIONS: CAN PROGNOSTICATION SCORES BE OF SOME ASSISTANCE IN END-OF-LIFE CARE?
There are many obstacles in applying prognostication scores to end-of-life decision making. This manuscript has tried to examine the major issues based on the available pediatric data and the adult data that can be extrapolated to pediatrics. Extrapolation of adult data to pediatrics is especially uncertain because the decision making models and the ethical constructs may be very different.
Health care professionals need help in prognostication. First, end-of-life prognostication is stressful, and physicians often avoid it. Most physicians believe they have had inadequate training, and physicians do not routinely self-assess their predictions in order to improve. Second and most important, pediatric health care professionals make errors. Even in academic neonatal and pediatric ICUs, where staff feel the most comfortable with end-of-life decisions and where experience is maximized because of the number of deaths, physicians and nurses make substantial errors in predicting death based on subjective judgment. The false positive rate for predicting death ranges from approximately 25% to 50% in the sickest patient groups, clearly a worrisome error rate. This is reflected in the observation that many children in whom care is limited or withdrawn actually live. In a study of withdrawals and limitations of care in 16 PICUs, 7 of 83 patients who had care withdrawn or limited due to “imminent death” survived to hospital discharge. Of these, 5 had do-not-resuscitate orders, 1 had an additional limitation, and 1 had active withdrawal of care.91
Can prognostication scores help clinicians and families? Some families
may be ready for better prognostication methods; almost 30% of families of children who died in 3 Boston PICUs after forgoing life support felt that there was insufficient information concerning their child’s chances for survival, and the chance of “getting better” was the second most important factor to families, with 78% rating it as very important.92 While these data suggest that quantitative prognostication could be useful, there has not been a single convincing study demonstrating that prognostication scores help in the end-of-life process. And experience suggests that physicians shy away from generating and using probabilistic data. For example, for adult patients, although objective methods of diagnosing myocardial infarction have long been demonstrated to be superior to clinical judgment alone, physicians have never gravitated toward using the worksheets and doing the computations required for widespread use of such predictive instruments.93
Second, prognostic scores derive their strength from the statistical methodology used to develop and validate them. They are, however, subject to the same limitations as all statistical processes. And, as evaluated above with the concepts of certainty and outcome ranges, there is not always a clear clinical counterpart to the statistical concepts. The acceptance of prognostication scores by health care providers will require a greater understanding of the statistical concepts and how to apply them to clinical situations.
Third, it is not clear that prognostication scores will fit comfortably into physician-patient relationships. The relationship of patient and physician varies greatly depending on the characteristics of individual physicians and individual patients or families.94 Patient values may range from fixed to changeable, from harmonious to conflicting. The physician’s sense of his or her obligation may range from information giver to interpreter to persuader to advocate for the patient or family or both. The patient’s sense of autonomy may range from very strong to child-like and may change over time, for example, as an illness progresses. Similarly, patients’ values regarding the use of life-sustaining care when illness is not far advanced may change as illness progresses. And the patient-physician relationship may shift rapidly as the medical situation changes. For example, in an emergency, the relationship may be paternalistic, with decision making rapid and without time for measured consideration of the therapeutic options; it may then evolve into one that is more deliberative or informative as more time passes and the decision makers are able to discuss medical issues and get to know the patient’s values.
Notwithstanding these cautions, physicians may find prognostication scores useful in some situations. Perhaps they can serve some physicians and patients by providing a measure of objectivity in decision making. Confirming an impression of a poor prognosis by using objective data and a prognostic score may provide help and reassurance to some clinicians.
Routine availability of prognostic information might, over time, somewhat modify the inclination of some physicians to offer patients and families overly optimistic prognostic assessments that discourage adequate preparation for death. However, for some physicians, patients, and families, prognostication scores may be destructive. Particularly when the patient is a child, physicians or families may require a very high degree of certainty that there is “no hope,” and a prognostic score will never provide this degree of certainty. A prognostication score might actually serve to legitimize persisting with care, because the statistical computations never reach an absolute conclusion and can therefore always offer “some hope.”
In any case, the effective use of prognostication scores will remain limited until huge databases can be collected that contain a sufficient number of patients who can be matched, or approximately matched, to the individual patient. We will need our largest collective experience to serve the individual in issues of life and death decision making. There are a variety of incentives for collecting these databases: they could be used for many purposes, including quality of care assessment and numerous health services and outcomes studies. But their most exciting potential would be to provide a huge objective experience, distilled from hundreds of thousands of patients, that could be applied to the single individual.
1Drickamer MA, Lee MA, Ganzini L. Practical issues in physician-assisted suicide. Ann Intern Med 1997;126:146-151.
2Teno JM, Murphy D, Lynn J. Prognosis-based futility guidelines: does anyone win? J Am Geriatr Soc 1994;42:1202-1207.
3Schneiderman LJ, Jecker NS. Futility in practice. Arch Intern Med 1993;153:437-441.
4Carron AT, Lynn J, Patrick K. End-of-life care in medical textbooks. Ann Intern Med 1999;130:82-86.
5Hanson CW, Marshall BE. Artificial intelligence applications in the intensive care unit. Crit Care Med 2001;29:427-435.
6Tversky A, Kahneman D. Availability: A heuristic for judging frequency and probability. In: Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. New York: Cambridge University Press. 1982:163-78.
7Jennings D, Amabile T, Ross L. Informal covariation assessment: Data-based versus theory-based judgments. In: Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. New York: Cambridge University Press. 1982:211-30.
8Morris AH. Developing and implementing computerized protocols for standardization of clinical decisions. Ann Intern Med 2000;132:373-383.
9Christakis NA, Iwashyna TJ. Attitude and self-reported practice regarding prognostication in a national sample of internists. Arch Intern Med 1998;158:2389-2395.
10Randolph AG, Zollo MB, Egger JM, et al. Variability in physician opinion on limiting pediatric life support. Pediatrics 1999;103:e46.
11Fiser DH. Assessing the outcome of pediatric intensive care. J Pediatr 1992;121:69-74.
12Fiser DH, Tilford JM, Roberson PK. Relationship of illness severity and length of stay to functional outcomes in the pediatric intensive care unit: A multi-institutional study. Crit Care Med 2000;2:1173-1179.
13Fiser DH, Long N, Roberson PK, et al. Relationship of pediatric overall performance category and pediatric cerebral performance category scores at pediatric intensive care unit discharge with outcome measures collected at hospital discharge and 1- and 6-month follow-up assessments. Crit Care Med 2000;28:2616-2620.
14Rappaport J, Teres D, Lemeshow S. Can futility be defined numerically? Crit Care Med 1998;26:1781-1782.
15Sachdeva RC, Jefferson LS, Coss-Bu J, et al. Resource consumption and the extent of futile care among patients in a pediatric intensive care unit setting. J Pediatr 1996;128:742-747.
16Marcin JP, Pollack MM. Review of the methodologies and applications of scoring systems in neonatal and pediatric intensive care. Ped Crit Care Med 2000;1:20-27.
17Ruttimann UE. Statistical approaches to development and validation of predictive instruments. Crit Care Clinics 1994;10:19-35.
18Lemeshow S, Le Gall JR. Modeling the severity of illness of ICU patients: A systems update. JAMA. 1994;272:1049-1055.
19Ruttimann UE, Pollack MM, Fiser DH. Prediction of three outcome states from pediatric intensive care. Crit Care Med 1996;24:78-85.
20Marcin J, Pollack MM. Review of the methodologies and applications of scoring systems in neonatal and pediatric intensive care. Ped Crit Care Med 2000;1:20-27.
21Hanson CW, Marshall BE. Artificial intelligence applications in the intensive care unit. Crit Care Med 2001;29:427-435.
22Lemeshow S, Le Gall JR. Modeling the severity of illness of ICU patients: A systems update. JAMA 1994;272:1049-1055.
23Hosmer DW, Lemeshow S. Applied Logistic Regression. John Wiley and Sons, New York. 1989.
24Laupacis A, Sekar N, Stiell IC. Clinical prediction rules. A review and suggested modifications of methodological standards. JAMA 1997;277:488-494.
25Brannen AL, Godfrey LJ, Goetter WE. Prediction of outcome from critical illness. A comparison of clinical judgment with a prediction rule. Arch Intern Med 1989;149:1083-1086.
26Hunt DL, Hayes HR, Hanna SE, Smith K. Effects of computer-based clinical decision support systems on physician performance and patient outcomes; a systematic review. JAMA 1998;280:1339-1346.
27Perkins HS, Jonsen AR, Epstein WV. Providers as predictors: Using outcome predictions in intensive care. Crit Care Med 1986;14:105-110.
28Kruse JA, Thill-Baharozian MC, Carlson RW. Comparison of clinical assessment with APACHE II for predicting mortality risk in patients admitted to a medical intensive care unit. JAMA 1988;260:1739-1742.
29Knaus WA, Wagner DP, Lynn J. Short-term mortality predictions for critically ill hospitalized adults: Science and ethics. Science 1991;254:389-394.
30Poses RM, Bekes C, Copare FJ, et al. The answer to “What are my chances, doctor?” depends on whom is asked: Prognostic disagreement and inaccuracy for critically ill patients. Crit Care Med 1989;17:827-833.
31Poses RM, Bekes C, Winkler RL, et al. Are two (inexperienced) heads better than one (experienced) head? Averaging house officers’ prognostic judgments for critically ill patients. Arch Intern Med 1990;150:1874-1878.
32Stevens SM, Richardson DK, Gray JE, et al. Estimating neonatal mortality risk: An analysis of clinicians’ judgments. Pediatrics 1994;93:945-950.
33McClish DK, Powell SH. How well can physicians estimate mortality in a medical intensive care unit? Med Decis Making 1989;9:125-132.
34Dawes RM, Faust D, Meehl PE. Clinical versus actuarial judgment. Science 1989;243:1668-1674.
35Apgar V. A proposal for a new method of evaluation of the newborn infant. Curr Res Anesth Analg 1953;32:260-267.
36The International Neonatal Network. The CRIB (clinical risk index for babies) score: A tool for assessing initial neonatal risk and comparing performance of neonatal intensive care units. Lancet 1993;342:193-198.
37Richardson DK, Gray JE, McCormick MC, et al. Score for neonatal acute physiology: A physiologic severity index for neonatal intensive care. Pediatrics 1993;91:617-623.
38Richardson DK, Phibbs CS, Gray JE, et al. Birth weight and illness severity: Independent predictors of neonatal mortality. Pediatrics 1993;91:969-975.
39Pollack MM, Koch MA, et al. A comparison of neonatal mortality risk prediction models in very low birth weight infants: Performance of severity of illness models in very low birth weight infants. Pediatrics 2000;105:1051-1057.
40Pollack MM, Ruttimann UE, Getson PR. Pediatric risk of mortality (PRISM) score. Crit Care Med 1988;16:1110-1116.
41Pollack MM, Patel KM, Ruttimann UE. PRISM III: An updated Pediatric Risk of Mortality score. Crit Care Med 1996;24:743-752.
42Pediatric intensive care unit evaluations. www.picues.org.
43Shann F, Pearson G, Slater A, et al. Paediatric index of mortality (PIM): A mortality prediction model for children in intensive care. Intensive Care Med 1997;23:201-207.
44Pearson GA, Stickley J, Shann F. Calibration of the pediatric index of mortality in UK paediatric intensive care units. Arch Dis Child 2001;84:125-128.
45Jennett B, Bond M. Assessment of outcome after severe brain damage. Lancet 1975;1:480-484.
46Jacobs S, Shortland G, Warner J, et al. Validation of a croup score and its use in triaging children with croup. Anaesthesia 1994;49:903-906.
47Wood DW, Downes JJ, Lecks HI. A clinical scoring system for the diagnosis of respiratory failure. Preliminary report on childhood status asthmaticus. Am J Dis Child 1972;123:227-228.
48Timmons OD, Havens PL, Fackler JC. Predicting death in pediatric patients with acute respiratory failure. Pediatric Critical Care Study Group. Extracorporeal Life Support Organization. Chest 1995;108:789-797.
49Hachimi-Idrissi S, Corne L, Ramet J. Evaluation of scoring systems in acute meningococcaemia. Eur J Emerg Med 1998;5:225-230.
50Kaufmann CR, Maier RV, Rivara FP, et al. Evaluation of the Pediatric Trauma Score. JAMA 1990;263:69-72.
51Christensen-Szalanski JJ, Bushyhead JB. Physicians’ use of probabilistic information in a real clinical setting. J Exp Psychol Hum Percept Perform 1981;7:928-935.
52Eisenberg RL, Heineken P, Hedgcock MW, Federle M, Goldberg HI. Evaluation of plain abdominal radiographs in the diagnosis of abdominal pain. Ann Intern Med 1982;97:257-261.
53Poses RM, Anthony M. Availability, wishful thinking, and physicians’ diagnostic judgments for patients with suspected bacteremia. Med Decis Making 1991;11:159-168.
54Poses RM, Cebul RD, Collins M, Fager SS. The accuracy of experienced physicians’ probability estimates for patients with sore throats. JAMA 1985;254:925-929.
55Stevens SM, Richardson DK, Gray JE, Goldmann DA, McCormick MC. Estimating neonatal mortality risk: An analysis of clinicians’ judgments. Pediatrics 1994;93:945-950.
56McClish DK, Powell SH. How well can physicians estimate mortality in a medical intensive care unit? Med Decis Making 1989;9:125-132.
57Poses RM, Bekes C, Winkler RL, Scott WE, Copare FJ. Are two (inexperienced) heads better than one (experienced) head? Arch Intern Med 1990;150:1874-1878.
58Kruse JA, Thill-Baharozian MC, Carlson RW. Comparison of clinical assessment with APACHE II for predicting mortality risk in patients admitted to a medical intensive care unit. JAMA 1988;260:1739-1742.
59Poses RM, Bekes C, Copare FJ, Scott WE. The answer to “What are my chances, doctor?” depends on whom is asked: Prognostic disagreement and inaccuracy for critically ill patients. Crit Care Med 1989;17:827-833.
60Perkins HS, Jonsen AR, Epstein WV. Providers as predictors: Using outcome predictions in intensive care. Crit Care Med 1986;14:105-110.
61Gray JE, McCormick MC, Richardson DK. Determinants of hospital resource use among normal birthweight NICU patients. Ped Res 1993;33:26A.
62Pollack MM, Ruttimann UE, Glass NL, Yeh TS. Monitoring patients in pediatric intensive care. Pediatrics 1985;76:719-724.
63Stevens SM, Richardson DK, Gray JE, et al. Estimating neonatal mortality risk: An analysis of clinicians’ judgments. Pediatrics 1994;93:945-950.
64Marcin JP, Pollack MM, Patel KM, et al. Prognostication and certainty in the pediatric intensive care unit. Pediatrics 1999;104:868-873.
65Poses RM, Bekes C, Copare FJ, Scott WE. What difference do two days make? The inertia of physicians’ sequential prognostic judgments for critically ill patients. Med Decis Making 1990;10:6-14.
66Christakis NA, Iwashyna TJ. Attitude and self-reported practice regarding prognostication in a national sample of internists. Arch Intern Med 1998;158:2389-2395.
67Poses RM, Smith WR, McClish DK, Huber EC, Clemo FLW, Schmitt BP, Alexander-Forti D, Racht EM, Colenda CC III, Centor RM. Physicians’ survival predictions for patients with acute congestive heart failure. Arch Intern Med 1997;157:1001-1007.
68Chang RWS, Lee B, Jacobs S. Accuracy of decisions to withdraw therapy in critically ill patients: Clinical judgment versus a computer model. Crit Care Med 1989;243:1668-1674.
69Mamelak AN, Pitts LH, Damron S. Predicting survival from head trauma 24 hours after injury: A practical method with therapeutic implications. J Trauma 1996;41:91-99.
70Hamel MB, Lee G, Teno J, et al. Identification of comatose patients at high risk for death or severe disability. JAMA 1995;273:1842-1848.
71Rogers J, Fuller HD. Use of daily acute physiology and chronic health evaluation (APACHE) II scores to predict individual patient survival rate. Crit Care Med 1994;22: 1402-1405.
72SUPPORT Investigators. A controlled trial to improve care for seriously ill hospitalized patients. JAMA 1995;274:1591-1598.
73Atkinson S, Bihari D, Smithies M, Daly K, Mason R, McColl I. Identification of futility in intensive care. Lancet 1994;344:1203-1206.
74Watts CM, Knaus WA. The case for using objective scoring systems to predict intensive care unit outcome. Crit Care Med 1994;22:73-89.
75Goldman L, Cook EF, Johnson PA, Brand DA, Rouan GW, Lee TH. Prediction of the need for intensive care in patients who come to emergency departments with acute chest pain. N Engl J Med 1996;334:1498-1504.
76Pollack MM, Getson PR. Pediatric critical care cost containment: Combined actuarial and clinical program. Crit Care Med 1991;19:12-20.
77Chang RWS, Lee B, Jacobs S. Accuracy of decisions to withdraw therapy in critically ill patients: Clinical judgment versus a computer model. Crit Care Med 1989;243:1668-1674.
78Mamelak AN, Pitts LH, Damron S. Predicting survival from head trauma 24 hours after injury: A practical method with therapeutic implications. J Trauma 1996;41:91-99.
79Hamel MB, Lee G, Teno J, et al. Identification of comatose patients at high risk for death or severe disability. JAMA 1995;273:1842-1848.
80SUPPORT Investigators. A controlled trial to improve care for seriously ill hospitalized patients. JAMA 1995;274:1591-1598.
81Murray LS, Teasdale GM, Murray GD, et al. Does prediction of outcome alter patient management? Lancet 1993;341:1487-1491.
82Knaus WA, Rauss A, Alperovitch A. Do objective estimates of chances for survival influence decisions to withhold or withdraw treatment? Med Decis Making 1990;10:163-171.
83The Ethics Committee of the Society of Critical Care Medicine. Consensus statement regarding futile and other possibly inadvisable treatments. Crit Care Med 1997;25:887-891.
84Emanuel EJ, Emanuel LL. Four models of the physician-patient relationship. JAMA 1992;267:2221-2226.
85Cook D. Patient autonomy versus paternalism. Crit Care Med 2001;29:N24-25.
86Esserman L, Belkora J, Lenert L. Potentially ineffective care. A new outcome to assess the limits of critical care. JAMA 1995;274:1544-1551.
87Marcin JP, Pollack MM, Patel KM, et al. Decision support issues using a physiology based score. Intensive Care Med 1998;24:1299-1304.
88Marcin J, Ruttimann U, Patel KM, Pollack MM. Decision support issues using a physiology-based score. Intensive Care Med 1998;24:1299-1304.
89Marcin JP, Pollack MM, Patel KM, et al. Decision support issues using a physiology based score. Intensive Care Med 1998;24:1299-1304.
90Redelmeier DA, Shafir E. Medical decision making in situations that offer multiple alternatives. JAMA 1995;273:302-305.
91Levetown M, Pollack MM, Cuerdon TT, Ruttimann UE, Glover JJ. Limitations and withdrawals of medical interventions in pediatric critical care. JAMA 1994;272:1271-1275.
92Meyer EC, Burns JP, Griffith JL, Truog RD. Parental perspectives on end-of-life care in the pediatric intensive care unit. Crit Care Med 2002;30:226-231.
93Corey GA, Merenstein JH. Applying the acute ischemic heart disease predictive instrument. J Fam Pract 1987;25:127-133.
94Emanuel EJ, Emanuel LL. Four models of the physician-patient relationship. JAMA 1992;267:2221-2226.