R. Clifton Bailey
Health Standards and Quality Bureau
Health Care Financing Administration
At the symposium, Andrew Kirsch noted that looking at the physical context of some statistical problems sometimes exposes the fact that certain additive functional forms can be contrary to known physical relationships. In his example, a product was instead the natural construct. Wilson (1952) discusses some examples such as symmetry (§11.5, pp. 308–312), limiting cases (p. 308), and dimensional analysis (§11.12, pp. 322–328). I would note that such considerations quickly lead one to consider functional relationships that are not linear in the parameters. The supporting statistical software has been of limited utility in dealing with these functional relationships. The picture becomes more complex when one introduces stochastic components to the models and complex sample designs for collecting data.
Nonlinear regression deals nicely with these problems when the stochastic element is limited to measurement error. However, many formulations do not readily permit the use of implicitly defined functions. And for those of us who must perform operational analyses without the luxury of developing special tools for diverse problems, adequate tools have not been easily accessible.
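To make the point about implicitly defined functions concrete, here is a minimal sketch (my own illustration, not from the text): a hypothetical model y + θy³ = x has no closed form for the response y, but a root-finder inside the residual sum of squares handles it, with measurement error in y as the only stochastic element.

```python
def implicit_response(x, theta, lo=-10.0, hi=10.0, iters=60):
    """Solve the hypothetical implicit model y + theta*y**3 = x for y.
    For theta >= 0 the left side is strictly increasing in y, so
    simple bisection on [lo, hi] converges."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mid + theta * mid ** 3 < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def rss(theta, data):
    """Residual sum of squares over observations (x, y): ordinary
    nonlinear least squares, with the fitted value found implicitly."""
    return sum((y - implicit_response(x, theta)) ** 2 for x, y in data)

# y + y**3 = 10 has the exact solution y = 2
print(round(implicit_response(10.0, 1.0), 6))  # → 2.0
```

Any general minimizer can then be pointed at rss; the implicit solve is hidden inside the objective.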
Expert systems are designed to prevent the incorrect application of a procedure. In addition to asking whether we have correctly used a procedure according to accepted standards, we must also ask whether the methods are useful for solving our problems. One of the great strengths of statistical methods derived for designed experiments is also their great weakness. The strength is having methods that work without regard to the underlying functional relationships; we can design methods that work this way by controlling the design. Such methods answer preliminary questions, such as whether or not there is a difference. Any subject-matter specialist quickly wants to go beyond these questions to understand the structure of a problem. Furthermore, there are many important data sources other than designed studies or experiments. Graphical techniques are so appealing partly because our analytical tools for revealing more complex structures are so limited and underdeveloped. I hope that graphical techniques will drive analysts to consider more complex structural relationships in the analytical context.
As I said following Paul Velleman's talk, it would be nice if statisticians would think more globally. For example, in using the results of a survival analysis package for a specific analysis, it would be useful to know how the likelihood was formulated. Better yet, some agreement on the formulation would permit comparison across various models and packages. For example, a log-likelihood for a Weibull, constant hazard, or other model form could be compared if there were some agreement on how these are formulated. Some packages do not provide this comparability, even within a given procedure. Part of the reason is that some drop constants that are not relevant for the function to be optimized, the log-likelihood. Others use the density function in the formulation, while still others consider events to occur in an interval (which must be stated) and formulate the probability of the event occurring in that interval. I prefer this latter approach because of its generality: events need not all have the same interval, and censored events are part of the same formulation.
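As a sketch of the interval formulation described above (the Weibull form and the numbers are my own illustration, not from any particular package): each observation is an interval (a, b], an observed event contributes log[F(b) − F(a)], and a right-censored case is simply the interval (c, ∞] in the same formula.

```python
import math

def weibull_cdf(t, shape, scale):
    """Weibull CDF F(t) = 1 - exp(-(t/scale)**shape)."""
    if t == math.inf:
        return 1.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def interval_loglik(intervals, shape, scale):
    """Log-likelihood where every observation is an interval (a, b]:
    an event known to fall in (a, b] contributes log(F(b) - F(a));
    a case censored at c is just the interval (c, inf] -- censoring
    needs no separate term in this formulation."""
    return sum(
        math.log(weibull_cdf(b, shape, scale) - weibull_cdf(a, shape, scale))
        for a, b in intervals
    )

# Two events in stated intervals plus one case censored at 2.5
data = [(0.0, 2.0), (1.0, 3.0), (2.5, math.inf)]
print(round(interval_loglik(data, shape=1.0, scale=2.0), 4))  # → -2.6674
```

Because no density or dropped constants are involved, this value is directly comparable across model forms fitted to the same intervals.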
Some problems fit nicely on personal computers, while others have large volumes of data that are centrally managed. Successful interaction with these data depends on the links between the data sources and the tools for analysis. Many specialized and intriguing enhancements appear on a variety of the changing platforms. I think a diversity of procedures and implementations is useful. One implementation may provide a capability not available in another, and results can be compared from various sources. However, it is difficult to keep track of these and to make them work on evolving platforms and in evolving environments. The speakers indicated nicely that too much effort is devoted to reproducing similar algorithms in every setting and not enough to enriching our tools.
Let me point out some of the problems faced by statisticians in operating agencies. We are not in the position of spending long periods on research issues. Furthermore, we are faced with managers of computing centers who want to support only standard software. This applies to mainframe and personal computer software. The danger, in the spirit expressed by Leo Breiman, is that we find nails to apply our hammers to. We do this by reformulating our problem to fit one for which we know the solution. The result is a correct solution to the wrong problem because our tools are oriented to solving the textbook problems.
We also encounter difficulties applying many routines to larger data sets--50,000 to 10 million cases. Furthermore, the diagnostics for dealing with large data sets need to be very different from those envisioned for smaller data sets, for which we can easily scan residuals, plots, tables, and other simple displays for all of our data. For example, techniques for focusing on a few outlier observations need to be generalized to focus on outlier sets. I have found that a convenient means of constraining or fixing parameters can be used in many interesting ways to solve complex problems, such as those faced when trying to study the effect of many (several thousand) indicator variables in analyzing large data sets.
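One minimal way to provide that constraining device (the names here are my own, not any particular package's): wrap the full objective so that a general-purpose optimizer sees only the free parameters, while the indicated parameters stay fixed at their given values.

```python
def with_fixed(objective, n_params, fixed):
    """Return an objective over the free parameters only.
    `fixed` maps parameter index -> value held constant, so any
    optimizer applied to the wrapper leaves those parameters fixed."""
    def wrapped(free):
        it = iter(free)
        full = [fixed[i] if i in fixed else next(it) for i in range(n_params)]
        return objective(full)
    return wrapped

# Full objective in two parameters; hold p[1] fixed at 5.0
quad = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 2.0) ** 2
f = with_fixed(quad, n_params=2, fixed={1: 5.0})
print(f([1.0]))  # (1 - 1)^2 + (5 - 2)^2 → 9.0
```

The same wrapper supports profiling a likelihood over one parameter, or sweeping through thousands of indicator variables by fixing all but a few at a time.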
The needs are different for exploratory work where we are trying to understand relationships, and for production activities such as making 6000 tables or graphs for a publication.
The extensive use of contracting-out for solution of problems fragments the system and tends to work against having the technical expertise and resources appropriately concentrated to recognize and develop appropriate solutions. (See the various presentations by W. Edwards Deming on the system and profound knowledge, e.g., his February 16, 1991, talk at the annual meeting of the AAAS in Washington, D.C.)
A few years ago I found the available options inadequate for a probability analysis task. Can you imagine a procedure that produces the same graph no matter what the problem? Actually, the graphs were correct but not useful, since the scale on the graph changed with the problem. This does not facilitate graphical comparisons. The problem at hand required an elaborate work-around with the available package (see Bailey and Eynon, 1988).
Such creative use of imperfect tools is always required. I remember learning that a major government agency computer center (EPA) was going to abandon support for SPSS on the mainframe. This was a management decision without contact with the statistical community in the agency. Part of the problem arose because users could not readily obtain manuals. At another agency, SAS is the principal mainframe package available. While this is a powerful package suitable for many important statistical activities, other packages provide strengths not found in SAS.
Bailey, R. Clifton, and Barrett P. Eynon, 1988, Toxicity testing of drilling fluids: Assessing laboratory performance and variability, Chemical and Biological Characterization of Sludges, Sediments, Dredge Spoils, and Drilling Muds, ASTM STP 976, J. J. Lichtenberg, J. A. Winter, C. I. Weber, and L. Fradkin, eds., American Society for Testing and Materials, Philadelphia, pp. 334–374.
Wilson, E. Bright, Jr., 1952, An Introduction to Scientific Research, Dover, New York.
Michael P. Cohen
National Center for Education Statistics
I have a few comments on suggested standards and guidelines. These comments reflect my opinion only. My experience has been in sampling, design, and estimation for large, complex statistical surveys, including the U.S. Consumer Price Index and the Integrated Postsecondary Education Data System.
While the forum emphasized the proliferation of software, there remains a dearth of statistical procedures for analyzing data from complex surveys within general-purpose packages. I am aware of special-purpose packages such as SESUDAAN, WESVAR, and PC-CARP.
Many statistical packages still do not treat weighted data very well. Even procedures that allow weights often treat them as a way of indicating multiple occurrences of the same observed value.
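A small numeric illustration of why this distinction matters (the numbers are my own, not from the text): reading sampling weights as repeat counts leaves the point estimate unchanged but wildly overstates the sample size behind it.

```python
x = [2.0, 4.0, 6.0]
w = [100.0, 100.0, 100.0]  # sampling weights: each case represents 100 units

# The weighted mean is the same under either reading of the weights.
mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)   # 4.0

# Frequency reading: the package behaves as if 300 cases were observed.
n_freq = sum(w)                                        # 300.0

# Kish's design-based effective sample size for probability weights.
n_eff = sum(w) ** 2 / sum(wi * wi for wi in w)         # 3.0

print(mean, n_freq, n_eff)
```

A package that treats the weights as multiplicities will compute standard errors as if n were 300, when only 3 cases were actually observed.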
James R. Knaub, Jr.
Energy Information Administration
Department of Energy
In a letter to The American Statistician [Knaub, 1987], I made some comments on the practical interpretation of hypothesis tests which are pertinent to statistical software. I feel that such software, by ignoring type II error analyses for simple alternatives, has helped make hypothesis testing disreputable, when it should be one of our viable tools. I hope that future generations of statistical software will take into account the need for type II error analysis.
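As a sketch of the kind of type II error analysis being asked for (the test and numbers are my own illustration): the power of a one-sided z-test against a simple alternative takes only a few lines with the standard library.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of the one-sided z-test of H0: mu = 0
    against the simple alternative H1: mu = mu1 > 0, sigma known."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # reject when Z > z_crit
    shift = mu1 * math.sqrt(n) / sigma          # mean of Z under H1
    return 1.0 - NormalDist().cdf(z_crit - shift)

# Quadrupling n (16 -> 64) doubles the shift and raises power sharply
print(round(power_one_sided_z(0.5, 1.0, 16), 2),
      round(power_one_sided_z(0.5, 1.0, 64), 2))  # → 0.64 0.99
```

Reporting such a curve alongside the p-value would address the type II error analysis that the letter argues software ignores.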
Knaub, James R., Jr., 1987, Practical interpretation of hypothesis tests, The American Statistician, Vol. 41, No. 3, p. 246.
Statistical Innovations, Inc.
Overall, I believe that the panel members and invited speakers at the symposium reflect much too narrow a range of interests and opinions to adequately represent the majority of users of statistical software. In general, I believe that the interests of business users, less sophisticated users, and less frequent users are underrepresented.
Among more sophisticated users, Exploratory Data Analysis (EDA) of quantitative data and applications from the health sciences and quality control were overrepresented. In particular, I believe that the interests of survey researchers from the social sciences who analyze primarily categorical data are being largely neglected by the panel.
Most of the presentations point out the importance of EDA methods in appropriately dealing with violations of the assumptions of traditional techniques for quantitative analysis. However, there is a major revolution taking place in the analysis of categorical data that was totally neglected at the symposium. The move in categorical analysis from simple cross-tabulation software to log-linear models, latent class analysis, association models, and other chi-squared-based techniques is even more remarkable than the EDA revolution, since it is based on a unified theory. There is no need to adjust for the unrealistic assumptions of normality and linearity, since such assumptions are not made.
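The simplest member of this chi-squared family, the Pearson test of independence in a two-way table (equivalently, the fit of the two-way log-linear independence model), can be sketched in a few lines; the table below is invented for illustration.

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for independence in a two-way
    contingency table, given as a list of rows of observed counts."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = float(sum(row))
    x2 = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            expected = row[i] * col[j] / n
            x2 += (obs - expected) ** 2 / expected
    return x2

# 2x2 table: every expected count is 15, so X^2 = 4 * (5**2 / 15)
print(round(pearson_chi2([[10, 20], [20, 10]]), 3))  # → 6.667
```

Log-linear and association models generalize exactly this comparison of observed to model-based expected counts, which is what makes the framework unified.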
As interest in categorical methods continues to grow, in a few years these methods may well represent the majority of statistical applications. The choice of regression and one-way ANOVA as examples at the seminar illustrates the overemphasis on EDA and quantitative analysis. Regression and ANOVA are the same type of technique. One-way ANOVA was chosen simply to illustrate box plots. Cross-tabulation was totally ignored.
While my own academic background is econometrics, and while I have taught statistics at Tufts and at Boston University, I am currently president and chief statistician of a consulting firm where businesses are my primary clients. I am also the developer of a statistical package that is selling at the rate of more than 50 copies a month and has over 500 users--primarily business users. Thus, I am very familiar with business use of statistics.
I was also the software editor of the Journal of Marketing Research and head of the software committee at ABT Associates prior to forming Statistical Innovations Inc. in 1981. I have published on both quantitative and categorical multivariate techniques. I have also organized and conducted statistical workshops with Professor Leo A. Goodman, and through these courses have trained several hundred practicing researchers. Thus, I speak from experience when I strongly recommend that the underrepresentation on your panel be corrected.