
The Future of Statistical Software: Proceedings of a Forum
Richness for the One-Way ANOVA Layout
Keith E. Muller^{1}
University of North Carolina, Chapel Hill
Statement of the Problem
Rationale of Approach
The Panel on Guidelines for Statistical Software has organized its examination of statistical software around the concepts of exactness, richness, and guidance. The one-way layout presents a ubiquitous task, and hence a stimulating prototype for devising guidelines. This presentation covers richness in the one-way layout. In the section below titled “Richness Dimensions,” a number of dimensions are proposed as the basis for evaluating the richness of statistical software. Each of these is examined through the same three steps: (1) a brief definition is given, (2) richness is detailed for the one-way layout in the dimension under discussion, and (3) the specific description is followed by a discussion of general principles.
Creating general and useful principles will require continuing interactive discussion among many interested observers; the comments reported here are intended to stimulate further discussion. Such discussion will help avoid embedding statistical fads in guidelines. Specific evaluations of software unavoidably depend on the philosophy of the evaluator. Hence evaluation guidelines must be ecumenical in coverage, yet able to be narrowed for a particular task. Undoubtedly the proposed structure and topics are not definitive. Despite that, the present author does believe that any alternate approach must cover all of the issues considered here.
Problem Boundaries.
A number of terms must first be defined, at least loosely. Statistical software will be taken to be any collection of computer programs and associated information intended to directly aid the production of any sort of statistical analysis. Note that this definition includes software that itself may not produce statistical analysis. For example, a program used to create a file used as input to an analysis program may fall under this definition. Exactness consists of a choice of an acceptable algorithm and the accurate implementation of the algorithm. For example, asymptotic formulae for variance estimation or p-value calculations should not be used in small samples whenever more precise calculations are available. The algorithm should either tolerate stressful data configurations, or detect them and gracefully report an inability to proceed. Guidance consists of the help provided by the structure and features of the software in conducting a correct and effective analysis. This includes assistance in choosing appropriate instructions, as well as assistance in deciding whether a particular approach is valid. Richness consists of how fully the software can do the analysis. The term coverage may be preferred by some. Throughout, the user will be taken to refer to the person executing the software.
^{1} The author gratefully acknowledges the stimulation of summaries of meetings of the Panel on Guidelines for Statistical Software. The basic approach in which this discussion is embedded was determined by those deliberations. In addition, some specific examples are taken from the summaries.
The definitions of exactness, guidance, and richness all overlap somewhat. Distinctions between guidance and richness are particularly important in defining the boundaries of the present discussion. At one extreme, providing maximal guidance leads to an expert-system approach. In that case, richness ceases to exist as a separate property, being determined by the range of tolerated inputs and guidance accuracy. At the other extreme, providing minimal guidance leads to documenting features of the software and nothing else. An intermediate amount of guidance would be the automatic inclusion of diagnostics concerning the assumptions of the method of analysis. In contrast, richness describes the availability and convenience of creating such diagnostics. In considering guidance, the validity of the analysis approach for the data at hand is always in question. In contrast, when discussing richness it will be assumed throughout that using the software for the data at hand is a valid endeavor.
The one-way layout in analysis of variance (ANOVA) is used as the basic example. For the sake of brevity, familiarity with traditional approaches will be assumed. Kirk [1982] provided a comprehensive treatment of a large range of ANOVA models. In one-way ANOVA, a continuous response variable is examined to assess whether it is related to a categorical predictor variable. The predictor values may be strictly nominal, ordered categories, or interval scale values. Typically two to ten distinct predictor values are present. A number of regularity conditions must be assumed for both non-parametric and parametric methods in order to ensure the validity of statistical analysis. These may be loosely grouped into assumptions concerning (1) existence, (2) independence, (3) model, and (4) distribution. Traditional parametric fixed-effect ANOVA requires independent and identically distributed (i.i.d.) Gaussian scores within each category and categories that differ only by expected value.
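The traditional fixed-effect model just described can be written compactly. The notation is assumed here for illustration and does not come from the text: $y_{ij}$ is score $j$ in category $i$, $\mu_i$ is the category mean, $k$ is the number of categories, $n_i$ is the category size, and $N = \sum_i n_i$.

```latex
y_{ij} = \mu_i + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \qquad
i = 1, \dots, k, \quad j = 1, \dots, n_i

H_0 : \mu_1 = \mu_2 = \cdots = \mu_k, \qquad
F = \frac{\sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 / (k - 1)}
         {\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 / (N - k)}
```

Under $H_0$ and the stated assumptions, $F$ follows an $F$ distribution with $k - 1$ and $N - k$ degrees of freedom.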
Richness Dimensions
1. Epistemological Goals
Definition The epistemological goal(s) for an analysis consists of the standards by which truth is judged, the standards by which decisions are made, and the purpose of the analysis.
One-Way Examples Example 1: the user wishes to evaluate whether red, green, or blue text can be read most rapidly on a computer screen. Example 2: the user wishes to estimate the location and scale of the amount of hypoxic cells in biopsies taken from a number of different types of cancerous tumors. Example 3: the user wishes to discover whether a new drug regimen maintains kidney function better than current practice. Example 4: the user wishes to decide whether a single time-release capsule has the same therapeutic effect as three smaller doses delivered once every eight hours. Example 5: the user wishes to examine the effect on hypertension of the dietary presence of one foodstuff from a large list.
Comments Epistemology is the study of the basis and limits of knowledge. Every user approaches an analysis task with a particular philosophy, whether explicit or not (even to the user). One's philosophy determines how to decide what is true, and what is worth deciding. Statisticians' philosophies vary substantially on a number of dimensions. For example, one may prefer a Frequentist, Bayesian, or Decision-Theoretic approach. More generally, statisticians describe estimation and inference as separate activities. Furthermore, one may be adamant about distinguishing between confirmatory and exploratory analysis, or actively opposed to the distinction. One may favor a parametric, robust, or non-parametric strategy.
General Principles Users have widely varying philosophies and purposes in using software. Software authors and reviewers should report the epistemology upon which they based their work. It should be emphasized that this does not demand statistical ecumenism. Instead such clarification will allow users of software and readers of software reviews to recognize whether the bases for evaluation are shared.
2. Methods
Definition Methods consist of all the techniques and algorithms that can be implemented within particular software.
One-Way Examples First consider the traditional parametric model assuming Gaussian errors, from a Frequentist perspective. Applications of estimation methods include estimation of primary parameters, such as cell means in a cell mean coding, and estimation of linear combinations of primary parameters (which are secondary parameters, and contrasts), such as mean differences or trend contrasts. Testing methods include the general linear hypothesis test and the many kinds of multiple comparison methods available [Miller, 1981]. Note that estimation of parameter variances and the specification of confidence intervals allow an (embedded) alternate approach for scalar parameters.
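As an illustration of estimating a secondary parameter, the following minimal sketch computes a contrast of cell means and its standard error under the traditional equal-variance model. The function name `contrast_estimate` and the example data are hypothetical, and only the Python standard library is assumed.

```python
from math import sqrt
from statistics import fmean


def contrast_estimate(groups, coefficients):
    """Estimate a linear combination of cell means and its standard error.

    Assumes the traditional fixed-effect model with a common error variance,
    so the within-group mean square is pooled over all groups.
    """
    if len(groups) != len(coefficients):
        raise ValueError("need exactly one coefficient per group")
    means = [fmean(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    # Pooled within-group sum of squares and mean square error.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mse = ss_within / (n_total - len(groups))
    estimate = sum(c * m for c, m in zip(coefficients, means))
    variance = mse * sum(c * c / len(g) for c, g in zip(coefficients, groups))
    return estimate, sqrt(variance)


# Linear trend contrast across three ordered categories (made-up data).
estimate, std_error = contrast_estimate([[1, 2, 3], [2, 3, 4], [3, 4, 5]], [-1, 0, 1])
```

A confidence interval would additionally need a quantile of the t distribution with N - k degrees of freedom, which the standard library does not supply.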

A Bayesian perspective requires implementing different methods for some of the same tasks, but also methods for tasks not listed here (such as posterior density estimation).
One may move away from the traditional model in many directions. Concerns about the robustness of the traditional model are implicit in all such moves. Examples include methods for evaluating the validity of assumptions [regression diagnostics; see Belsley et al., 1980], semi-parametric modifications such as down-weighting extreme values, and rank-transformation [Puri and Sen, 1985].
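The rank-transformation idea can be sketched as follows. This is an illustration only, not the procedure of Puri and Sen; the helper names `midranks` and `rank_transform` are hypothetical. Scores from all groups are pooled, tied values share the average of their ranks, and the ranks are returned in the original group layout so that any one-way routine can be applied to them.

```python
def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value ties with the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def rank_transform(groups):
    """Replace each score by its midrank in the pooled sample, keeping the group layout."""
    pooled = [x for g in groups for x in g]
    ranks = iter(midranks(pooled))
    return [[next(ranks) for _ in g] for g in groups]
```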
Comments The effectiveness of the implementation depends on how conveniently the interface links the statistical software with other software and the user. Many of the methods can be implemented with graphical displays. Diagnostics and multiple comparison methods are especially appropriate. Interfaces in general are discussed in the sections below titled “Inputs” and “Outputs.”
Richness of methods unavoidably intertwines with guidance and exactness. Consider the following applications, which were not given as examples: weighted least squares, both exact and approximate; estimating nonlinear functions of parameters and associated confidence intervals; random effects model estimation and testing; all subjects tested in all conditions (repeated measures); power calculation; and the analysis of dispersion. Should these applications be covered by one-way layout software, nominally centered on ANOVA? This issue will be addressed in the discussion of structure below.
General Principles Methods should be judged on (1) exactness, (2) breadth, and (3) interfaces. Exactness includes numerical accuracy, numerical efficiency, and optimality of technique (for example, never using asymptotic approximations in small samples when exact calculations are practical). Breadth can be evaluated only after having specified the target coverage desired. User interfaces should include communication from the user to the software (control) and feedback to the user about, for instance, any branches taken by the software.
3. Inputs
Definition Inputs are the statistical assumptions, data files, and control files. “Files” include signals from user interfaces such as keys, light-pens, joysticks, and mouse keys, as well as information organized on computer-readable media.
One-Way Examples The assumptions required for least squares optimality of the traditional fixed-effect model may be summarized as homoscedasticity, existence of finite second moments, independence of observations, and model linearity (trivially met in this case). Assuming Gaussian errors leads to maximum likelihood optimality of the least squares calculations, and allows closed-form calculation of optimal likelihood ratio tests. One may wish to choose the parameter structure of the model, such as cell mean. One may also wish to choose particular contrasts to estimate, such as trends. Data are often stored in text format and also in formats created by various proprietary data management or data analysis software packages.
Comments Assumptions must be treated somewhere, and so they are included here, although they may merit separate treatment. Attention to satisfying assumptions contributes strongly to good data analysis.
The evaluation of software can be dominated by the quality of the input interface with the user. For example, software that depends on field-dependent references to variables may trip up even a sophisticated user. Computer designers and scientists have focused on the control files. Many convenient data entry packages are available, although unknown to most users. Software structure has also been recognized as an important determiner of input adequacy. Relatively little effort has been expended on data file importing. Importing is often difficult even between software packages on a single platform. Crossing both platforms and software can be extremely difficult.
The great majority of data analysis methods are only defined for rectangular arrays of observations, allowing perhaps some missing data or other irregularities. Notable exceptions center on the work in classification and taxonomy. Database management software necessarily supports a great variety of data arrangements and relationship patterns. The conversion process may be inconvenient. Furthermore, care must be taken to produce an analysis file that allows the questions of interest to be addressed. These same comments are relevant also to the discussion of structure below, and to the discussion of guidance.
General Principles Considerations about the validity of assumptions should be embedded in input requirements of any statistical software. Accepting a broad range of inputs may substantially enhance the utility of the software. Users should be able to control, if they wish, the choice of algorithms, such as coding schemes. These desires must be balanced with the critical needs of efficiency and simplicity of use. Input correctness should be checked extensively. Error messages should be self-contained and indicate possible corrections. Clever structuring may prevent many errors from occurring.
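One way to read the recommendation on input checking is sketched below. The function name `check_one_way_input` and the message wording are hypothetical; the point is only that each message names the problem and suggests a correction without requiring the user to look anything up.

```python
from collections import Counter
from math import isfinite


def check_one_way_input(responses, labels):
    """Return self-contained problem descriptions; an empty list means none were found."""
    if len(responses) != len(labels):
        return [
            f"Found {len(responses)} responses but {len(labels)} group labels; "
            "supply exactly one label per response."
        ]
    problems = []
    for row, value in enumerate(responses, start=1):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not isfinite(value):
            problems.append(
                f"Row {row}: response {value!r} is not a finite number; "
                "correct the value or remove the row before analysis."
            )
    counts = Counter(labels)
    if len(counts) < 2:
        problems.append(
            f"Only {len(counts)} distinct group label(s) found; "
            "a one-way comparison needs at least two groups."
        )
    elif len(responses) <= len(counts):
        problems.append(
            f"Found {len(responses)} observations in {len(counts)} groups; "
            "estimating the error variance needs more observations than groups."
        )
    return problems
```

Returning every problem found, instead of stopping at the first, lets the user correct the input in one pass.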
4. Outputs
Definition Outputs are the collection of sensory displays provided immediately to the user, as well as any non-transitory record stored on paper, or on digital or other media.
One-Way Examples For the traditional Gaussian errors approach, example analysis outputs include an ANOVA source table, parameter estimates and associated variance estimates, and plots of means. Example diagnostic outputs include tests of homogeneity or normality, and box-plots for each cell. All can be produced on a character printer or a graphics device, or stored on digital media.
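As a minimal sketch of the first output mentioned, the code below builds and formats an ANOVA source table for a one-way layout. The function names and the table layout are assumptions made for illustration. No p-value is reported because the F distribution is not available in the Python standard library, and the sketch does not guard against a zero within-group mean square.

```python
from statistics import fmean


def anova_source_table(groups):
    """Return rows (source, df, sum of squares, mean square, F) for a one-way layout."""
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    return [
        ("Between", df_between, ss_between, ms_between, ms_between / ms_within),
        ("Within", df_within, ss_within, ms_within, None),
        ("Total", n_total - 1, ss_between + ss_within, None, None),
    ]


def format_source_table(rows):
    """Render the rows as fixed-width text suitable for a character device."""
    lines = [f"{'Source':<8}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}"]
    for source, df, ss, ms, f_stat in rows:
        ms_text = "" if ms is None else f"{ms:.3f}"
        f_text = "" if f_stat is None else f"{f_stat:.3f}"
        lines.append(f"{source:<8}{df:>4}{ss:>10.3f}{ms_text:>10}{f_text:>10}")
    return "\n".join(lines)


print(format_source_table(anova_source_table([[1, 2, 3], [2, 3, 4], [3, 4, 5]])))
```

Keeping the numeric rows separate from their rendering means the same result can be sent to a printer, a plot, or a file.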

Comments Most current presentations of statistics could be enhanced by the addition of carefully chosen graphics displays. Currently, the depth of one's employer's pockets substantially determines the ability to render graphics with adequate resolution. Software tends to be very platform-specific, with only the beginnings of graphics interchange standards. For static displays, some optimism may be appropriate in that systems of modest cost are rapidly approaching the limits of the human visual system. Current dynamic displays are hardware- and cost-limited.
A user may wish to validate a particular invocation of software, conduct an analysis not provided as an option, document properly, “check-point” a procedure, or interface with other software. All require the ability to direct any output to digital media. Conscientious data analysis, coupled with the power of computers, invites such approaches. Digital file output enables extensibility, both within and between packages. Many different platforms and types of software are used for data analysis. Such diversity provides as many disadvantages as advantages.
General Principles The user may wish to direct any calculated result, including graphics and other displays, to a storage file for use at another time, on another computer, or in a different place. Such files should be easily portable and self-documenting. Users will strongly prefer software that supports standard interchange formats: science operates on the basis of sharing information. Software vendors, however, depend on secrecy of information to allow them to stay in business and make a profit. Vendors and scientists will need to cooperate and be creative to meet the user preferences for openness and ease of interchange while simultaneously protecting legitimate business interests.
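A portable, self-documenting results file might look like the following sketch. JSON is used here only as one example of a widely supported text interchange format; the text does not name a format. The function name, the record layout, and the software name and version are hypothetical.

```python
import json


def export_results(group_summaries, method,
                   software_name="example-anova", software_version="0.0"):
    """Serialize cell summaries with enough metadata to be read without the software.

    group_summaries maps each group label to a (count, mean) pair.
    The software name and version defaults are placeholders.
    """
    record = {
        "description": "Estimated cell means from a one-way layout",
        "software": {"name": software_name, "version": software_version},
        "method": method,
        # Each result field is described inside the file itself.
        "fields": {
            "group": "label of the categorical predictor value",
            "n": "number of observations in the group",
            "mean": "estimated cell mean of the response",
        },
        "results": [
            {"group": group, "n": n, "mean": mean}
            for group, (n, mean) in group_summaries.items()
        ],
    }
    return json.dumps(record, indent=2, sort_keys=True)


text = export_results({"red": (10, 1.5), "green": (12, 1.25)},
                      method="fixed-effect one-way ANOVA")
```

Because the field descriptions and the method travel with the numbers, the file can be interpreted at another time or on another computer by different software.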
5. Options
Definition Options are the alternative epistemological goals, methods, inputs, outputs, structures, internal paths, external paths, and documentation that may be invoked in addition to or in lieu of the default choices.
One-Way Examples Options that might be desired in the traditional one-way ANOVA include choice of coding scheme, deletion of the intercept from the model, creation of confidence intervals for mean differences, specification of the categorical variable as a random effect, and diagnostics for homogeneity of distribution across groups.
Comments The goals, structure, and audience of software mostly determine what options can and should be available. Given a particular goal and audience, the structure of options will reward or frustrate, depending upon the skills of the author. The availability of many options empowers the user who is able to control them efficiently. However, the same breadth may overwhelm and mislead less knowledgeable users.
General Principles The choice of options should be based on software goals. The choice of defaults should be based on the audience. Layering options may be used to resolve the conflict between the needs of the novice and the sophisticate. Certain desirable options are discussed in other sections of this paper, including “Inputs” and “Outputs” above.
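Layered options could be arranged along the following lines. This is a sketch under stated assumptions: the option names echo the one-way examples above, but the class `OneWayOptions`, the split into a basic and an advanced layer, and the default values are all hypothetical.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class OneWayOptions:
    # Basic layer: visible to every user.
    confidence_level: float = 0.95
    report_diagnostics: bool = True
    mean_difference_intervals: bool = False
    # Advanced layer: changeable only on explicit request.
    coding_scheme: str = "reference"
    include_intercept: bool = True
    group_effect: str = "fixed"


BASIC_LAYER = {"confidence_level", "report_diagnostics", "mean_difference_intervals"}


def resolve_options(layer="basic", **overrides):
    """Start from the defaults and apply overrides permitted in the chosen layer."""
    if layer not in ("basic", "advanced"):
        raise ValueError(f"Unknown layer {layer!r}; use 'basic' or 'advanced'.")
    known = {f.name for f in fields(OneWayOptions)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"Unknown option(s) {sorted(unknown)}; known options are {sorted(known)}.")
    blocked = set(overrides) - BASIC_LAYER
    if layer == "basic" and blocked:
        raise ValueError(
            f"Option(s) {sorted(blocked)} belong to the advanced layer; "
            "pass layer='advanced' to change them."
        )
    return replace(OneWayOptions(), **overrides)
```

A novice calling `resolve_options()` gets the defaults untouched, while `resolve_options(layer="advanced", coding_scheme="cell_mean")` opens the second layer deliberately.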
6. Structure
Definition The structure of statistical software consists of the module definition and branching schemes employed in design and execution.
One-Way Examples Currently, most ANOVA software reports source tables by default and multiple comparisons at the user's specific request. For software with diagnostics available, few packages report them by default.
Comments The structure of a program embodies the designer's philosophy about the analysis goals and guidance appropriate for the expected audiences. Diversity of audiences appears to require layers of options. Limited structures constrain richness, while complex structures reduce efficiency and create user difficulties. Designers must understand the conceptual, analytical, and numerical tasks in any statistical method in order to produce appropriate and efficient structure.
The description of the example being considered as “the one-way layout” may be contrasted with the description “one-way fixed-effect ANOVA.” The former describes the format of a collection of data values, while the latter fully specifies a model and associated analysis, including the required data format. The two descriptions correspond to radically different structures and labels for the elements of the structure. In turn, documentation and the audiences that can be served are strongly affected.
General Principles Ideal statistical software would provide seamless modularity. Such software would always present an efficient and simple-to-use interface with the user and other software. The methods and use of the product would derive from a consistent set of concepts, not a collection of tricks and gyrations. Good structure incorporates principles of perception and learning from the behavioral sciences and principles of numerical analysis and program design from the computational sciences.
7. Internal Paths
Definition The internal paths of statistical software are the branches that may be followed due either to the dictates of the algorithm and the data or to choices made by the user.
One-Way Examples In a one-way ANOVA, one may wish to evaluate diagnostics on residuals before choosing to examine any output involved with inference. Much software always produces a source table. Software may allow the user to specify conditional execution of step-down tests, such as trend tests. Depending upon the coding scheme, a program may use an orthogonalization scheme or a simple elimination algorithm.
Comments Sophisticated users may desire control of branches internal to a single module. Such control may allow the naive user to bungle an analysis. Such control may allow the sophisticated user to tune the performance of the program to the application.
General Principles Sophisticated users desire control of all internal paths. Access to such control must be guarded with appropriate warning information. This recommended approach should be evaluated in light of the guidance standards and audiences.
8. External Paths
Definition The external paths of statistical software are the branches that may be followed by the user and data into, out of, and back into the software.
One-Way Examples The user may wish to conduct diagnostic analysis on alternate transformations of the data. The results may then be input to a summary analysis. In turn the user then needs to implement the preferred analysis.
Comments One rarely uses software in isolation. Convenient and efficient paths into and out of the software greatly facilitate quality data analysis.
Statisticians will continue to create better methods that are computationally intensive for whatever computing machinery becomes available. The ability to check-point such calculations would be advantageous. For example, some current iterative programs require manual intervention even to avoid most of the iterative calculations on a subsequent invocation.
General Principles Interfaces (both input and output) with other modules in the software should be provided. Convenient abilities to temporarily suspend execution, check-point, and conduct analysis recursively may substantially enhance the utility and convenience of the software.
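Check-pointing an iterative calculation can be sketched as below. The function name `run_with_checkpoint`, the state layout, and the use of a JSON file are assumptions for illustration; the iteration itself is supplied by the caller.

```python
import json
import os


def run_with_checkpoint(step, start, path, max_steps, tol=1e-12):
    """Iterate value = step(value) until the change falls below tol.

    The state is saved to `path` after every step, so a later invocation
    resumes from the saved state instead of repeating earlier iterations.
    """
    if os.path.exists(path):
        with open(path) as handle:
            state = json.load(handle)
    else:
        state = {"value": start, "steps_done": 0, "converged": False}
    for _ in range(max_steps):
        if state["converged"]:
            break
        new_value = step(state["value"])
        state["converged"] = abs(new_value - state["value"]) < tol
        state["value"] = new_value
        state["steps_done"] += 1
        with open(path, "w") as handle:
            json.dump(state, handle)
    return state
```

A first call with a small `max_steps` can be interrupted or suspended; a second call with the same `path` continues where the first stopped and ignores `start`.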
9. Documentation
Definition Documentation consists of all information intended to aid the use of statistical software.
One-Way Examples Traditionally, paperback books designed to be reference manuals have been the primary documentation available to the user. Such manuals focus on describing the vocabulary and grammar of the language needed to control such things as the choice of response variable, the choice of the categorical predictor, labeling, and analysis options.
Comments One plausible standard for perfection of software would be the need for no documentation. Extensive documentation may reflect either richness (and guidance) or awkwardness and unnecessary complexity.
Many types of documentation may be provided. Software-focused information usually resides in manuals for language reference, system management, and description of algorithms. Tutorials, collections of examples, and statistics texts focused on a particular piece of software assist the training of users in both the software and the statistical methods. User groups and toll-free telephone numbers may be supported, reflecting either the vendor's sensitivity or defensiveness.
Many formats may be used for documentation. The paperback book has been challenged by on-line documentation, at least for truly interactive software. The recent successful marketing of electronic books in Japan provides yet another step toward the handling of all information digitally. Arguments over ring-bound versus spine-bound versus on-line manuals will eventually also involve new formats.
Documentation can make good software look bad and bad software look good. Effective documentation requires the same attention to structure as do the algorithms. The top-most layer of documentation may be thought of as metadocumentation, documentation of documentation. Proper layering and branching can help the user.
Surprisingly, many existing manuals do not provide examples of actual code in all cases. In describing a particular programming statement, or a sequence of clicks or keys, a template description may not suffice. An appendix of formulas and a list of algorithmic steps may greatly aid understanding and using software. Professional standards may demand verifying the acceptability of the techniques relative to the data at hand.
General Principles Software may be documented in many formats. Metadocumentation can help the user take full advantage of documentation. Documentation should be structured, based on the same principles as the software. A reference manual, although always necessary for the sophisticated user, does not, by itself, provide adequate documentation. Algorithms and formulas should be detailed. Truly complete examples should be included. The extent of tutorials and statistical information embedded in documentation depends on the guidance goals and the audiences. Sophisticated users may dislike the presence of statistical information and advice in documentation, while novice users often crave it. System implementation documentation should be available.
10. Audiences
Definition The audiences for statistical software are the user groups that may be distinguished from each other because of their different approaches to and uses of the software.

One-Way Examples A piece of software may be used by a person with a doctorate in statistics and by a person with literally no statistical training. The same software may be used by a person with a master's degree in computer science and by a high school student frightened by computers.
Comments For statistical software, user sophistication varies in (1) knowledge of statistical theory, (2) knowledge of computing theory, (3) proficiency with data analysis, (4) facility with computing, and (5) experience with research data management. Natural language sophistication and physiological limitations, such as color blindness or response speed, may be relevant in some applications.
General Principles Designers, programmers, documenters, and reviewers of statistical software need to be explicitly and continuously sensitive to the audiences of interest. Software and documentation structure should reflect the often disparate needs of the novice and the sophisticate.
What Next?
Step Back
The basic position presented above is that richness varies on a large number of continuous, correlated dimensions. It is also argued that the creation and evaluation of statistical software should occur with respect to explicit target goals and target audiences. This suggests that careful specification of the task, based on exactness, guidance, and richness, should always be the first step.
Jump In (Continuous Involvement)
The process of creating and evaluating software improves with effective interaction between producers and consumers. Such continuous involvement will lead to the creation of good products. However, the products will not be completely successful unless the decision makers and the “fashion” leaders of the user community can be educated about general and specific guidelines for good statistical software. Therefore, even after the Panel on Guidelines for Statistical Software releases its final recommendations in a future report, it will be necessary for those of us interested in guidelines to stay involved to make the user community aware of the panel's guidelines and encourage their acceptance.

References
Belsley, D.A., E. Kuh, and R.E. Welsch, 1980, Regression Diagnostics, John Wiley & Sons, New York.
Kirk, R.E., 1982, Experimental Design, Brooks/Cole, Belmont, Calif.
Miller, R.G., 1981, Simultaneous Statistical Inference, Springer-Verlag, New York.
Puri, M.L., and P.K. Sen, 1985, Nonparametric Methods in General Linear Models, John Wiley & Sons, New York.
