scientific problem is at issue, elegant solutions may not be available in some cases due to a technicality; very different-looking computer output might attend problems that appear superficially similar to a casual user. To let the user focus on the task instead of the technique, one might have to sacrifice some of the theoretical rigor that an alternate technique would offer. There is also the question of how to choose between the more- and less-rigorous versions. An explicit threshold or criterion must be available.

In the context of one-way ANOVA, what might be done when robust estimation is recommended? The box plot has been seen as a generic graph describing the one-way ANOVA problem. In this robust version, the box plot of course shows the median as one of the more prominent points for each group. So the median would seem to be the natural choice for the alternative robust estimate, because it ties in with the graph that was being used to drive the analysis. The problem with using the median in a general linear model framework is that the median does not have an easily obtained sampling distribution. A more approximate approach is forced. One such approach is to have several samples rather than just one sample, and to use as a general measure of dispersion a multiple of the interquartile range of the residuals, after subtracting off each group median.
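The dispersion measure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not say which multiple of the interquartile range is used, so the divisor 1.349 (the interquartile range of a standard normal distribution) is an assumption, as are the function name `robust_scale`, the quartile convention, and the made-up data.

```python
from statistics import median, quantiles


def robust_scale(groups):
    """Pooled robust scale from median-centered residuals.

    Assumption: the "multiple of the interquartile range" is taken to be
    1 / 1.349, which makes the result comparable to a standard deviation
    when the residuals are roughly normal.
    """
    # Subtract each group's own median from its observations.
    residuals = [x - median(values) for values in groups.values() for x in values]
    # Quartiles of the pooled residuals (the "inclusive" convention is a choice).
    q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
    return (q3 - q1) / 1.349


# Hypothetical data: three groups, one of them with an obvious outlier.
groups = {
    "a": [4.1, 5.0, 5.2, 5.9, 12.0],
    "b": [6.8, 7.1, 7.4, 8.0, 8.3],
    "c": [2.9, 3.3, 3.5, 3.6, 4.4],
}

medians = {name: median(values) for name, values in groups.items()}
scale = robust_scale(groups)
print(medians, round(scale, 3))
```

The outlying value in group "a" moves neither its group median nor the pooled interquartile range very far, which is the point of tying the estimate to what the box plot displays.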

If the object is to provide an ease of use that allows the software user to focus on the task rather than the technique, some new techniques may have to be invented along the way. The users never see any of this; they just see a system suggestion that the confidence intervals for contrasts be based on medians. They can override that suggestion if they want. Either way, they simply get confidence intervals for contrasts and predictions without needing to go into an entirely different software framework.
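A suggestion that the user can override might look roughly like the sketch below. Everything specific in it is an assumption rather than the method described in the talk: the rule that suggests medians (any observation beyond the usual 1.5 interquartile-range box plot fences), the function names, the large-sample factor sqrt(pi/2) for the standard error of a median under normality, and the use of a normal quantile in both branches (a fuller mean-based version would use a Student t quantile).

```python
from math import pi, sqrt
from statistics import NormalDist, mean, median, quantiles


def has_boxplot_outlier(values):
    """Hypothetical threshold: any point beyond the 1.5 * IQR fences."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return any(x < q1 - fence or x > q3 + fence for x in values)


def contrast_interval(groups, weights, use_medians=None, level=0.95):
    """Approximate interval for a contrast; use_medians=None follows the suggestion."""
    if use_medians is None:
        # System suggestion; passing True or False overrides it.
        use_medians = any(has_boxplot_outlier(v) for v in groups.values())
    center = median if use_medians else mean
    centers = {g: center(v) for g, v in groups.items()}
    residuals = [x - centers[g] for g, v in groups.items() for x in v]
    if use_medians:
        # Robust scale: IQR of median-centered residuals over 1.349 (assumed multiple).
        q1, _, q3 = quantiles(residuals, n=4, method="inclusive")
        scale = (q3 - q1) / 1.349
        inflation = sqrt(pi / 2)  # large-sample median vs. mean standard error ratio
    else:
        # Classical pooled standard deviation from mean-centered residuals.
        dof = len(residuals) - len(groups)
        scale = sqrt(sum(r * r for r in residuals) / dof)
        inflation = 1.0
    estimate = sum(weights[g] * centers[g] for g in groups)
    se = inflation * scale * sqrt(sum(weights[g] ** 2 / len(groups[g]) for g in groups))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return use_medians, estimate, (estimate - z * se, estimate + z * se)


# Hypothetical data and contrast (group "a" minus group "b").
groups = {
    "a": [4.1, 5.0, 5.2, 5.9, 12.0],
    "b": [6.8, 7.1, 7.4, 8.0, 8.3],
    "c": [2.9, 3.3, 3.5, 3.6, 4.4],
}
weights = {"a": 1.0, "b": -1.0, "c": 0.0}

print(contrast_interval(groups, weights))                     # follow the suggestion
print(contrast_interval(groups, weights, use_medians=False))  # user override
```

Both calls return the same kind of result, so the user stays in one framework whichever estimate is chosen.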

In summary, for the one-way ANOVA layout, in-software guidance is possible. There are, however, more complicated scenarios that could be called one-way ANOVA that are not covered by my remarks, e.g., issues about sampling methods and the validity of inference to target populations.

There is no doubt that whenever a piece of software provides some kind of guidance, it will offend a certain fraction of the statistics community. This is because whenever you give a problem to several statisticians, they each will come back with different answers. Statistics seems to be an art, and very hard to standardize.

More than anything else, the way to provide better guidance is to make the entire data analysis process more transparent. In the case of one-way ANOVA, one should be looking at a box plot and determining whether there is more that can be used there. If a person can interactively point and click, and so really get hold of a box plot, presumably more can be done with it. Perhaps a manipulation metaphor can make the goals of statistics, as well as the uncertainty measures produced by a statistical program, more transparent and concrete. This is likely to do more for guidance and overall understanding than boilerplate dialogues that attempt to mimic the discussion that an expert statistician might have with a client. Still, in order to produce that transparency, there will always have to be practical compromises with elegant theory before the ever-increasing numbers of data analysts can have ready access to the benefits of statistical methods.
