The Future of Statistical Software: Proceedings of a Forum
big ones that everyone knows. There are 300 of them available out there. A lot of bad software is being foisted on unsuspecting customers, and I am deeply concerned about that.
To put it in concrete terms, what does it take for the developer of a statistical package to be able to claim that the package does linear least-squares regression? What minimal functionality has to be delivered for that? There exists software that does not meet any acceptability criteria--no matter how little you ask.
Another matter troubling me, even more so now that I am a confirmed UNIX user, is the inconsistent interfaces in our statistical software systems. Vendors undoubtedly view this as a feature, it being what distinguishes package A from package B. But as a user, I cannot think of anything I hate more than having to read some detestable manual to find out what confounded things I have to communicate to this package in order to do what I want, since the package is not the one I usually use. This interface inconsistency is not only on the human side of things, where it is readily obvious, but also on the computer side. There is now an activity within the American Statistical Association to develop a way to move data systematically from package to package.
Users of statistical software cannot force vendors to do anything, but gentle pleas can be made to make life a little easier. That is what this activity is really about, simplifying my life as well as yours and those of many others out there.
There was discussion this morning on standardizing algorithms or test data sets as ways to confirm the veracity of programs. That is important, but what is even more important is the tremendous amount of wasted human energy when yet another program is written to compute $(X^{T}X)^{-1}X^{T}Y$. There must be 400 programs out on the market to do that, and they were each written independently. That is simply stupid. Although every vendor will say its product has a new little twist that is better than the previous vendor's package, the fact is that vendors are wasting their resources doing the wrong things.
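For readers who want to see how small the core computation is, the least-squares estimate $(X^{T}X)^{-1}X^{T}Y$ can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`transpose`, `matmul`, `solve`, `least_squares`) and the toy data are hypothetical and come from no package discussed at the forum. It forms the normal equations directly and solves them by Gaussian elimination. Forming $X^{T}X$ explicitly can lose accuracy on ill-conditioned data, which is one reason careful implementations usually rely on an orthogonal factorization such as QR instead.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(X, y):
    """Return the coefficients solving (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data (made up for illustration): y = 1 + 2x with no noise.
# The first column of X is the intercept term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(X, y)
print(beta)
```

On this noise-free toy data the fitted coefficients should recover the intercept and slope up to rounding error. A check of this kind, run against agreed test data sets, is the sort of verification discussed earlier in the day.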
We heard several speakers this afternoon discuss the importance of simplifying the output and the analysis. If vendors would expend effort there instead of adding bells and whistles or in writing another wretched regression program, everyone would be much better off. The future report of the Panel on Guidelines for Statistical Software will include, I hope, some indications to software vendors as to where they should be focusing their efforts.
I do not know the solution to all these problems. Standards are an obvious partial solution. When a minimum standard is set, everybody agrees on what it is and, when all abide by the standard, life becomes simpler. However, life does not become optimal. If optimality is desired, then simplicity must usually be relinquished. Nevertheless, standards are important, not so much in their being enforced, but in their providing targets or levels of expectation of what is to be met.
I hope this panel's future report will influence statistical research in positive ways, in addition to having positive effects on software development, and thereby affect the software infrastructure that is more and more enveloping our lives at every turn.
The critical thing that is most needed in software is adaptability. I loved Daryl Pregibon's “do.it = True” command. That is an optimal command. From now on, that