philosophy of statistics, and the textbooks have not caught up with it. The kind of textbooks and the type of statistical teaching that were so prevalent in the 1950s and 1960s and perhaps even into the 1970s are no longer accepted by expert users, as exemplified by today's speakers. Unfortunately, there is a Frankenstein monster out there of hypothesis testing and p values, and so on, that is impossible to stop. Most people think that statistics is hypothesis testing. There is a statistical education issue here for which I do not have a quick solution.
So here are the principles of my philosophy of guidance. Graphics should be fully integrated; one should never think of doing a data analysis without a graph. The focus should be on the task rather than the technique, emphasizing the commonality of different analysis problems. Keying on the commonality means that what is learned in one scenario will help in another. Different sample sizes, designs, and distributions must be smoothly supported. Merely having equal or unequal numbers in each group should not require that the user suddenly turn to a different chapter of the user's manual. An occasional or infrequent user who doesn't understand why that should be necessary will be totally alienated. As mentioned before, hypothesis testing should be de-emphasized in favor of point and interval estimation. Simple, easy-to-visualize techniques should be chosen. Lastly, the statistical software should help as much as possible with the interpretation of the results and with the assembling of the report.
With these guidance ideas in mind, one of the first things to note is that “one-way ANOVA” is, of course, jargon. What does that really mean? How are people to know that they should use a program called one-way ANOVA when they need to compare different groups?
It is easy to give guidance if the scientific questions can be rephrased in terms of simple variables. A statistical variable is not a natural thing. In statistics training, the random variable is drummed into the student early on. It is better to phrase all of the statistical scientific questions in terms of questions about relations between variables. Instead of comparing the rats on this diet and the rats on that diet, one wants to know if there is a relationship in rats between weight gain and diet, with weight gain being one variable and diet another.
That is not a natural use of the language for most people. Yet software is much better used if the user has to think about a database of random variables and relationships between those variables. Forcing users to do that is, in a sense, a disadvantage of software, but also an ultimate advantage for users; if people understand and think about random variables, it will greatly help them to think about statistical issues in the right manner. Having users think in terms of variables may be doing them a favor.
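As a rough illustration of what thinking in terms of variables might look like in practice, the rat example can be sketched as a small database with one record per rat and two variables, diet and weight gain. Everything in the sketch is an assumption made for illustration: the numbers are invented, the function name summarize is hypothetical, and the interval is a crude mean plus or minus two standard errors rather than anything a particular package computes. The same call handles equal or unequal group sizes, and it reports point and interval estimates rather than a test.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, invented for illustration: one record per rat,
# with two variables per record (diet and weight gain).
rats = [
    {"diet": "A", "weight_gain": 10.0},
    {"diet": "A", "weight_gain": 12.0},
    {"diet": "A", "weight_gain": 14.0},
    {"diet": "B", "weight_gain": 18.0},
    {"diet": "B", "weight_gain": 20.0},
    {"diet": "B", "weight_gain": 22.0},
    {"diet": "B", "weight_gain": 24.0},
]


def summarize(records, response, factor):
    """Describe how `response` relates to `factor`.

    Returns, for each level of the factor, the group size, the mean
    (point estimate), and a rough interval of mean +/- 2 standard errors.
    Each group needs at least two observations.
    """
    # Group the response values by the level of the factor.
    groups = {}
    for rec in records:
        groups.setdefault(rec[factor], []).append(rec[response])

    summary = {}
    for level, values in groups.items():
        m = mean(values)
        se = stdev(values) / sqrt(len(values))  # standard error of the mean
        summary[level] = {
            "n": len(values),
            "mean": m,
            "low": m - 2 * se,   # crude interval, not an exact t interval
            "high": m + 2 * se,
        }
    return summary


# The question is posed as a relation between two named variables.
summary = summarize(rats, response="weight_gain", factor="diet")
for level, stats in summary.items():
    print(level, stats)
```

The user never has to name a technique here; the request is stated as a response variable and a factor, and the unequal group sizes need no special handling.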