PARTICIPANT: So you use them in a different place, the modelers from the testers?
ROBINSON: No, we actually use them all over the place; where we don’t use them is at the very end of the line where, you know, we basically get young people and give them caffeine because they have to be up all night changing hardware configurations or something.
CORK: One last question and then we actually have to move on …
PARTICIPANT: This is basically a “checking to see that I’ve got it” type of question. Does model-based testing in practice consist of, for all states or for as many states as time or resources allow for, enumerating the events that can take place, enumerating the different states, and then giving that over to something like Visual Test and saying, “go for it”?
ROBINSON: Yes, and in fact I would qualify the first statement: you model as much as you want to get testing from. So if there’s an area of your application that you are sure is solid, don’t worry about it. Model something where it will do you good. So, for instance, if my car keeps breaking down then I need to learn something beyond where I’m at now. But as things are now my model is sufficient. OK, thanks.
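Robinson’s description—enumerate the states, enumerate the events legal in each state, then hand the model to a driver that walks it—can be sketched in a few lines. All names here are hypothetical; this is a minimal illustration of the idea, not the tooling discussed in the talk:

```python
import random

# A hypothetical model of a tiny application: transitions map
# (state, event) -> next state. The model IS the test plan.
TRANSITIONS = {
    ("Stopped", "Start"): "Running",
    ("Running", "Stop"): "Stopped",
    ("Running", "Pause"): "Paused",
    ("Paused", "Resume"): "Running",
    ("Paused", "Stop"): "Stopped",
}

def random_walk(start, steps, seed=0):
    """Generate a random test sequence through the model."""
    rng = random.Random(seed)
    state, trace = start, []
    for _ in range(steps):
        legal = [e for (s, e) in TRANSITIONS if s == state]
        event = rng.choice(legal)
        trace.append((state, event))
        state = TRANSITIONS[(state, event)]
    return trace

trace = random_walk("Stopped", 6)
```

Each (state, event) pair in the trace would then be handed to a driver (Visual Test, in the question) that fires the event against the real application and compares its state to the model’s.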
QUALITY RIGHT FROM THE START: THE METHODOLOGY OF BUILDING TESTING INTO THE PRODUCT

Robert Smith

CORK: The next speaker that we have is Robert Smith, who’s currently a visiting scholar with the Hoover Institution at Stanford. He’s going to describe experiences he’s had doing these sorts of testing routines at the Computer Curriculum Corporation.
SMITH: One of the, I suppose, a couple of disadvantages of speaking this late on the program is that you’re likely to find out that someone has said what you had wanted to say. Perhaps better than you would have, or the conversation has gone a different direction. You also may have to modify your presentation to remove some of the more fragrant errors that have been exposed in a previous conversation. [laughter] So I hope that I have been able to re-thread through that just a bit.
I wanted to pick up on a couple of points. I think that incremental development, small teams, transparency, documenting first, quality throughout are really good things. Avoiding the “design chicken”—good metaphor, Jess; I used to call that “ball’s in your court.” I also like, however, coding first; that can be called “hacking,” or it can be called “fast prototyping” if you want a nicer word for it. I think that you learn a lot sometimes by, at the beginning, getting something up on the screen and letting users interact with it. And maybe that’s just part of the specification process. I like unit testing very much; I think that it helps speed up the process and brings everybody into play. I like having people who are more or less experts on certain parts of the program—although with ownership you don’t want to get to the point where someone is saying, “It’s mine”—but somebody who’s a specialist. Use cases, something I think Larry is going to talk about later on, are a very, very important part of specification; you learn a lot when you sit down and talk to, say, your field representatives or your data analysts and learn what they’re going to want out of the system.

Design restraint—and I got that phrase, Pat, I think from you—is a real issue, and it cuts a lot of different ways. You can find yourself reinventing a lot of wheels unnecessarily because people come in—especially new people—and say, “I want it this way, I don’t want to use exactly what you have,” the existing framework or system. You can also find yourself stifling innovation. I’ve often found that new people will come in and ask too little because they don’t know the richness of what you already have. They take a kind of conservative approach, they don’t ask for very much, and so you don’t get the kind of product they might have. But this is a tradeoff, and I don’t have an exact answer to it, except probably to try to document and explain as well as you can and get people to buy into that as much as you can; that would seem to be the best approach. Lots of configurability within systems to allow them to change, to have things within them that are programmable—which I gather in CASES would generally be described by putting in more features, by putting in more tags. And of course quality throughout, which is really what I’m talking about today.
What I want to do is to give a case study of work done at Computer Curriculum Corporation, to talk about particular testing challenges and solutions there, and to see if I can make the relevant CAI-to-interactive-surveys connection. And since writing this slide I’ve realized that CAI has two meanings here—there’s “computer-assisted instruction,” which is where I’m coming from, and there’s “computer-aided … interviewing.” I almost said “interrogation”; that would have been the wrong word. [laughter] So we want to distinguish those two things; if I say “CAI” by accident I probably mean the education sense. Computer Curriculum Corporation is now called, by the way, NCS Learning; it’s owned by Pearson and it’s entirely different from what it was. But it was the market leader, and I think still is, in computer-assisted instruction for K-12. Founded by Pat Suppes and Richard Atkinson; I’m mentioning Pat because a number of you know Pat. Pat is actually represented here today—Ray [Ravaglia] is here as his representative from Stanford—and is still going strong.

We had large comprehensive interactive courses that were intended to be used over eight, nine years, twenty minutes a day. In 1998 about four million students at 16,000 schools were using the product. There are some testing challenges in this particular kind of course. And actually everyone wants to say that their courses are very complex. But if you look today at a lot of educational content, instructional content on the Web—and I’ve been looking at a lot of those things, doing some consulting for some companies—lots of things are very linear. You know, present the material, go right straight through, no change. CCC was very different because we wanted to optimize the instruction and, to do so, we were doing a lot of decision-making to decide what should be given to a particular student. Courses would contain, at the very small level, hundreds of exercise generators that would generate a large number of different exercises, and an example of that would be, “give me an exercise that has column addition and two carries.” So you want that kind of exercise randomly picked but still subject to that constraint through the curriculum. It’s absolutely amazing how many errors can creep into a curriculum with boundary conditions like that. You wouldn’t think it would be, but at a very micro level that creates some problems, and those are probably amenable to some of the techniques Mr. McCabe suggested because they tend to use discrete little pieces of code. But it gets worse. At a level up, we have a lot of branching things that look like the kind of thing that you would have in a survey, where you have a choice point and there are three possibilities and you could branch three different ways. We have a lot of that and it dovetails together really quickly.
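The “column addition with two carries” example is a boundary-constrained exercise generator. A minimal sketch, assuming rejection sampling over random operands (the function names are invented for illustration):

```python
import random

def carries(a, b):
    """Count the carries produced when adding a and b column by column."""
    count, carry = 0, 0
    while a or b or carry:
        s = a % 10 + b % 10 + carry
        carry = 1 if s >= 10 else 0
        count += carry
        a //= 10
        b //= 10
    return count

def make_exercise(n_carries, digits=3, seed=None):
    """Rejection-sample an addition exercise with exactly n_carries carries."""
    rng = random.Random(seed)
    while True:
        a = rng.randrange(10 ** (digits - 1), 10 ** digits)
        b = rng.randrange(10 ** (digits - 1), 10 ** digits)
        if carries(a, b) == n_carries:
            return a, b

a, b = make_exercise(n_carries=2, seed=1)
```

The boundary conditions live entirely in the predicate, which is exactly where the curriculum errors Smith describes tend to creep in: an off-by-one in the carry count silently changes which exercises a student can ever see.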
Then another level up from that we have something that we call “motion,” which works very globally to try to match the level of the student to the level of the curriculum. And that probably sort of introduces an arbitrary GOTO statement, if you want to think of this as a graph—presenting, of course, some theoretical problems. Now, what we were trying to do was to further some optimization principles, and let me just give you a couple of examples. One of the principles would be that if a student is having problems in a certain area but has mastered something else, let’s not keep banging him over the head with the things he’s already mastered. You’ve done fifteen of those perfectly, so we stop giving them to you. On the other hand, you’re having problems with fractions; you can’t multiply them. So increase the material—add to the tutorials—on the things you’re having trouble with. This is one optimization principle we used.
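That first principle—retire what is mastered, weight what is troublesome—might be sketched like this. Illustrative only; this is not CCC’s actual motion algorithm, and all names are invented:

```python
import random

def pick_skill(history, window=15, seed=None):
    """
    Choose the next skill to drill. A skill the student has answered
    perfectly over the last `window` attempts is retired; otherwise its
    selection weight grows with its recent error count.
    """
    rng = random.Random(seed)
    weights = {}
    for skill, results in history.items():
        recent = results[-window:]
        if len(recent) >= window and all(recent):
            continue  # fifteen perfect answers: stop giving these
        weights[skill] = 1 + recent.count(False)  # more trouble -> more practice
    skills = list(weights)
    return rng.choices(skills, [weights[s] for s in skills])[0]

history = {
    "column_addition": [True] * 15,           # mastered: retired
    "fractions": [False, False, True, False], # struggling: weighted up
}
next_skill = pick_skill(history, seed=0)
```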
Another optimization principle was in the area of mastery. Mastery is one of these educational ideas, and it’s often defined by saying that if you get 80 percent of something right then you’ve mastered it. [Some] use 100 percent, give or take a kid there; that’s really strict mastery. But, in fact, if you think about it in a little more sophisticated way, you could, say, think of a sequence of ten exercises of a particular type. And you get the first five exercises wrong and the second five right. You attribute to that what? You attribute that you learned. And the probability of that being random is very small. So you imagine that you’ve learned now, so even though it’s only 50 percent, you attribute mastery based on what you saw. So you can take these sequences and come up with that, and that’s another of the kind of principles involved in the overall program, both at the micro level and the macro level.

And of course we had a lot of data collection and reporting facilities. I will tell you here that CCC would say to people that we were looking at four million students today and analyzing all of this data and using it to improve the courses. That was never true, unfortunately; we probably looked at less than one-half of one percent of the data. We tried to do it in a representative way. Of course, the Internet today would change that, I think. But it was logistically just too difficult.

Well, now, we did a lot of things to test this. We did a lot of code reviews, and we did lots of design reviews of a particular part. We did white box testing with rooms full of people, we did black box testing, and so on. But we also did a lot of automation, and this is just a simple kind of characterization of the improvements that we made in it over time. A lot of it was ad hoc, not very systematic. First of all, we’d run by hand; that wasn’t very satisfactory.
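The mastery attribution Smith describes can be made concrete: with ten exercises and five correct, only one of the C(10, 5) = 252 equally likely orderings puts every success at the end, so that pattern is strong evidence of learning despite the 50 percent score. A small sketch of that reasoning (not CCC’s actual mastery rule):

```python
from math import comb

def run_is_learning(results, alpha=0.05):
    """
    Crude mastery test: given a 0/1 result sequence, ask how likely it
    is that all the correct answers would cluster at the end if order
    were random. If that probability is below alpha, attribute learning
    even when the overall score is only 50 percent.
    """
    n, k = len(results), sum(results)
    if not all(results[n - k:]):      # successes are not all at the end
        return False
    # exactly one of the comb(n, k) orderings puts every success last
    return 1 / comb(n, k) < alpha

seq = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # first five wrong, last five right
```

For `seq`, the chance of that ordering arising at random is 1/252, about 0.4 percent, so the rule attributes mastery; an alternating sequence at the same 50 percent score does not qualify.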
Then we would go out and say, “let’s just set this up to run, pretty blindly, all night, all weekend.” And what we’re really testing for there is: are there any memory leaks? Is it going to crash? Does it restart properly? All those kinds of things that are really very important but still don’t get, in any sense at all, to the individual paths that people are going to take. And then, gradually, I think, we just started adding instrumentation into the product, first at the top level—where you have this global decision about what area to give each student—and then filtering that down into levels below. And so it was added in gradually, kind of ad hoc. Eventually, we came up with something called MetaDaemon, which we sort of institutionalized, and gave a name, and people could be proud of it.

Now, the answer to a question that was asked earlier—Pat, I think it was a good one—you asked, would we be writing and authoring and analyzing twice, or once? I think that was the question. And the answer is once. But we put the testing conditions in at author time. And we’re Markovians, OK, so we believe that we can say, if you have a choice point and there are three choices—lots of things would look like this—that it’s 80 percent here, 15 percent here, 5 percent here. And we just put weights there and code them in. Now, we could either do that at the macro level or, kind of like object-oriented programming, override it at the micro level, so it could be done in a couple of different ways. We would demarcate the choice points and the likelihood parameters were stated. And, by the way, we also put in asserts—“oracles,” I think you call them—but we call them asserts. If we come into a choice point, into a node, where—for example, like before, where the choice is whether age is going to be 17 or greater—we put an assert right at the top of that. And of course you could argue that you don’t need to, that the marks going into that have already tested that, but we find a lot of problems that way. So we’re actually peppering both the system itself and the actual content written in the system with the actual test conditions at the time it’s written.

Now, we’re also Bayesians, in the sense that we’re willing to alter those probabilities if we really don’t know that it’s 80-15-5. So occasionally we would get in data from the field—we would get, as I said, a small percentage back—and could make changes to things. And one of the main ways we used to modify the course itself was if we saw some precipitous drop in correct answers. So you send the course out into the field, students are charging along and getting 80 percent—and all of a sudden, at a certain point, the probability of success is 50 percent. Maybe something’s wrong here; perhaps you jumped too quickly, you haven’t introduced the material properly, you have misunderstood the level of difficulty of this material relative to the students. So that was how we would make change at that level. And we would actually put some of this—not a lot, but some—back into the testing conditions.
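An authored choice point—prior weights plus an entry assert, the “oracle”—might look like the following sketch. The class and field names are hypothetical; this illustrates the MetaDaemon idea, not its implementation:

```python
import random

class ChoicePoint:
    """A branch node annotated at author time with prior weights
    (e.g. 80-15-5) and an assert checked on entry."""

    def __init__(self, name, branches, weights, entry_assert):
        self.name = name
        self.branches = branches
        self.weights = weights          # authored priors, e.g. [80, 15, 5]
        self.entry_assert = entry_assert

    def take(self, student, rng):
        # the assert fires even though upstream routing "should"
        # already guarantee the condition -- that is the point
        assert self.entry_assert(student), f"oracle failed at {self.name}"
        return rng.choices(self.branches, self.weights)[0]

node = ChoicePoint(
    "age_check",
    branches=["adult_path", "review_path", "remedial_path"],
    weights=[80, 15, 5],
    entry_assert=lambda s: s["age"] >= 17,
)
rng = random.Random(0)
path = node.take({"age": 18}, rng)
```

Updating the weights from field data, as Smith describes, would just mean replacing the authored priors with observed branch frequencies before the next simulated run.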
But, certainly, in your environment, with the data richness that you’re going to have, that could be done automatically. We kept logs and scripts. Now, I’m going to say I agree with Mr. Robinson; I’ve never had a whit of luck with these little automated test tools that, you know, save your screen, save your mouse clicks and your keystrokes and then run them back. It might be OK for regression testing, but every time we’ve tried to use them you find that they break right away, and usually it’s because of something like you added a menu item and now the mouse only moves down 3 inches when it should have been 4, and whatever doesn’t work. So that’s a problem. But the kind of scripts we had were not at that level; they were at this MetaDaemon level of test conditions, so they were a little more impervious to GUI changes. We also did a lot to support regression testing, and I cannot express to you my appreciation of that.

Let me just show a sample of this course; can you see? … Global decision algorithms, and these are—maybe 15 or 20 modules here, say in math, corresponding to things like addition, subtraction, fractions, geometry, time measurement, things like that. And the algorithm compares the curriculum—as a static model—to the student—as a dynamic one—and says: what should we do for this student now that would be the most optimal? And it’s also probabilistic and somewhat random. But we pick that and now come down to a sequence that will just have some local branching in it. Local branching among the complexity, along the lines of what I’ve been hearing about branching on marriage and sex and so on. Not, probably, as complex as on something like employment, where you’ve got a whole lot of nodes and responses. And then an individual exercise. And the MetaDaemon information was added here, at the top, this part being done by the main people designing the course. This part being done by the individual authors of the individual modules, expressing their sort of a priori opinions. And then the exercise—this is typically written in the underlying implementation language, C++, say—but it would be handled similarly by different people. Everybody had a role in making sure that the test conditions were built into the product up front.

Now, this was mostly, I think, not very planned. It was opportunistic, it was bang-for-buck, it was “what will help us now?” We weren’t really setting out to do a wonderful test product, and my guess is that if I had ever gone forward to the powers that be and said, “OK, I’m devoting two person-months, two person equivalents for the next year or two, for adding this MetaDaemon facility”—I mean, how are you going to be able to sell that? How is that going to help us in our effort to sell [the product]? But it got done, and it got done because people believed that it would be valuable. How did we get the content providers to buy in?
And there was discussion a little bit ago about that—do you do this by legislative fiat, do you tell them, “we have to”? Well, in our case, we were adding this gradually enough that people bought into it because they could see the benefits, and they could see that it was going to be used immediately. And as it started to roll a little bit, you know, if you were coming along to create a new course, this was automatically something you were going to do because we’d kind of proven the benefits in the past. But probably the first couple of phases of it were bootlegged. I personally like bootleg software development projects; in a somewhat large company, you have to do them and still get your other work done. I think the trick is in getting them done in the conduct of one’s other duties—a less bureaucratic way of saying, “get your other work done while you’re doing this other little thing.” Lots of times some very nice stuff comes out of bootlegging, and I think it’s a good way to do some things. And, as I say, it became standard.

And then some other things—these are just less interesting things. Of course, we had source control, defect tracking, QA procedures designed for fast turnaround, smoke tests, and all that. Lots of unit testing and module testing, which I strongly promote. We were missing some things, and I want to point those out to you. Customer support issues really did not find a way into our defect tracking database. And, actually, this happens in a lot of companies. So the customer support organization was dealing with the customers, and they were getting, you know, all these issues. And many of them were, you know, “I didn’t plug my terminal in” kinds of things. But there should have been some more automatic way to get them into the engineers’ tracking database. We had a political problem there; it ended up being solved by a committee meeting every couple of weeks, and they would negotiate which ones would be migrated, but that’s not the way to do it. And I think the QA process can never be integrated enough; you can never build it in enough.

Now, I wanted to compare, here, CAI to CAI—computer-assisted instruction to computer-assisted interviewing. And I think that there are a lot of points of comparison and also some differences. You’re looking for questions and answers, really, in the educational context, and you generally have an idea of what’s the right or wrong answer when you ask a question, as contrasted with what you’re doing—a tutorial or something of that sort. In a survey, it’s more likely to be a neutral response; there may not be a right or wrong answer. There may be an inconsistent answer, but there might not be a right or wrong one. I don’t know how much difference that really makes; it certainly makes a difference in how you interpret things. And from a QA perspective, I’m not sure. Here’s a big difference: we were taking, of course, wrong answers and providing error analysis and tutorials.
And, in some cases, back-ups. We would back up by taking the student back into part of the course; he’s not doing well in fractions, so let’s take him out of fractions and move him back to a more elementary point for tutorials. And that back-up generally left in place all of the performance data that had been recorded later, because it was still considered relevant performance data even if it showed that he wasn’t performing well enough. But, in your case here, when you back up I think you have a lot more issues of what you do or do not do with that data, and I don’t understand it all, but it’s very interesting. We were branching for optimization; we were branching to try to curtail the amount of extraneous information given to a student relative to their needs. And I think you’re doing something different: you’re branching for relevance. They’re somewhat similar in effect, but there’s a difference in purpose in looking for relevance; it’s not relevant to get spouse information from a single person.

Here, I speculate: our courses, at least, changed slowly. The course that CCC is shipping today—or that NCS Learning is shipping today—in math is very similar to what it was ten years ago. And your surveys may change more quickly; I don’t know. I heard you saying 40-year-old questions are in there, Pat, so maybe that’s something, too. Now here is a really critical question or point; you may be able to exploit this. We had a large audience of students and, mostly at the time, were shipping on CDs. And so when you go gold on that and want to ship it out to 16,000 schools, that’s a big job and you want it to be right. You have, as I understand it, a smaller number of FRs and people who are going to be involved in the test administration or the survey administration. And that may be some difference, I don’t know; you might be able to try some more experimental things sometimes, but we can’t—it’s got to work. One difference here is the standard of quality that has to be attained.

OK, I wrote that when I was asleep, so I was speculating. What I would suggest is that you see if you could do this, and start with the CASES product—just as a methodological suggestion—and add some statements to that language that would allow you to say, at each choice point and each error/consistency point and back-up condition, what you’re expecting. And see if you can build some automated testing around that, if you haven’t already done that. And if you’ve already done that, then I’m apologizing. I also put a quote here from a former QA manager of mine—and, Mr. Robinson, this is your desire—he said, “people know where things are going to fail.” And he would select the particular developer who came in late, and looked like he’d been run over by a truck, and stayed late—and that’s the person whose code would get tested most. And, boy, is that a winning strategy.
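Smith’s suggestion—annotate each choice point with what you expect, then automate testing around it—could be prototyped with simulated respondents. The structure below is invented for illustration and is not actual CASES syntax:

```python
import random

def simulate(survey, n=1000, seed=0):
    """
    Drive random simulated respondents through a survey spec in which
    each item carries a routing predicate and an authored expectation,
    and report any item whose expectation ever fails when reached.
    """
    rng = random.Random(seed)
    failures = set()
    for _ in range(n):
        resp = {"age": rng.randint(14, 80)}
        for name, route, expect in survey:
            if route(resp) and not expect(resp):
                failures.add(name)
    return failures

SURVEY = [
    # spouse items routed only to adults; the expectation restates that
    ("spouse_items", lambda r: r["age"] >= 18, lambda r: r["age"] >= 18),
    # a seeded authoring bug: routed to minors, expectation catches it
    ("work_items", lambda r: r["age"] >= 14, lambda r: r["age"] >= 16),
]

bad = simulate(SURVEY)  # flags the seeded bug in "work_items"
```

The expectations play the same role as the MetaDaemon asserts: redundant with the routing when the survey is correct, and loud when it is not.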
So try that by all means. Of course, you may not have any; we’re from California. [laughter]

DOYLE: And we’re from Suitland … [laughter]

SMITH: This is just module testing. Test during development, and so on. Again, I like that approach very much. Test on module integration; acceptance testing for each build or revision, including … I like to be able to have a build at 8:00 in the morning and know by 9:00 whether that build is going to be testable. I mean, you’ve got to be able to run it through its paces enough so that, if it isn’t, you can tell the developers to redo this and get it back to us; it failed. You don’t want to spend all day waiting on that. And, of course, I’ve already mentioned regression testing. In particular, the suggestion I have here is authoring into your survey some of the conditions by which you want it to be tested, and automating that process.

And these are just a couple more issues, leading into some of Larry’s talk that’s going to be coming up shortly. We did coordinate slides, so that we would not duplicate too much. You might pursue asking yourself—if you haven’t already—how long would it take to make a new survey? What would you like that process to look like? And I’m talking here about the requirements of the process. And to put as a goal for yourself what used to be called “Internet time”; a couple of years ago, during the dot-com boom, everybody talked about “Internet time” as being very fast. People aren’t talking that way anymore, but I think there’s some benefit in doing that. And I heard earlier some people suggesting to you that there might be some benefit in terms of your competition with other survey organizations, or quasi-competition.

I had as a suggestion to build an interactive GUI environment for development, and I’m pleased that Mr. Bethlehem had a good deal to say about that earlier. I believe that it was his—there were a lot of talks, so I might be confused. I would give you a possible suggestion there; I would say something that was XML-based, for example, but that could output CASES as its object output, so that you could use all your same deployment systems, but would show you various views—text, graphs, flowcharts, and so on—and would impose conditions on the authoring process. So, for example, you require that there be test conditions. You could ask it: what am I missing here? What do I need to fill in and complete? And it’s a bit easier to do that in a GUI environment than, say, in a scripting language, which is what I understand CASES to be to date. But I don’t know CASES, so we might talk to Tom about that.
SMITH: I think that some of the work discussed here earlier on the documentation side could really be flipped around and made into that kind of a system. And then, of course, there are some modern methodologies such as UML and extreme programming, and an upcoming talk will deal with that—Larry’s talk.45 And I think there are a lot of good ideas here, but I do suggest that you adapt and evolve things rather than follow blindly. I’m not very ideologically based myself, so: use what works, and find that many things can be a part of success. Any questions?

45 UML, the “Unified Modeling Language,” is a language intended for the modeling of complex computer systems. Specifically, it is meant to provide a mechanism for documenting and visualizing object-oriented software systems.

BANKS: I very much like what you’re saying. As a statistician, I certainly support using the design of experiments to find new ways to develop surveys. It seems to me that the same problems arise in software performance testing; there are strategies there that have been put out. [Has] anybody done any studies of relative efficiency of code? Say you’re trying to do something, like linear programming, and you have two things that do it: one uses one style of performance testing and the other uses a different style. And then after each group is finished the question is, what is the difference in the testing?

SMITH: That’s a good question. I don’t know enough of the general literature in testing to say that I have an answer to that. I think a lot of these things become a question of how built into the process it is. And so, to me, the idea of taking any form of testing methodology whatsoever and sort of plugging it in at the end is …

BANKS: Yes, but I don’t mean plugging in at the end. What I’m asking is, say Team A integrates one philosophy in their work. A second team, Team B, uses a different strategy than Team A, and at the end of the day the question …

SMITH: Controlling the other underlying variables is the question in that … I don’t know the answer to that. Maybe Tom does? I’ve gotten the name wrong; I’m sorry, your name is?

ROBINSON: Harry.

SMITH: It was either Tom, Dick, or Harry. [laughter]

ROBINSON: We’ve tried similar things to what you’re describing, and the problem is that it’s hard to give the same job to two different teams and justify that. What we’ve had to do is to give the job to one team, and then have them do it again in another way …

MCCABE: As another Tom, Dick, or Harry at this conference … I’m not a statistician, so tell me when to quit if you’ve seen all this, but there’s a literature about error seeding that says you can intentionally, in a piece of software, put an error in. It doesn’t really pertain to software, where we have plenty of errors [laughter], but the place I’ve seen it apply is when an organization puts the product out in, maybe, four or five sites simultaneously.
While testing the product, you know where that error is; you know where the seeded errors are. Then you match the faults found at the field sites against the known errors. And you typically have different characteristics in terms of the kind of testing—black box, white box, and whatever. And from that you can make some inference …

SMITH: I think that’s a good point, and it reminds me that when we had the new product in beta release, we were kind of doing a model selection for the people to whom we sent the beta product. We would pick a school where we thought they had very good tutors, very good proctors. We would pick one that we thought was a little sloppy. We would pick a school that was primarily using it in math and reading, we would pick one that … So, you know, we would try to have some sort of balance regarding what we thought our overall population looked like, for providing the product in new release. I think that’s different from what you’re saying, but it reminded me of that. And, as I understand it, how you would be dealing with that would be to kind of cut the development right here—it stops—and send it to external organizations using different approaches.

MCCABE: [very faint on recording] One of the problems with this, it turns out, with universities is that it’s way too expensive to use them as operational sites…. But an operational company will usually be able to field a couple of operational sites, and typically those focus on different priorities. Or they pick one site that is known to get something done quickly; that information is relevant. And then you get some comparisons among the methodologies.

GROVES: I see that as an interesting comment because it would be great to have something that allows us to judge among those sites. The question is the criterion … [trails off to inaudible] I wonder what kind of evidence you can produce to say: I prefer this method to that one? What do you fit into the criterion?

MCCABE: I think it’s really important to realize that there’s no single testing method used at any of these places. [inaudible until closing comments] I think that the places that test well do so because they have at least four or five different methods that they use at various stages. I don’t know of any place that has just one thing going in testing.

SMITH: I think that a lot of those methods are very good, and I’ve also found that if you integrate them in a regular way, then they smooth out and the problems with them start to diminish. For example, code review can be a threat to people who have never done it before, because they remember back in college when their work was being graded or something. And after it becomes commonplace and everybody’s had that chance to go around the table, it becomes a very different thing.
And you find a lot of problems that way, and you understand the code, and you do some cross-training to some extent. And I think that everything else you mentioned—unit testing, regression testing—if regularized does kind of smooth out. And costs become less. Would you concur? MCCABE: One thing we didn’t talk about that we ought to comment on is relative cost of testing software versus surveys. In software, it’s quite high; in software, it’s often, in the total budget, maybe 60 percent. Now places like Microsoft, where you’re shipping millions of products, the testing cost tends to be very, very high because the cost of errors is so high. Versus more of an engineering shop. But, still, we’re in the software business and producing, and so the costs of testing can go up to 60 percent. Now how does that compare with surveys? DOYLE: It’s a small portion of the current budget. Of the total budget.