SMITH: Mr. Robinson—Harry Robinson, that is—a question about Microsoft … has the advent of the Internet with the ability to update modules sort of quickly with patch packages and so on, has that in any way changed their feelings about testing? Do they have any sort of a sense of, “well, if it’s wrong, we’ll just update it later?” Or …
ROBINSON: There might be some of that, but now that the Internet has opened the door to so many security bugs, it’s actually swung the other way.
SMITH: I always hated having to buy a new computer and load four packages on it—I mean, a brand new computer, you know. But … any other.
CORK: That’s a good breaking point, so let’s thank Bob.

INTERACTIVE SURVEY DEVELOPMENT: AN INTEGRATED VIEW

Lawrence Markosian

CORK: The next speaker is Larry Markosian. He is currently a technology transfer consultant at the NASA Ames Research Center. He founded Reasoning, Inc., and was for several years the product manager for their principal product, which was a tool for automated software defect detection.

MARKOSIAN: OK, so, my talk is going to expand on some of the themes that have been discussed thus far. And I’m going to perhaps suggest some solutions that are a little more specific to addressing the problem that we’ve seen. So, the roadmap of the talk is to review some of the basic challenges. Then we’ll look at some testing principles and a couple of specifics—a methodology for programming that Bob [Smith] alluded to in his talk earlier—and also a set of integrated tools that actually will work well with this paradigm. And then we’ll just do a quick summary.

So the challenges include many of the same things we see in other software development projects. In fact, there is relatively little that I’ve heard thus far that uniquely characterizes this problem. One issue that has come up time and time again is the large state space, so we’ll look at ways and tools that can help address that. But, in fact, the reality is that the large applications put out today also deal with a large state space, for the reasons that Tom McCabe pointed out earlier.

I’ve also heard that there’s little time for testing, and little formal testing being done currently—you said that it was a very small portion of the budget. But we’ve also heard that there are bugs in the products that are being delivered to the customer, and these include things such as
unexpected behavior—what questions are being asked—and occasionally crashes in the system that probably upset the FR quite a bit. Another problem that I’ve heard is the relatively small amount of use of development artifacts, software artifacts. And that doesn’t mean only pieces of code but also the tests—whatever the test suites are—along with the documentation. And then, finally, we’ve heard that there’s very little time to produce documentation.

So all of these problems are common in many software projects, and there are a couple of principles that we might pursue in finding a solution to these problems. They are also becoming more widespread generally in software development. One of them is to integrate the process—the development process itself. That is, how people actually go about developing software. And the concepts here are that we want to have close involvement at all stages of the lifecycle by all the stakeholders in the process (or at least in the product). And perhaps that bias comes from my own previous position, working as a product manager, where I had to be sure that all the stakeholders were in agreement from the very beginning. But actually we can do a lot more here than what I was able to do.

Another concept that’s important here is iterative development, and a number of people have mentioned this before. We don’t want to have—particularly if there’s a hard deadline—everybody spending a lot of time initially doing requirements analysis and then, you know, design and implementation. And then, at the end, we find that we send it over the fence and the customer rejects it because of a requirements issue that could have been found much earlier in the process. So, we should have iterative development, with small iterations where each iteration produces a useable—or at least a testable—product. And that concept has come up here before. Well, what we haven’t discussed is a particular methodology that incorporates these concepts.
And there are others as well; I don’t mean to be advocating this particular one. This is an example, and you should look at its strengths and its weaknesses before you get into it.

The other principle is integrated tools. The two ideas here are, first, that during the lifecycle we produce many software artifacts: requirements, specifications, test cases, code, design, and so on. We should formally try to capture them as much as possible. And then what we can do is use automated reasoning techniques to derive the next level from the previous level—[to] help go from the specification to the implementation, for example.

So, let’s take a little bit of time to look at the interesting aspects of “extreme programming” (XP). This is a style of development that’s particularly suited to program domains with changing requirements. And what I’ve heard is that this—that CAI is one of those domains, that we could be going along on our goal of shipping the questionnaire and then, at the last minute or at various points along the way, a new requirement will come up. “Congress just passed a law,” I heard. So this has to be reflected in the code. But, more generally, even without that kind of last-minute requirement, there’s the problem of having the customer understand the requirements that they are specifying. And usually the customer has not thought them through well enough to realize that there are significant implications that are unaddressed in the requirements. So by doing iterations—rapid iterations—on this and continuing to do requirements analysis through the project, we can reduce that risk.

OK, well, it says here that this technique is particularly appropriate for projects with high risk. We’ve already identified some of the sources of risk here. And one of the problems here—one of the great sources of risk—is that if you have to deliver by a certain date, ship at a certain date, then the software project is automatically at a high state of risk. Extreme programming is also useful with relatively small development teams. There are cases when it’s been applied successfully with larger development teams of 30 or 40 people, as well. And if you do have larger teams than that, then you might think about breaking the project up into several smaller subprojects in a rational way.

One of the requirements for doing extreme programming is a commitment by all the kinds of people involved to work together shoulder-to-shoulder. So, we’re not going to have developers sitting there doing development on their own. We’re going to have managers involved and going to have customers involved—very closely involved—with the developers at every stage of the development process.
Another requirement is testability because—as you know, or at least will see in a little bit—the mantra is, “test early and test often.” If we have a project that’s not readily testable, or an application that’s not readily testable, then we’re not going to be able to do this.

So, here’s one of the stages in extreme programming. The first stage is usually the planning stage. And, in this, user stories are written. A user story is like a scenario, but in the context of extreme programming we try to keep it very short—just several sentences describing a use case. So it’s less formal and is shorter than a standard use case. And, again, the concept is to get everyone on the same page very early in the project. Then, there’s a release planning session that creates the schedule, and again the idea is frequent, small, testable releases. The project—in order to achieve this—is divided up into iterations. Iteration planning starts each iteration, and there are techniques for measuring what’s called “project velocity.” Again, the concept is that you’re going to have releases very frequently, like every three weeks; on a small project, maybe even more often. So we can very quickly monitor the status of the project and determine whether we are meeting our goals. One other important part is that we fix the process when it breaks. And there are various ways in which it can break, and we’ll talk about that a little further.

Each day starts with a stand-up meeting—and this is a little bit interesting—where all the participants get together. The purpose of the stand-up meeting is to avoid those long meetings, those three-hour meetings where people sit around—they’re very low-bandwidth meetings, people fall asleep, and very little gets communicated. What we do at the beginning is have a very short meeting, with a time limit set, where all the issues are brought up, and then the development group breaks up into the natural teams to address these issues. And how that happens I’ll discuss in a bit.

The next step is generally design, and the guideline here is to choose a system metaphor. A system metaphor gives you a way of establishing a consistent communication style and a consistent communication language, all the way from the requirements down into the specifications and the code, so that people have a similar set of concepts—the class names are consistent, the levels are named consistently, and so on. Another concept is that CRC cards are used for design sessions. If you’re doing object-oriented programming, these are class-responsibility-collaborator cards. These describe, for each class, what the methods are and what the other objects are that are involved. So the idea is to quickly get to an outline that’s understood and mutually agreed upon. Another principle is that no functionality is added early, and the reason for this is that we don’t want to get ahead of ourselves.
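To make the CRC-card idea concrete, here is a hypothetical card for a questionnaire item rendered as a class skeleton. Everything in this sketch—the class name, its responsibilities, and its collaborator—is invented for illustration and is not taken from any actual CAI system.

```python
# Hypothetical CRC card:
#   Class:            Question
#   Responsibilities: present its prompt; accept or reject an answer
#   Collaborators:    a validator supplied by the questionnaire design

class Question:
    """One questionnaire item, sketched from a CRC card."""

    def __init__(self, prompt, validator):
        self.prompt = prompt        # text shown to the respondent
        self.validator = validator  # collaborator that checks an answer

    def accept(self, answer):
        """Return True if the answer is valid and the interview may advance."""
        return bool(self.validator(answer))

# Usage: an age item that accepts whole numbers from 0 to 120.
age = Question("What is your age?", lambda a: isinstance(a, int) and 0 <= a <= 120)
```

The card stays deliberately sparse—just enough to get the design session to a mutually agreed outline before any real implementation starts.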
This is a project plan and, as we’ll see in a little bit, a key concept is that most if not all of the implementers on the project are capable of implementing any piece of the project. So if subteam A gets done quickly, they don’t go off and implement something for the next mini-release. What they do is go off and implement a piece that may be running behind. Then another mantra in XP is to refactor whenever and wherever possible, in order to eliminate redundancy and so on.

Now we get to the point where we’re actually coding. The customer needs to be always available. The customer is dictating the requirements of the project, and the customer has to be involved—particularly involved—whenever something, a piece of it or an iteration of it, is done and there’s a need to evaluate it. Because that’s what’s going to help them—actual pieces of operating code, running code, even if it’s very quick and a small part of the overall system that’s being developed. That’s what’s going to get the customer thinking about the requirements further. And you’ll also find out whether you’ve done the right thing. Maybe there were just some omissions.

Code needs to be written to agreed-upon standards. And, again, the idea here is to support the notion—which we’ll get to a little bit further on—that the whole code needs to be understandable to the entire team. It may very well be that there are some parts that are only really understandable to the expert who wrote them. But, to the degree that happens, we increase our project risk, if something happens to that person. So we want to have code written to agreed-upon standards, because that helps communicate things to the group and it’s easier to move people around. We also want to unit test first, and there’s been some talk about that thus far, too.

Production code is pair-programmed, and what that means is that instead of having two workstations with one programmer at each, we have one workstation with two programmers working on the same code. And they shift, so that they are each driving at different times. The evidence—at least in the XP world—is that this really works, that you get much higher programmer productivity by having two people working on the same code at the same time. In the projects we did at Reasoning, we didn’t actually do that, so I can’t speak from experience on that. Most often, and in my case in particular, we used a number of the principles, but not the whole dogma of XP.

Integrate often—that goes pretty much without saying, because we’re going to have many releases. As part of each release, we will want to integrate, and usually integrate prior to the release. Now there’s a concept, also, of collective code ownership, and I’ve alluded to some of the components of that; we’ll take a look at that in a moment. There’s also the concept of no overtime, and that certainly didn’t work for us.
But the notion is that if you have a good project plan, then the fact that you’re requiring people to put in overtime means that there’s something wrong with the plan—that effort was not planned in. And so we should really be trying to get the project plan right—maybe not completely at the beginning, but certainly incrementally as you go along, refining it.

Here is a slide I promised on collective code ownership, and notice at the top here it says, “Move People Around.” And in the center it says, “Pair Programming.” So these are two of the key concepts—we want people, the developers, to adopt any role in the project, work on any component of the project. And pair programming—I’ve already indicated what’s involved there. Now certainly there are going to be cases where you have applications running over a network and there’s going to be some network expertise that’s required, and we’re not going to be able to move people around that well. But, to the degree that we can, we reduce risk because everybody understands—or is fairly quickly able to understand—most aspects of the project.

And finally we get to testing—not “finally” in terms of the project, but “finally” in the sequence of slides on XP. All programs need to have unit tests; the code needs to pass the unit tests before it’s released. And I guess that the most important point here, really, is that acceptance tests are run often, almost continuously, and the scores published. And the reason for that is so that everybody can—it’s part of this shared responsibility for the project, and everybody needs to know where the project stands. Extreme programming seems to be coming into a lot of popularity these days, and there’s a Web site you can go to to learn more about it. And I want to emphasize, again, two things. First of all, it’s not clear that you need … you don’t need to do this. The methodology suggested here is not the only methodology that could be useful in developing questionnaires, but it’s one that I think—from what I’ve heard—is going to help reduce risk an enormous amount. And even just picking out some of the basic principles from this paradigm will help.

OK, so that’s my perspective on the integrated development process; now, let’s take a look at some integrated toolsets that can help with many aspects of the development. First of all, what we want to do using integrated tools is formal capture of logical artifacts. Now, I debated for a while on this slide whether to say “machine capture” or “formal capture”; I felt that “machine capture” was too weak and “formal capture” was too strong. But that’s where it ended up.
By “machine capture,” we might simply mean making sure that all documents are available in a collaborative development system to everybody on the project. But that’s a little bit weak, because it won’t really allow the use of tools on these development artifacts. So we want them captured in a way that supports the use of tools. And the current standard—for capturing requirements and design, at least—is UML. There are extensions of UML as well, but let’s say that that’s one standard. Historically, there have been many others that led up to this, and they are still in use—OOSE and OOA are others. Once the designs and other artifacts have been captured, then you can apply tools to manipulate them. And there are a variety of tools that are available, so let’s look at what some of them do.

The common tool capabilities are to capture the requirements and then to trace those requirements through the development lifecycle. And we’ll mention some of the tools a bit further down. Other tools are available to model the use cases—that is, to help you understand usage scenarios. There are also tools for migrating from use cases to UML sequence and collaboration diagrams. So, again, these are tools so that if you capture your designs and models in a formal method then you can move further along the development lifecycle. We also have tools that help you build class diagrams and actually generate code from the class diagrams. Now, the code is generally in the form of a skeleton, but it helps move things along. But there are also cases—particularly in the case of finite state machine operations—where you can actually generate a lot of the code. CAI seems to be particularly amenable to finite state modelling, so there are—within UML—tools that allow you to model state machines using state diagrams. And then there are ways of modelling component relationships (the components of your software systems are the source code packages and their relations), and then your delivery system and the components of the delivery system as well.

There are various tools and toolsets that are available, more or less integrated. Rational Rose is probably the most widely used integrated toolset for this, and then there are other products from these companies. And I mention this one because it’s open source; I don’t know how appealing that is to you, whether you’re into the open source community. And then there are other standard toolsets as well.

Let’s take a look at some examples of use cases. Now, this is the beginning of a use case to look at use cases. Here, we’ve got a use case defined, and then we’ve defined the other classes that are related to it and what the relationships are between this use case and the other classes. So we’ve got use cases, and use cases are related to users who express them; they are related to analysts who understand them, or analyze them, or try to understand them or complete them.
And then there are designers down here, operating on them as well. And, of course, programmers and testers out there; I don’t know why I chose that, it must have been late at night. So I chose that guy … So here’s an example of a UML use case diagram. Now, I’m going to be showing you only three diagrams from UML, but there are twelve in UML 1.3 that cover a lot of development/design activities.

So here’s a case where we have a sequence diagram; these are the classes. We have a caller, a phone, and a recipient. Now this is not intended to model anything you’re doing in CAI … So, in the sequence diagram, we show the messages that occur, that are sent from one of the objects in the diagram to another. So, a caller picks up the phone, the phone replies with a dial tone, the caller then dials, and the recipient replies with a ring notification, and then finally the phone picks up. Actually, this should probably extend through … Yeah, the recipient picks up the phone and says, “Hello?” Now, UML supports defining these with a temporal relationship on them. So this is an example of a sequence diagram.

Now, finally, what might be even more relevant in this case [are] state charts. There’s been a lot of discussion of state machines, and state charts are a way of representing state machines. They have several advantages over simply drawing the complete finite state machine, which may—in fact—have too many states to even be drawn. First, they do allow you to specify states. They also allow you to abstract the states, and the key—I think—to getting control of the state explosion problem is abstracting the states. So here we have a state, state B, which has two substates, B1 and B2, and here’s a state C that has two substates, C1 and C2. And there’s another state, A, that has no substates. Another thing that they allow you to do is to aggregate the states, so that—the concept here is that once you move into this state you’re actually in both of these states at the same time. So there’s a bit more expressiveness here than you find in a traditional state machine model.

Now, I have a couple of notes, and I want to say a few words about how models like this differ from the questionnaire. I think you asked a question about whether there will be a duplication of effort and how much there will be, and there was some discussion about this. My perspective on that would be that, to a large extent or significant degree, if you get a state machine model correct, then you’re very close to getting the code correct. There are actually tools out there that will allow you to automatically generate the code, or a lot of the code, from this state machine. Also, you’ll be able to get test cases generated, with some degree of automation, from the state machine and the state charts. I guess the other point is that the states—when you’re doing modelling of a questionnaire, you can model them at various levels of abstraction.
This abstraction can be done at various levels, so you can begin testing—as Harry mentioned—early. And then as other features become more interesting—for instance, I think you brought up the case, or someone brought up the case, of five questions, or five variants of the same question reflecting different outside factors—well, it’s really one question but it’s being asked in five different ways. So we don’t need to model every one of those; we don’t need a state for every one of those possible questions. We can abstract all of those into one state, and at a certain level we’ll be doing testing to make sure that we get to that question, that abstract question. And later on, as we refine the model further, we would want to know whether we asked the right variant of the question, and so we will have some more states to deal with. But they’ll be abstracted, so we can do the testing at the one level, and when things are further along we can do further refined testing.
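The two-level abstraction just described—five variants of one question collapsed into a single abstract state, refined later—can be sketched in a few lines. This is a minimal illustration; the state names and the outside factor (an employment status) are invented, not taken from any real instrument.

```python
# Hypothetical two-level state model for one questionnaire item.
# Coarse level: a single abstract state, enough for early path testing.
# Refined level: the abstract state expands into five concrete variants
# selected by an outside factor.

VARIANTS = {
    "employed":      "income_q_v1",
    "self_employed": "income_q_v2",
    "retired":       "income_q_v3",
    "student":       "income_q_v4",
    "unemployed":    "income_q_v5",
}

def next_state(current, status=None, refined=False):
    """Transition out of the (invented) 'household' state toward the income item."""
    if current != "household":
        raise ValueError("this sketch models only one transition")
    if not refined:
        return "income_q"       # early testing: did we reach the question at all?
    return VARIANTS[status]     # later testing: did we ask the right variant?
```

Early test runs only check that the abstract state "income_q" is reached; once the model is refined, the same transition is re-tested against the five concrete variants.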
Now, in addition to the UML tools, there are various non-UML tools that are, to a greater or lesser extent, integrated. These are all from Rational. I mention Rational as one example because it’s something of an industry standard, but I have no commitment to that and I don’t advocate that particular tool on this project. There are others—for instance, I-Logix provides state chart tools and other UML tools. And they usually provide a very similar range of coverage of the UML. So, non-UML tools that will help testing are, for example: Purify, which will detect memory leaks and other structural bugs; Quality Architect, which again is a tool from Rational that automates test case generation and management (I spoke about test case generation earlier); and then various other tools. There are alternatives to all of these particular products from Rational.

In summary, first of all, we should consider a different model of the development process—one that imports some, if not all, of the principles of XP in order to reduce risk. Another problem that we should address is the loss of information, or doing a lot of work to get from one step to another. And that can be automated; when the software artifacts are formally captured, then we can apply tools and reduce the level of effort that’s required there. We also gain a greater degree of traceability from the implementation back to the requirements. XP gives you continuous monitoring of the project state. And I don’t mean status reports by this; during the stand-up meetings, you’re basically doing continuous monitoring, so that everyone knows what all the pieces are. This is quite different from drawing up a Microsoft Project document and then trying to plot where you are along the way to meeting milestones.
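The automated test case generation mentioned above can be sketched simply: given a transition table for a questionnaire’s state machine, enumerate the answer sequences that drive the machine from the start state to the end state, and treat each complete path as a test case. The states and answers here are invented for illustration; real tools such as Quality Architect work from much richer models.

```python
# Hypothetical questionnaire state machine: (state, answer) -> next state.
TRANSITIONS = {
    ("intro", "yes"):         "demographics",
    ("intro", "no"):          "end",
    ("demographics", "done"): "income_q",
    ("income_q", "done"):     "end",
}

def generate_test_cases(start="intro", end="end"):
    """Enumerate answer sequences that drive the machine from start to end."""
    cases = []

    def walk(state, answers):
        if state == end:
            cases.append(answers)   # a complete path is one test case
            return
        for (src, answer), nxt in TRANSITIONS.items():
            if src == state:
                walk(nxt, answers + [answer])

    walk(start, [])
    return cases
```

Each generated case is an answer script that a tester, or an automated harness, can replay against the instrument; this toy version assumes an acyclic machine, since it does no cycle detection.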
Collective ownership enhances understanding among participants, and automated generation of software artifacts, again, imports this notion of risk reduction by allowing the project history to be captured and preventing loss of information.

This was not really in my talk, but responding to some of the issues that came up: everybody seems to have commented on the cost of fielded bugs. So, [laughter] I have to do that, too. This goes way up, if I could have just one more minute? OK, the current organization that I work for is NASA, and we’ve had some spectacular bugs recently. [laughter] So the thing to keep in mind is that, yeah, there’s a cost involved in fixing bugs that I think has been widely estimated as too low, here. For example, if you go and look at Capers Jones’ documentation, the ratio between the cost of finding a bug during development, when the programmer is able to come in and look at the results of the unit tests on the things done the day before, and the cost of fixing a fielded error is on the order of at least 1,000.[46] So there’s an incredible difference, and we’ve seen that in our customers as well. Well, in Mars missions, there’s another cost, and that’s collateral damage to the business. And I don’t know what that cost is here; usually it’s an intangible cost. But you probably have some sense of the level of frustration and so on, even though no vehicles are going to crash and nothing is going to sink the project.

[46] Capers Jones is chief scientist emeritus at Software Productivity Research, Inc.

ROBINSON: So I guess it would be safe to say that your costs are astronomical. [laughter]

MARKOSIAN: The execution costs, yes … Any questions?

DOYLE: One of the things that we had done was to have people specialize in a particular survey, and often you end up with one programmer on a complicated survey. But a lot of these testing things might suggest that we need teams; we need more than one person on a project. Are you recommending that we change our staffing so that, instead of having person 1 on one project and person 2 on another, we have two people on the same projects, or something else?

MARKOSIAN: Well, I think … I can’t really answer that, because I don’t know enough about the application. In the applications I’ve been describing here there have been multiple people working on one project …

DOYLE: And the minimum number you’ve listed is two programmers …

MARKOSIAN: Right, right. Well, one would expect—this has come up before—that there’s a lot of experience that should be shared among the developers, even on different projects. And that’s what I don’t understand well enough to advocate a position there.

SMITH: I risk making things too light here, but there’s a saying that the optimal number of programmers on a project is two. [laughter] For example, Unix was originally built by two people at Bell Labs, and all the Bell executives said that they didn’t know it was happening … and if they had known, they would have stopped it.

MARKOSIAN: Harry?

ROBINSON: For the productivity of the pair programming, where do you measure their productivity? Is it the amount of code that actually makes it to release-level quality?
MARKOSIAN: No, their effectiveness is in producing code of acceptable quality … well, yes, it’s … certainly, the metrics that are used, such as lines of code—and we could look at those; function points is another metric—all have their problems. In the area of XP, I think that the focus is on quality, on meeting deadlines, and on reducing risk; as for the best metrics for that, I don’t know. Any other questions? Okay, thank you.