10
Documentation and Archiving
Microsimulation models, like other models that are implemented with electronic computing technology, require documentation: of their input data, their software specifications and program code, their mathematical formulas and equations, their interface with the computer's operating system, the way in which analysts and programmers can interact with them, and other aspects of their design and use. Without adequate documentation, users cannot run a model or modify its software. Equally important, users cannot understand a model's design or operation and hence cannot properly specify changes to the inputs or interpret the outputs.1
The principles and practices that we propose for cost-effective design and implementation of microsimulation models in Chapter 6 stress the need to incorporate model features that facilitate good documentation and to provide sufficient resources and priority attention to complete it on a timely basis. In this chapter we discuss at greater length some of the qualities that we believe characterize "good" documentation for microsimulation models. We also consider the related issue of archiving of models, databases, and model runs. Archiving is an important part of model documentation, broadly conceived, because models and associated databases frequently change. Without good archiving, it is very difficult to undertake activities such as validation studies that examine past model performance.
STANDARDS FOR MODEL DOCUMENTATION
As all developers of complex computer models know, the preparation of documentation is a seemingly thankless task that can entail considerable time, expense, and drudgery. Consequently, documentation is often deferred until so late in the development process that model developers and users either forget to include important points or leave the job half-finished. Often, critical aspects of a model are known only to its developers, and if they are not available, the model may become impossible to update or use. In these cases, the investment in the model is essentially lost: the RIM and KGB models are object lessons in this regard. In other cases, the documentation may be adequate for the programmers and analysts who work with the model regularly, but inadequate for others who may use the model only occasionally or for analysts who work only with the model outputs.
There are strategies that can help facilitate good documentation, such as requiring programmers to include adequate comments in their code, building features into the models such as automatic links to a central variable directory whenever new or modified variables are specified, and taking full advantage of features of word processing systems that permit ready updating of all parts of the documentation affected by changes in particular model components.2 However, an essential element of obtaining adequate documentation has to remain the commitment of the model sponsors to the importance of the task.
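One of the strategies above, an automatic link to a central variable directory, can be sketched in a few lines. The following is a minimal illustration, not a description of how any existing microsimulation model does it; the class names, field names, and the rule that an entry without a description is rejected are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableEntry:
    """One documented model variable (hypothetical structure)."""
    name: str
    description: str
    defined_in: str  # module or program unit that creates the variable


class VariableDirectory:
    """Hypothetical central directory that every new or modified variable must pass through."""

    def __init__(self):
        self._entries = {}

    def register(self, name, description, defined_in):
        # Refuse undocumented variables, so that documentation is written
        # at the moment the variable is specified rather than deferred.
        if not description.strip():
            raise ValueError(f"variable {name!r} needs a description")
        entry = VariableEntry(name, description.strip(), defined_in)
        self._entries[name] = entry  # re-registering a name updates its entry
        return entry

    def render(self):
        # Produce an alphabetical listing suitable for a reference document.
        ordered = sorted(self._entries.values(), key=lambda e: e.name)
        return "\n".join(
            f"{e.name} ({e.defined_in}): {e.description}" for e in ordered
        )
```

Because the listing is generated from the same entries the model code registers, the reference text cannot silently drift away from the variables actually in use, which is the point of the strategy.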
Although we run the risk of stating the obvious, a way to underscore what is involved in a commitment to good documentation and to begin to define what we mean by "good" is to list all of the functions related to a microsimulation model that are dependent on documentation:
- Use. Documentation must be adequate to permit people to set up and run a model correctly and efficiently and to detect and correct errors in a timely fashion.
- Development. Documentation must be adequate to permit people to change a model, in both minor and major ways, in a cost-effective manner that minimizes the introduction of errors into other parts of the model.
- Comprehension. Documentation must be adequate to permit people to understand the assumptions and operation of a model, supply usable specifications for model inputs, and appropriately interpret the model outputs.
- Validation. Documentation must be adequate to allow people, including those who are and are not directly associated with a model, to conduct studies of the quality of the model outputs and otherwise contribute to model improvement.
- Access. Documentation must be adequate to support portability of a model to different sites and to allow would-be users to develop facility in working with a model in a reasonable amount of time.
These various functions give rise to the need for several distinctly different kinds of documentation. For example, people who set up model runs need technical information, such as the software instructions that are necessary for the model to perform in the operating environment of the particular computer being used. People who interpret model outputs need understandable descriptions of the model's design, how it works, and the assumptions underlying various model components. They also need information about the particular specifications that were used in model runs, such as the parameters of the proposed policy change(s) that were simulated. People who are not currently familiar with the model but who want to begin using it need tutorial material.
With reference to documentation of a model per se (in contrast to documentation of particular model applications), one can identify three major types: informational, that is, documentation that provides general information about the design and operation of the model; instructional, that is, documentation that provides instructions on how to operate (or interpret the operation of) the model; and reference, that is, documentation to which users can refer easily and at random to answer specific questions about any and all aspects of the model. Each of these types of documentation, in turn, can be targeted to different audiences, such as analysts with some expertise who specify model runs, programmers who implement model runs, and nontechnical users who interpret model outputs.
Although particular documents may serve a particular purpose for a specific audience, several general criteria apply to all model documentation. These criteria have to do with content and format: documentation should be accurate, clear, and complete; and documentation should be designed for ease of use, with all components formatted consistently.
With respect to the various elements that ought to make up a documentation package, the Institute of Electrical and Electronics Engineers (IEEE) (1988) has published an industry-wide standard for either instructional or reference documentation of software. According to the standard, both types of documents should include the following kinds of front matter: title page, restrictions (e.g., copying restrictions), warranties and contractual obligations, table of contents, and list of illustrations. Both instructional and reference documents should include an introduction that provides: a description of the audience, a statement of applicability (e.g., model version, required hardware, and required operating system), a statement of purpose, a document usage description, a list of related documents, conventions (such as use of symbols), and instructions for how users can report problems.
The IEEE standard for the body of the document differs for instructional and reference materials. For instructional documents, the body should include: a statement of scope, descriptions of materials the user will need and the preparations the user must make to complete the tutorial, cautions and warnings to help learners avoid major problems, and the actual tutorial informing the user how to invoke functions and what results or possible errors to expect. Finally, instructional documents should include related information such as constraints or limitations.
For reference documents, the body pertains to the components of the software or model itself: each component should begin with a statement of purpose, describe needed materials and preparations, describe all the needed input data for the particular module or function, provide cautions and warnings, describe how to invoke the module or function and how to interrupt and recognize normal or abnormal terminations, and describe the expected outputs.
The IEEE standard also calls for a separate document on error messages, known problems, and error recovery, along with cross-references to these topics in other documents that make up the documentation package. The standard also suggests that appendixes be included to provide detailed descriptions of input and output data, file structures, and global processing constraints, along with sample inputs and outputs. Finally, the standard requires a bibliography, glossary, and index. We believe the IEEE standard is both applicable and appropriate for documentation of microsimulation models.
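The components enumerated above lend themselves to a simple completeness check. The sketch below is illustrative only: the component labels paraphrase this chapter's summary of the IEEE (1988) standard and are not identifiers taken from the standard itself, the function name is hypothetical, and the suggested appendixes are omitted because the chapter describes them as suggested rather than required.

```python
# Labels paraphrase this chapter's summary of the IEEE (1988) standard;
# they are illustrative, not names defined by the standard.
FRONT_MATTER = [
    "title page", "restrictions", "warranties and contractual obligations",
    "table of contents", "list of illustrations",
]
INTRODUCTION = [
    "audience", "applicability", "purpose", "document usage",
    "related documents", "conventions", "problem reporting",
]
BODY = {
    "instructional": [
        "scope", "materials and preparations", "cautions and warnings",
        "tutorial", "related information",
    ],
    "reference": [
        "purpose", "materials and preparations", "input data",
        "cautions and warnings", "invocation",
        "interruption and termination", "expected outputs",
    ],
}
BACK_MATTER = ["bibliography", "glossary", "index"]


def missing_components(doc_type, present):
    """Return {section: [missing labels]} for a document of the given type.

    `present` maps a section name ("front matter", "introduction", "body",
    "back matter") to the component labels actually found in that section.
    """
    if doc_type not in BODY:
        raise ValueError(f"unknown document type: {doc_type!r}")
    required = {
        "front matter": FRONT_MATTER,
        "introduction": INTRODUCTION,
        "body": BODY[doc_type],
        "back matter": BACK_MATTER,
    }
    gaps = {}
    for section, components in required.items():
        found = {label.lower() for label in present.get(section, [])}
        absent = [c for c in components if c not in found]
        if absent:
            gaps[section] = absent
    return gaps
```

A reviewer could fill in `present` by hand while reading a document and use the result as a list of gaps to report, for example a reference manual that has a glossary and bibliography but no index.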
A DOCUMENTATION CASE STUDY
Using the criteria and standards outlined above, the panel undertook a review of the available documentation for three representative static microsimulation models of income support programs—TRIM2, MATH, and HITSM.3 Our findings from this review were discouraging with regard to the current state of model documentation:
- The TRIM2 and MATH documentation serve primarily a reference function for both experienced analysts and programmers, although TRIM2 provides some useful instructional material as well. The HITSM documentation serves primarily an informational purpose for a nonexpert audience and is wholly inadequate as a guide for model users.
- The technical writing in these documents is of highly uneven quality. Formatting is inconsistent, and the text is filled with jargon and mnemonics. There are inaccuracies and typographical errors.
- Many of the documents refer to different versions or releases of model components, but the archiving system is not explained, and obsolete cross-references suggest that updates are not carefully integrated into the documents.
- All of the documents lack some components of the IEEE standard. A key component missing from virtually all of them is an index. The body of the reference documentation for particular components of the models is generally complete with regard to descriptions of purpose, inputs, and outputs. The TRIM2 and MATH documents also provide satisfactory information about how to invoke each component. Weaknesses are in the lack of cautions and warnings and a general lack of discussion of error conditions.
- The copies of the documentation that were provided for review have many publication flaws, such as missing and upside-down pages, suggesting that the documentation function has low priority.
- Finally, the documentation of these models is formidable in content, format, and sheer size. The lack of user-friendly documentation has probably contributed to the lack of use of these models by a broad community.
Some of the features of the documentation for these three models, such as the lack of instructional materials and insufficient attention to such matters as consistent formatting, have to do with the way in which microsimulation models have been developed and applied.
Historically, the models have been marketed as a service rather than a product. Typically, policy analysis agencies contract for the services of experienced model analysts who, in turn, provide specifications for programmers to implement. In the process, the models are constantly being updated and changed. Until now, the effective audience for model documentation has been the purveyor of the service who, presumably, can cope with such problems as lack of cross-references or tutorial materials. The purchaser of the service—namely, the policy analysis agency—has generally been expected to have primary interest only in the outputs, not in the technical details.
This picture of the microsimulation model industry (also see Chapter 11) is somewhat exaggerated. Some agencies use models such as TRIM2 in-house, and other agencies have analysts who are very knowledgeable in the workings of the models. Moreover, the quality of the documentation for the models included in our review is much better than that for many models that agency analysts have developed for their own use (indeed, the latter often do not have any documentation). Nonetheless, we conclude that the documentation for the major microsimulation models that are used heavily for policy analysis exhibits many weaknesses. These weaknesses militate against user understanding of model outputs and the ability of people not closely involved with the models to evaluate them or to become members of the user community.4
RECOMMENDATIONS
Clearly, we see a need for increased investment in the quality and scope of microsimulation model documentation. We believe it is imperative that policy analysis agencies set high standards for documentation and provide the resources to make it possible to achieve those standards. The IEEE standards are an obvious set to adopt. However, given the complexity of most microsimulation models, it may be advisable to investigate whether and what kinds of added standards are required to achieve the goal of high-quality documentation that meets users' needs.
We believe that policy analysis agencies will readily acknowledge the value of investing in the quality of the models' reference documentation (with regard to formatting, accuracy of cross-references, etc.) and perhaps in additional informational material about the models. These kinds of investments not only help the people who run the models (whether agency or contractor technical staff), but also help analysts who work with the outputs. However, the agencies may question the value of investing in such materials as tutorials, which would primarily help to enlarge the user community rather than give direct support to the agencies' policy analysis needs.
We argue, however (as we have in other parts of the report), that the continued ability of microsimulation models to provide high-quality, relevant, and cost-effective service to the policy analysis function depends critically on making the models more accessible to a broader community of users, including the agencies' own staffs. (We develop the latter point further in Chapter 11.) Expanded access is needed to enlarge the pool of ideas and perspectives for model development, to facilitate experimentation with alternative model structures and uses, to support model validation, and to encourage the highest level of understanding and the appropriate use of model outputs in the policy debate. Expanded and improved documentation is one of the most important means toward a broader user community.
Recommendation 10-1. We recommend that policy analysis agencies set high standards for documentation of microsimulation models and their inputs and outputs. Agencies should investigate existing standards, such as those published by the Institute of Electrical and Electronics Engineers, for relevance to microsimulation models and determine what additional standards are needed. The kinds of documentation that agencies should require to be developed for analysts and programmers who use, or expect to use, the models include general informational materials; tutorials; and detailed reference documents for model components that describe their theoretical basis, assumptions, operation, inputs, and outputs.
To this point, we have discussed primarily documentation for models and associated databases in their current form, abstracted from any particular applications. However, models and databases are frequently updated and modified, and the policy issues they are asked to address also change over time. As discussed in Chapter 9, this situation complicates the task of model validation, particularly of the type that involves comparing projections with actual outcomes. Often, such external validation studies will be carried out with a current model, simulating current law in comparison with the law in effect at some earlier period. These studies will require access to earlier versions of the database. In other cases, in which a legislative change corresponded to an actual simulation made at the time, a complete validation will require access not only to the original database, but also to the original model and specifications.
For these kinds of uses, it is essential to establish a workable system for documenting major model applications by archiving the model outputs and the versions of the models, databases, other inputs (such as control totals), documentation, and specifications that were invoked at the time. In addition, databases (such as successive March CPS files) should be regularly archived. Finally, the archiving system must facilitate ready retrieval of materials, including appropriate comparison values, when they are needed for particular validation purposes.
Recommendation 10-2. In order to facilitate model validation, we recommend that policy analysis agencies require archiving of microsimulation model databases on a regular basis. In addition, we recommend that the agencies require full documentation and archiving of major applications of microsimulation models. The archived materials should include the model itself, the documentation of the model, the database and other inputs, the analyst's specifications, and the outputs.
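As a rough illustration of what archiving a major application might involve in practice, the sketch below builds a manifest that records a checksum for every archived file and refuses to proceed unless all five kinds of material named in the recommendation are present. It is a sketch under stated assumptions, not a prescribed archiving system: the role names, the manifest layout, and the choice of SHA-256 checksums are all assumptions made for the example.

```python
import hashlib
from pathlib import Path

# The five kinds of material named in Recommendation 10-2 (labels are illustrative).
REQUIRED_ROLES = ("model", "documentation", "inputs", "specifications", "outputs")


def sha256_of(path):
    """Checksum a file in chunks so that large databases need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(run_label, files_by_role):
    """Describe one archived model application.

    `files_by_role` maps each role in REQUIRED_ROLES to a list of file paths.
    """
    missing = [role for role in REQUIRED_ROLES if not files_by_role.get(role)]
    if missing:
        raise ValueError(f"archive for {run_label!r} lacks: {', '.join(missing)}")
    return {
        "run_label": run_label,
        "files": {
            role: [
                {"name": Path(p).name, "sha256": sha256_of(p)}
                for p in files_by_role[role]
            ]
            for role in REQUIRED_ROLES
        },
    }


def changed_files(manifest, files_by_role):
    """List file names whose current contents no longer match the manifest."""
    changed = []
    for role, entries in manifest["files"].items():
        recorded = {entry["name"]: entry["sha256"] for entry in entries}
        for p in files_by_role.get(role, []):
            name = Path(p).name
            if recorded.get(name) != sha256_of(p):
                changed.append(name)
    return changed
```

Storing the manifest alongside the archived files lets a later validation study confirm that the database, model version, and specifications it retrieves are the ones actually used in the original run.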
We recommend in Chapter 9 that policy analysis agencies let separate contracts for validation of microsimulation models and their outputs. The agencies may also want to consider letting separate contracts for model documentation and archiving. The work involved in preparing and updating the documentation for large, complex models, particularly to meet the needs of different audiences, can be arduous and time-consuming. Similarly, developing and maintaining an archiving system, even one limited to databases and major applications, is a substantial undertaking. The people involved in working actively with the models must necessarily give priority to the exigencies of the policy debate. They are rarely in a position to forecast when they will have slack time that can be used for background activities such as documentation, and the all-too-likely outcome is that the documentation effort will be stinted. Of course, the model developers and users must be involved in documentation and archiving—they are the ones with first-hand knowledge—but having a separate group assume the