Improving Information for Social Policy Decisions—The Uses of Microsimulation Modeling: Volume II, Technical Papers Model Documentation

10 Documentation for Microsimulation Models: A Review of TRIM2, MATH, and HITSM

Kevin M. Hollenbeck

Kevin M. Hollenbeck is senior economist at the W.E. Upjohn Institute for Employment Research; he served as a member of the panel.

INTRODUCTION

As part of its investigation of microsimulation models, the Panel to Evaluate Microsimulation Models for Social Welfare Programs examined the extant documentation of three such models: TRIM2 (Transfer Income Model 2), MATH (Micro Analysis of Transfers to Households), and HITSM (Household Income and Tax Simulation Model). The precise documents examined are as follows: for TRIM2, Webb et al. (1982, 1986) and Bergsman (1989); for MATH, Doyle et al. (1989), Social & Scientific Systems, Inc. (n.d.), and Doyle (1989); and for HITSM, Lewin/ICF, Inc. (1988).

In some ways one might presume that the panel was on firmer ground in this examination than in other aspects of its investigation. After all, few members of the panel have actually performed an application with a microsimulation model. But with the widespread use of personal computers for word processing, spreadsheet applications, and information retrieval through database packages, millions of individuals in professional, technical, and clerical occupations, including all of the panel’s members, have navigated software documentation. Another common encounter with software documentation, particularly among researchers and policy analysts, has come through use of statistical software packages.

Nevertheless, because of the way that microsimulation has developed and been applied, comparison of microsimulation model documentation to personal computer software or statistical package documentation is not necessarily appropriate. This is because microsimulation models have, for the most part, been marketed by their developers as a service rather than a good. The prototypical transaction for microsimulation involves a sponsor purchasing the services of an analyst who specifies model parameters and changes for an applications programmer, who runs the model. The model is being constantly updated and changed. An analogy might be the purchase of insurance from an agent who uses a computer model to examine outcomes of different premiums, investment returns, and life tables. The audience for the documentation of that model, just like the audience for the documentation of microsimulation models (up to this point), is limited to the purveyors of the service. The purchaser of the service is expected to have little interest in the technical details, only the outcomes.

From this perspective, then, the panel’s seeming ability to evaluate documentation was severely constrained. The role of the panel is analogous to providing advice to the purchaser of insurance (or sponsor of microsimulation) concerning documentation of the computer model used by the insurance company’s agent. In general, the panel’s belief is that the better the documentation, the better the model and thus the more comfortable the purchaser should be with the model’s outcomes.1

1   Of course, there are many reasons why this belief may be inaccurate. For example, one might argue that the best models are produced by the best technicians, who may not be skilled at producing good documentation. But recognition of the possibility of an inverse relationship between good programming and good documentation does not imply that low-quality documentation is evidence of high-quality programming!

This chapter first establishes general criteria that can be used to evaluate documentation and then addresses the documents mentioned above for TRIM2, MATH, and HITSM. The final section evaluates all three sets of documentation based on an industry standard for software documentation.

General conclusions from this review of the documentation are as follows:

In general, documentation can serve informational, instructional, or reference functions and must serve various audiences. Microsimulation model documentation in particular must cater to a wide range of backgrounds, from individuals with an interest in microsimulation but no technical expertise to policy analysts with technical expertise and programmers who might conduct an application or change a model in some manner. The TRIM2 documentation comes closest to serving these functions for all audiences; the HITSM documentation is intended primarily to provide information to a nontechnical audience.

The quality of the writing in the documents reviewed is highly uneven. Formatting is inconsistent. The texts are filled with jargon and mnemonics, and there are many inaccuracies. The many typographical errors confuse the reader and raise doubts about the credibility of the models.

Many of the documents have dated updates and refer to model (or documentation) releases, but the archiving systems are typically unexplained and the changes or updates are not identified. Furthermore, many obsolete cross-references exist, which suggests that updates have not been carefully integrated into the documents.

All of the documentation manuals reviewed lack some components of the Institute of Electrical and Electronics Engineers’ (IEEE) industry-wide standard for documentation. A key component that is missing from all the documents is an index.

The copies of the documents that were examined have many publication flaws. Entire chapters are printed upside down; several key chapters have pages missing; several pages are unreadable because of the poor quality of the reproduction. This lack of quality control suggests that low priority is given to the documentation.

The documentation of these models is formidable in content, format, and sheer size, but the absence of well-written documentation has probably contributed to the lack of widespread use of or trust in these models.

PURPOSES OF DOCUMENTATION AND EVALUATION CRITERIA

What constitutes reasonable criteria with which to judge the documentation of microsimulation models? First, it should be recognized that model documentation can serve three fundamental purposes: to provide information about how a model operates, to provide instruction to individuals on how to operate the model, and to serve as a reference.
The HITSM documentation is clearly of an informational nature, whereas the TRIM2 documentation and the MATH documentation are mainly reference documents.

Second, the intended audiences for various parts of the documentation need to be considered. As described above, there are three parties to the typical microsimulation study transaction: sponsor, analyst, and applications programmer. Correspondingly, informational, instructional, or reference documentation could be targeted to any of these three audiences.

Even though particular documents may serve a particular purpose for a specific audience, there are several general criteria that can be used to evaluate a document. These criteria pertain to content and format. With respect to content, documentation should have accuracy, clarity, and completeness. With respect to format, the main objectives should be ease of use and consistency. With respect to the components of documentation, the IEEE has published an industry-wide standard for software documentation. (A later section details this standard and evaluates the microsimulation model documentation with respect to it.)

With these criteria in mind, the next three sections review each model’s documentation. Each section provides a summary of the format and purpose of the documentation, a critique of the documentation, and suggestions for improvement.

TRIM2

The TRIM2 documentation comprises three parts: the Reference Manual (Webb et al., 1986); Simulation Modules (Webb et al., 1982); and Codebook (Bergsman, 1989). The Simulation Modules is a two-volume document, and presumably there are multiple volumes of codebooks, although the panel had only a single document.2 The Reference Manual (Webb et al., 1986:15) itself explains the purpose of the three parts:

The TRIM2 Reference Manual is the overall reference for the TRIM2 framework for microsimulation…. Chapter I is an introduction to TRIM2 for anyone who is not acquainted with [it]. Chapter II is a tutorial for anyone who will be using TRIM2 directly or indirectly. Chapter III is a reference for persons actually submitting TRIM2 runs. Chapters VI, X, XII, and XIII and Appendices A and C will be used by programmers adding or modifying simulation modules. The remaining chapters will be used primarily by TRIM2 system programmers.

TRIM2 Simulation Modules occupies two or more looseleaf volumes and contains a chapter for each simulation module…. Each chapter contains a text description and technical specification of the module, a description of each subroutine, and definitions and values for all parameters.

TRIM2 Codebook contains definitions of all variables used by TRIM2 and a catalog of frequently used household micro files.

2   According to the Urban Institute staff, until 1990 there was only one codebook volume. Each year, the codebook is updated to include new variables, clarifications about variable definitions, marginal statistics, and so on. Because a main goal of the conversion process is to create a standard file each year, it has not been necessary to have different codebooks for different years. In 1990, a new codebook was created because of the major structural differences between the March 1989 CPS and earlier files. The annual codebook update will now be applied to the “March 1989 CPS and later” codebook.

Critique

This section attempts to point out the strengths and weaknesses in the TRIM2 documentation. By a number of standards, the TRIM2 documentation dominates that of the other two models. This model is portable, and the documentation would be useful to individuals who need to learn how to run the model, to analysts, and especially to programmers, who will change and update it.

TRIM2’s Reference Manual suggests that Chapter I was designed for individuals unfamiliar with the model. In fact, Chapter I packs into seven pages of text a discussion about the differences between discrete and continuous simulation models, a discussion of the differences between static and dynamic aging, a claim that TRIM2 could handle microsimulation of corporations, a history of the development of TRIM2, a discussion of the (very) technical advantages of TRIM2 over previous versions, descriptions of all the major simulation modules in TRIM2, a description of the documentation, and the model’s computer requirements. In short, the chapter fails because it is an amalgam of terse discussions of widely different subjects and does not provide an overview of the model for those unfamiliar with it.

Perhaps the biggest problem with the TRIM2 documentation is that its updates are not integrated so that the document maintains consistency. Chapter I is a prime example. A government sponsor or other potential user of the model who might focus on this chapter would probably like to know, for example, whether it captures changes in federal income taxes instituted by the 1986 Tax Reform Act, if the JOBS (Job Opportunities and Basic Skills) program is simulated in the AFDC (Aid to Families with Dependent Children) module (which a potential sponsor might have heard was the model’s comparative advantage), or whether recent changes in the food stamp program are simulated. The reader of the chapter would get no clue. Furthermore, the chapter indicates that the current reference codebook for TRIM2 variables is the March 1980 Current Population Survey (CPS) TRIM codebook. Chapter VI presents another example of inconsistent cross-references. Section C of the chapter refers to a document called the TRIM2 Master Routine Manual.
The context suggests that this reference, which is repeated several times, should be to the Simulation Modules document.

Chapter II of the Reference Manual is actually a tutorial intended to teach the reader exactly how to run the model. The chapter is well organized and very readable. It has many examples of run setups and sample output. This is an excellent chapter that gives the reader the sense that TRIM2 is a portable model.

The remainder of the Reference Manual contains technical details aimed primarily at programmers. The subjects covered are job control language (JCL) set-up, the Central TRIM2 Directory, variable naming conventions, programming standards, utilities, and so forth. It was impossible to evaluate the accuracy of these chapters; however, they seem to be clear, consistently formatted, and complete.

One bothersome detail about the Reference Manual, however, is that some rather arbitrary decisions seem to have been made about what to include vis-à-vis the Simulation Modules and the Codebook. For example, the documentation for input and output modules, RDFILE and WRFILE, is included in the Reference Manual. However, XPORT, which is an optional type of output, and SIMTAB, which is a tabulation subprogram that produces output reports, are documented in Simulation Modules. Furthermore, the utility programs that TRIM2 uses are documented in a chapter of the Reference Manual, but QUANT and RANDOM, which seem to be utilities, are included in Simulation Modules. The Reference Manual has chapters on databases, conversion of public-use survey microdata files into TRIM2 format, and file aging. However, AINC, a subprogram of the aging modules, is documented in Simulation Modules. Furthermore, MONTHS, FDIMPU, and HOUSE, which seem to be pure imputation routines, also are included in Simulation Modules.3

3   It is not clear why FDIMPU is documented separately from FEDTAX or why HOUSE is documented separately from PROPTX.

To their credit, the Urban Institute staff have automated the documentation process in the Central TRIM2 Directory apparatus. This routinization of the process should greatly enhance the availability of and priority placed on documentation. The Simulation Modules document was produced totally by the automated process. The documentation for each module has a short statement of purpose, summaries of subroutines, standard parameters, data parameters, and specifications for input and output variables. A nice feature of the software that produces the documentation is that it produces a table of contents with page numbers. On the other hand, the individual tables of contents that precede the chapters in the Reference Manual lack page numbers, which severely limits their usefulness. A disadvantage of the software is that it limits the graphics that can be used. In particular, flowcharts, example input, and sample printout would enhance many of the descriptions.

The Simulation Modules documentation is voluminous; the AFDC module description itself is 283 pages. Nevertheless, the software creates helpful header information on each page. One disappointment is that references to external source documents are somewhat hard to locate (a separate section is recommended) and are usually incomplete. In some instances, references are missing altogether. For example, the AFDC participation algorithm uses a probit function, but the documentation does not cite any source documents.

Suggestions

The following suggestions are based on this limited review of the documents:

The documentation for TRIM2 should be carefully reviewed by a content editor to eliminate inconsistencies such as obsolete cross-references. Also, the editor should point out places where sources need to be identified (completely). Presumably, consistency checking can be implemented in the automated documentation software, so that when portions of the documentation are updated, all relevant references can be updated.
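The automated consistency checking suggested above can be illustrated with a minimal sketch. It is not taken from the TRIM2 documentation software, whose internals the review does not describe. The function name `find_dangling_table_refs`, the caption convention (a line beginning "Table <label>." or "Table <label>:"), and the sample text are all assumptions made for illustration.

```python
import re


def find_dangling_table_refs(text):
    """Return table labels that are cited in the text but never defined.

    Assumption for this sketch: a table is "defined" by a caption line that
    starts with 'Table <label>.' or 'Table <label>:', and it is "cited" by
    any occurrence of 'Table <label>' anywhere in the text.
    """
    caption = re.compile(r"^\s*Table\s+(\w+)\s*[.:]", re.MULTILINE)
    citation = re.compile(r"\bTable\s+(\w+)")
    defined = set(caption.findall(text))
    cited = set(citation.findall(text))
    return sorted(cited - defined)


# Invented sample: the prose cites Table 6, but only Table 6A has a caption.
sample = """The deductions are listed in Table 6.
Table 6A. Itemized deductions by income class
See also Table 6A for the scaling note.
"""

print(find_dangling_table_refs(sample))
```

A check of this kind could run each time a chapter is regenerated, so that a renamed or removed table is flagged before the document is released. The same approach would extend to chapter, section, and document-title references.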

The documentation should be carefully reviewed by a copy editor to make the format consistent, to eliminate typographical errors, and to generally improve user friendliness.

Better quality control mechanisms should be instituted at the production stage to minimize flaws.

Chapter I of the TRIM2 Reference Manual should be rewritten with a nontechnical audience in mind. It should clearly indicate the distinction between databases and simulation modules.

A reorganization of the documents should be considered. For example, all database-related documentation such as codebooks, conversion routines, aging, and imputation routines could be collected together in an expanded Codebook. A User’s Reference Manual with the “how to” material from the Reference Manual and the purpose statements and input parameters from the Simulation Modules could be developed. Finally, a more technical Programmer’s Reference Manual could be developed for the system programmers who will be developing new modules or making major changes to existing ones.

MATH

Like the TRIM2 documentation, the MATH model’s documentation consists of three parts: Technical Description (Doyle et al., 1989), User’s Guide (Social & Scientific Systems, Inc., n.d.), and Codebook (Doyle, 1989). The preface to the Technical Description indicates that the particular document that the panel reviewed was an abbreviated version of another document, also entitled MATH Technical Description (Doyle and Bernhardt, 1983). The version that was reviewed focuses on the routines currently used by the U.S. Department of Agriculture’s Food and Nutrition Service. The documentation does not provide the reader with any key to the purposes of the three parts or how to use them, so one is left with presumptions about their use.
Apparently, the Technical Description is intended for analysts and the User’s Guide is intended for programmers. The Codebook provides specifics concerning the underlying data, so it would be useful to analysts and programmers.

Critique

MATH’s Technical Description opens with an introductory chapter intended for a general nontechnical audience, followed by major sections on initial data processing, aging routines, taxes and transfers, and utilities. Descriptions of several routines comprise each of the major sections. The table of contents for the document lists the order of the material in the book using routine numbers (not page numbers!). While the order of the material is, of course, useful, the reader is given no clue as to the purpose or meaning of the routine numbers. For the most part, they are three-digit numbers between 021 and 506, but some numbers have letters attached, one is “H-1,” and some routines do not have a number. Figure 3 of the introduction (the document has no list of tables or list of figures, and Figure 3 has no page number) provides some clue about the different routines and how they fit into the model, but there are many routines in the table of contents that are not included in the figure and some routines in the figure that are not included in the table of contents.

The Introduction to the Technical Description is a much better nontechnical description of microsimulation and MATH than the similar chapter in TRIM2’s Reference Manual. It gives a clear distinction between the underlying database, its preparation, and the simulation routines. It also includes a flowchart that greatly enhances the text. In both the TRIM2 introductory chapter and here, however, it is bothersome to encounter marketing text (touting the power and advantages of the model). Presumably, by the time the reader encounters the technical documentation, he or she is already sold on the product.

The rest of the Technical Description consists of technical specifications for the model’s various routines. Again, it was difficult to evaluate the accuracy of the specifications. Furthermore, a reader is given almost no help in navigating the document. Consequently, reviewing the document from front to back leads to a sensation of disconnectedness. It is like looking at the individual pieces of a jigsaw puzzle with no guidance as to how the pieces fit together.

Some observations on the Technical Description:

The format of the individual sections is inconsistent. Often, the individual chapters have introductions or statements of purpose, but sometimes these are in section I of a chapter and sometimes they are in section A, B, or C of particular subsections of chapters.

As with TRIM2, production of MATH’s Technical Description appears to have lacked quality control. For example, the chapter on the demographic aging routine (#208) has an interesting introduction that explains much of the theory and empirical work on which the routine is based. However, upon first reading the description, a reader discovers that page 2 of the introduction has apparently not been copied, but rather page 1 is followed by page 4. Later, it is discovered that page 2 is copied onto the reverse side of page 26.5. As another example, in the specifications for the routine to convert the March CPS to a MATH database (#021), the text refers to a list in Table 1. There are two lengthy tables in the chapter labeled Table 1a and 1b, and there is another table, much later in the chapter, labeled Table 1. Finally, in the specifications for the alternative demographic aging routine (#202),4 reference is made to a Table 1, but no table exists in this section.

4   The text does not explain why a user would be interested in using this alternative given that there is another demographic aging routine (#208).

The contents are often very poorly written. To give the reader a feel for what it is like to encounter this documentation, Figure 1 is a table from the description of the federal income tax simulation routine (#306). The documentation text refers to Table 6, whereas the table is labeled 6A. There is no Table 6 or 6B, for that matter. The title of the table appears to be missing a word or punctuation mark after the word “DEDUCTIONS.” The title uses the term “SELECTED YEARS PRIOR TO…1986.” However, no years are given in the table, and the data in the table seem to have no relationship to time. A phrase at the bottom of the table, which is presumably a general note, indicates that the table entries are scaled in thousands. In fact, whereas columns 2 and 4 are scaled in thousands, columns 3 and 5 are scaled in millions, and columns 1, 6, and 7 are not scaled. These errors are typical of the document.

The MATH Technical Description is a very lengthy and complex document (over 300 pages). The complexity and length seem unnecessary, however. For example, in the routine to impute a health disability variable (#106), several pages describe imputations that the document indicates are obsolete. The documentation updating procedure should eliminate such descriptions rather than tell the reader that they are obsolete. Furthermore, the imputations are simple recodes of two or three existing variables. It would be easy to use a single page rather than 11 pages to document the imputations. Also, it seems oblique to use an entire page for a “generalized marital status definer” that indicates that the model considers an observation to be married if the underlying data report that the individual is “married, spouse present.”

Finally, the degree of referencing to external data sources and documents is highly uneven.
When references are given, though, they seem to be more complete than those in the TRIM2 documentation. The description of the simulation of federal income taxes and the simulation of the participation algorithms in AFDC, supplemental security income (SSI), and food stamps have very few references. On the other hand, the benefits and eligibility parameters in the AFDC and SSI simulations have good documentation.

FIGURE 1 Sample table with numerous errors from the MATH Technical Description.

The writers of the MATH User’s Guide had a very particular meaning in mind for the word “user.” This document seems to be useful only to programmers who are debugging model errors or making changes to the model. It is much too technical for an analyst or a research assistant with minimal programming knowledge who might be charged with applying the model. There is no introductory material to explain how the model processes data, the types of parameters, or the JCL set-ups, for example.

Because of its highly technical content, the panel was again forced to focus on format clarity rather than accuracy. The MATH User’s Guide has many advantages over the Technical Description in this respect. The format of the individual sections is consistent (at least for the sections identified as Release 89.1). Each section has a brief statement of purpose (including a flowchart of processing) and sections on inputs, outputs, printouts (including error messages), examples, constraints, detailed method, and common blocks. The text has a number of examples and sample inputs and printouts as well as many flowcharts. Each section has a table of contents (with page numbers) and a list of tables and figures.

However, despite these features, the User’s Guide has a number of deficiencies. First, the document that the panel reviewed has virtually no preliminary material: no title page, warranty, preface, introduction, or author or contact information. There is a table of contents with page numbers; however, the pagination of the document seems to use some type of code (a combination of capital Roman numerals, capital letters, lowercase Roman numerals, Arabic numbers, and punctuation) with no explanation. The first page of text is V.A.-1. Later in the document, the only text encountered on page VI.A.-1 says that “pages VI.A.-1 through VI.A.-47 have been removed.” The next page is VI.A.-48.

It is also necessary to decipher the archiving/updating system of the document. Most pages are identified by a release number and a date. Approximately two-thirds of the document is identified as Release 89.1 (dated 06/01/89). Presumably, releases occur sporadically. The 83.2 release is dated 7/1/83, whereas 80.1 is dated 9/1/80.

In short, presumably if one is on the “inside” team of programmers, this document can serve as a reference document. For others it has no instructional or reference value. Thus, it seems to have extremely limited value to anyone (other than the original model developers) attempting to use the model.

The MATH Codebook document provides the precise specifications for all of the variables used in or created by the MATH model.
The preface to the document indicates that the particular version reviewed by the panel is an abbreviated version of another document entitled the MATH Codebook. Except for 4 pages of general notes, the document consists of an automated codebook. It is very similar to the TRIM2 Codebook document, as would be expected, although the MATH Codebook seems to have more information about each variable (such as survey, source, and universe) and to be more readable. The increased readability likely stems from the columnar format.

Suggestions

As with the TRIM2 documentation, the MATH documentation could benefit from critical copy and content editing. The editing should take into account the fact that the documents need to be useful to readers who are not thoroughly familiar with the MATH model. Also, the production of the documentation could use more quality control.

The MATH Technical Description would benefit from expanding its introduction (perhaps in the form of an additional chapter) to provide instruction or a tutorial on how to run the model for parametric applications. Analysts are sufficiently computer literate that they could learn something about how the model operates by seeing sample run set-ups and printouts, even if they will never apply the model themselves. Furthermore, the instructional material should relate the Technical Description to the User’s Guide.

Besides an instructional chapter, the Technical Description would benefit from mechanisms to make it more user friendly. A table of contents for each section should be added. Obsolete material should be deleted. The use of formulas with unexplained mnemonics for very simple recodes of variables should be eliminated. Lengthy tables of parameters could be put in appendixes.

The MATH User’s Guide, on the other hand, is fairly user friendly except for its virtual omission of preliminary materials and pagination. An introduction should be added that explains the organization of the document, how it relates to the MATH Technical Description, the archiving scheme, the conventions used in the text, and so forth.

HITSM

The HITSM documentation (Lewin/ICF, Inc., 1988) and, if the documentation accurately reflects the model, the model are quite distinct from TRIM2 and MATH. The HITSM documentation consists of a single report that documents a study in which a reasonably broad measure of household disposable income was estimated for all the observations on a microdata file. The report has chapters on the creation of the database, aging the database, imputation of public assistance, and imputation of taxes. On the basis of this document, one could easily quibble about whether HITSM is a model. The document never discusses user parameters, inputs, options, or outputs.
Rather the style of the document is to describe how Lewin/ICF, Inc., estimated disposable income on a microdata basis and to validate the estimates by comparing aggregated microdata to outside sources. In fact, the document often says “this study” or “this analysis” instead of “the model” (see pp. II-3, II-15, II-38, and III-5). Critique On the positive side, the document does attempt to address the accuracy of the model’s output. In the documentation for MATH and TRIM2, validation or model accuracy is never addressed. Also, the HITSM documentation provides many references to external data sources or studies and thus provides the reader with a sense that ICF has carefully grounded its work in prior literature. On the negative side, the document is comparable to the other documents the panel reviewed in terms of typographical errors, inconsistent formatting, and lack of clarity. Particularly disturbing about the HITSM document is that factual errors are present in the content Page II-2 states: “The CPS is a

nationwide sample survey of over 60,000 households conducted during selected months by the Bureau of the Census…. The CPS is designed for income analyses.” In fact, the CPS is conducted monthly and is designed to derive estimates of unemployment. Page II-4 has a footnote that states: “Statistical matching is a variation on the data collection procedures used by the Bureau of the Census and other survey research organizations called ‘hot decking.’” In fact, statistical matching is a data imputation procedure, and its relationship to hot decking is arguable. A footnote on p. II-5 implies that subfamilies are composed of single-parent daughters of family heads when, in fact, subfamilies can be intact and can be headed by single-parent males. These types of errors begin to erode the reader’s confidence in the model because it appears that Lewin/ICF, Inc., did not have a complete understanding of the underlying data. Passages such as the following further erode the reader’s confidence (Lewin/ICF, Inc., 1988: IV-88):

“The average LIHEAP benefit amounts reported in the March 1987 CPS varied substantially across income, fuel type, household size, and Census region groups. However, due to limited sample size, some of the estimated average benefits varied more than we felt reasonable. To reduce this apparently spurious variation, we estimated a regression of reported LIHEAP benefits as a function of income, household size, fuel type, and Census region. We then solved the estimated regression model to obtain estimates of average benefits for each income/fuel type/household size/Census region group as shown in Table 38. In the HITSM simulations, eligible individuals were assigned the average benefit reported in the CPS for households of similar characteristics using the data in Table 38.”
Although it is not clear what is meant by “solved the estimated regression model,” this passage seems to indicate that Lewin/ICF did not believe the reported data because of excessive variation, so Lewin/ICF estimated a model with discrete independent variables and assigned predicted means to all observations.

Suggestions

Clearly, many of the problems with the HITSM documentation may simply reflect poor technical writing and may not be indicative of the quality of the model. However, Lewin/ICF needs to understand that in microsimulation, where the particular policies or scenarios being analyzed may never occur, the audience for the projections and analyses must rely on the communicated result to formulate judgments about a model. As a technical report, the HITSM documentation would be passable after careful editing and production. As model documentation for outside users, however, the document the panel reviewed is inadequate.
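The reading offered above of the quoted LIHEAP passage (a regression on discrete group indicators whose predicted means replace the reported cell averages) can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not Lewin/ICF's procedure: the records are made-up toy values rather than CPS data, only two grouping variables are used, the model has main effects only, and every name in the code is hypothetical.

```python
import numpy as np

# Toy records (made-up values, NOT CPS data):
# (income_group, region, reported_benefit)
records = [
    (0, 0, 100.0), (0, 0, 120.0),
    (0, 1, 150.0),
    (1, 0, 80.0),
    (1, 1, 140.0), (1, 1, 120.0),
]

def design_row(income_group, region):
    """Intercept plus one 0/1 indicator per non-baseline category.
    Main effects only: an assumption made for this sketch."""
    return [1.0, float(income_group == 1), float(region == 1)]

# Ordinary least squares fit of reported benefits on the indicators.
X = np.array([design_row(g, r) for g, r, _ in records])
y = np.array([b for _, _, b in records])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def raw_cell_mean(income_group, region):
    """Average reported benefit within one income/region cell."""
    vals = [b for g, r, b in records if (g, r) == (income_group, region)]
    return sum(vals) / len(vals)

def smoothed_benefit(income_group, region):
    """Predicted mean for a cell from the fitted regression."""
    return float(np.array(design_row(income_group, region)) @ coef)

if __name__ == "__main__":
    for g in (0, 1):
        for r in (0, 1):
            print(g, r, raw_cell_mean(g, r), round(smoothed_benefit(g, r), 2))
```

Under this additive specification the predicted means cannot reproduce cell-specific departures from the main-effects pattern, which is one sense in which such a step reduces variation across cells. It also means that every eligible unit in a cell is assigned the same predicted value rather than anything it reported.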

COMPARISONS WITH IEEE STANDARDS

The last step in this review was an evaluation of the microsimulation model documentation using the content requirements for model documentation from the IEEE Standard for Software User Documentation (IEEE, 1988). Table 1, taken from the standard, summarizes the inclusion requirements, which are discussed below.

• Title page. A title page is mandatory and must include (1) the document’s name, (2) its version and date, (3) the software covered, and (4) the issuing organization. All but one of the documents reviewed have a title page (the exception was the MATH User’s Guide). The HITSM document conforms to the standard. The MATH and TRIM2 documents that have title pages do not reference a document version but otherwise generally conform to the standard.

• Restrictions. When restrictions apply to using or copying a document or software product, their identification is mandatory on the title page or immediately following the title page. None of the documents reviewed has restrictions.

• Warranties and contractual obligations. Warranties, contractual obligations, or disclaimers should be in a separate section in the documentation or there should be a reference to where the information can be found. None of the documents reviewed has either.

• Table of contents. A table of contents is necessary in documents over 8 pages long. The TRIM2 Reference Manual has a simple table of contents at the beginning as well as comprehensive ones for each major section. None of these tables of contents gives page numbers. The TRIM2 Simulation Modules has a simple table of contents at the beginning of the first and second volumes (unnumbered) and a comprehensive one at the beginning of each section. The TRIM2 Codebook has a comprehensive table of contents at the beginning of the document.
The MATH Technical Description has a simple table of contents at the beginning of the document but none preceding major sections. The MATH User’s Guide has a simple table of contents at the beginning of the document (the pagination system was indecipherable) and a comprehensive table of contents at the beginning of each section. The MATH Codebook has no table of contents. HITSM has a comprehensive table of contents at the beginning of the document.

• List of illustrations. A list of the titles and locations of all illustrations contained in a document is optional. The TRIM2 Reference Manual has such a list preceding each major section. Neither the TRIM2 Simulation Modules nor the TRIM2 Codebook has a list of illustrations. The MATH User’s Guide has lists in each major section, but neither the MATH Technical Description nor the MATH Codebook has lists. The HITSM documentation does not have a list of illustrations either.

• Introduction. The IEEE standard requires the following in the introduction: audience description, applicability statement, statement of purpose,

TABLE 1 IEEE Standard Inclusion Requirements

                          Single-Volume Document              Multi-Volume Document
Component                 8 Pages or Less  More Than 8 Pages  First Volume  Other Volume
Title page                M                M                  M             M
Restrictions              M                M                  M             M
Warranties                R                R                  R             R
Table of contents         O                M                  M             M
List of illustrations     O                O                  O             O
Introduction
  Audience description    R                M                  M             R
  Applicability           M                M                  M             M
  Purpose                 R                M                  M             R
  Document usage          R                M                  M             R
  Related documents       R                R                  R*            R
  Conventions             M                M                  M             R
  Problem reporting       R                M                  M             R
Body
  Instruction mode        1                1                  1             1
  Reference mode          1                1                  1             1
Error conditions          R                R                  R             R
Appendixes                O                O                  O             O
Bibliography              M                M                  M**           M**
Glossary                  M                M                  M**           M**
Index                     2                2                  M**           M**

Key:
M   Mandatory: shall be included when information exists.
O   Optional.
R   Reference: either include the section or a reference to where the information can be found within the document set.
*   Shall address relationship to other volumes.
**  Mandatory in at least one volume in the document set, with references to information in other volumes.
1   Every document has a body; each document set shall address the instructional and reference needs of the audience. Required content is in 5.7.1 and 5.7.2.
2   An index is mandatory for documents of 40 pages or more and optional when under 40 pages.

SOURCE: Institute of Electrical and Electronics Engineers, Inc. (1988: Table 2).

document usage description, related documents, conventions, and problem reporting instructions. The audience description indicates what parts of the documentation are intended for which audiences and the experience levels of the audiences. The applicability statement indicates the version of the software that is documented and the required hardware and software environments. The conventions section summarizes the symbols, stylistic conventions, and command syntax conventions used in the document. The other sections of the introduction are self-explanatory.

All three sets of documentation for TRIM2, MATH, and HITSM have introductory sections that are reasonably good statements of purpose. In addition, the introduction to the TRIM2 Reference Manual relates the manual to the other documents in the set and briefly indicates the intended audiences. The introductions in both the TRIM2 Reference Manual and the MATH Technical Description discuss hardware and software environments. The MATH User’s Guide has no introduction. The HITSM documentation is a single volume, so some of the parts of the required introduction would not be relevant; it does not include an audience description or applicability statement. The preface in the TRIM2 Simulation Modules has excellent instructions for reporting problems, and it solicits comments and suggestions from readers. None of the other documents pays attention to this matter.

• Body of document. The IEEE standard distinguishes between instruction mode and reference mode documents. Instructional documents are intended to provide the reader with the information necessary to operate the software. Reference documents are intended to provide the reader with easy access and random access to information about all aspects of the software. The standard requires slightly different contents for each mode.
But it exhorts the producers of the documentation that “in either mode, use a consistent organizational structure based on the expected use of the document, providing examples as necessary” (IEEE, 1988:10). The standard suggests the following contents for these two modes of documentation:

Instructional Mode
  Scope
  Materials
  Preparations
  Cautions and warnings
  Methods
  Related information

Reference Mode
  Purpose
  Materials
  Preparations
  Input(s)
  Cautions and warnings
  Invocation
  Suspension of operations
  Termination of operations
  Output(s)
  Error conditions
  Related information

In the IEEE standard the instructional mode document may be thought of as a tutorial. The scope is simply a statement at the beginning of the document indicating the material that is to be covered. The materials and preparations sections indicate items the user will need to complete the tutorial. Cautions and warnings help learners avoid major problems that may result from mistakes in using the tutorial. The methods section is really the body of the instructional document and describes what the learner must do, how to invoke functions, possible errors, and expected results. Finally, related information provides other useful information to the user, such as tasks frequently performed together or constraints and limitations.

Chapter II of the TRIM2 Reference Manual is the only part of all of the documentation reviewed that could be considered to be written in an instructional mode. Most of the IEEE’s required components can be found in this chapter, albeit not in the specific order listed above. If there is a deficiency, it is in not providing explicit cautions and warnings. For example, the chapter warns against excessive numbers of household dumps or excessive numbers of observations to be processed in a test or homework run, without telling the reader how many are too many.

The body of a reference mode document contains material about the model’s software. Each section begins with a statement of purpose. The materials and preparations sections concern items needed to operate the particular function of the software. The inputs section describes all data required to process the function correctly. Cautions and warnings refer to unintended consequences from applying the function being described. The invocation section provides all the information needed to use and control the function.
The sections on suspension and termination of operations describe how to interrupt the function and how to recognize normal and abnormal terminations. The outputs section discusses the outputs to be expected from executing the function, including hard copy or screen displays and changes to files or data. Error conditions refer to common errors that could occur while executing the function. Related information would include limitations and constraints, related functions, and notes.

Most of the chapters in the TRIM2 Reference Manual, TRIM2 Simulation Modules, MATH Technical Description, MATH User’s Guide, and HITSM documentation are reference mode documents that refer to major sections of the respective models. In general, the chapters are complete in their descriptions of purpose, inputs, and outputs. With the exception of HITSM, they also satisfactorily describe invocation information. If these chapters have a weakness relative to the IEEE standard, it is their lack of cautions and warnings and typical lack of discussion of error conditions.

• Error messages, known problems, and error recovery. The IEEE standard specifies that this section be included in a documentation set and that each volume of the set refer to it. None of the documents reviewed has such a section or reference.

• Appendixes. The IEEE standard suggests that optional appendixes are useful for (1) detailed input and output data, (2) “codes” used in input or output, (3) interactions between tasks or functions, (4) global processing limits, (5) descriptions of data formats and file structures, and (6) sample files, reports, or subprograms. The documentation for all three models has appendixes that provide descriptions of data formats. In HITSM there is an explicit appendix. In TRIM2 and MATH it is a separate Codebook document. Apart from these, all of the documents suffer from having excessively long input lists (e.g., program parameters that vary by state and by year) in the middle of the text. Such input lists should appear in appendixes.

• Bibliography. A bibliography is mandatory according to the IEEE standard. The HITSM document does not have a bibliography. The MATH Technical Description has a bibliography for the introductory chapter but none for the rest of the document. The TRIM2 CPS Codebook document has a bibliography, but the Reference Manual does not. Some of the individual sections in the Simulation Modules binders have reference lists at the end of the “Purpose” or text section (see, e.g., the FSTAMP chapter), but the Simulation Modules binder lacks a single bibliography or a standard reference list or bibliography for each chapter.

• Glossary. The IEEE standard requires a glossary for all documentation sets; however, none of the documents reviewed has one.

• Index. As with a glossary, an index is required, although Table 1 makes it optional for single-volume documents under 40 pages. Again, none of the documents reviewed has one. (The MATH Codebook has two sections near the front, entitled “alphabetical variable index” and “sequential variable index”; however, neither functions as an index in the sense of the IEEE standard.)
In short, the documents reviewed had many of the components required of good documentation. They were particularly compliant with respect to the major items—statements of purpose, inputs, methods, and so forth. The components that were lacking, however, were precisely those that would allow a user to use the documents easily. Users of software documentation know how crucial an index is, and yet none of the microsimulation model documentation reviewed here had one. Before these models can be widely used, a major emphasis must be put on organizing and editing the documentation. The current documentation impedes, rather than facilitates, acceptance of these models.
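The inclusion rules summarized in Table 1 lend themselves to a simple lookup, which is one way a documentation team might screen a draft against the standard before release. The following Python sketch is an assumption-laden illustration rather than anything the IEEE standard or the panel provides: the function and variable names are hypothetical, the table's * and ** qualifiers are dropped, and the body rows (instruction and reference mode) are omitted.

```python
# Requirement codes as in Table 1: "M" mandatory (when the information exists),
# "R" include the section or a reference to it, "O" optional.
# Tuple order: (single volume, 8 pages or less; single volume, more than 8 pages;
#               multi-volume, first volume; multi-volume, other volume).
# Simplification for this sketch: the table's * and ** qualifiers are dropped.
REQUIREMENTS = {
    "Title page": ("M", "M", "M", "M"),
    "Restrictions": ("M", "M", "M", "M"),
    "Warranties": ("R", "R", "R", "R"),
    "Table of contents": ("O", "M", "M", "M"),
    "List of illustrations": ("O", "O", "O", "O"),
    "Audience description": ("R", "M", "M", "R"),
    "Applicability": ("M", "M", "M", "M"),
    "Purpose": ("R", "M", "M", "R"),
    "Document usage": ("R", "M", "M", "R"),
    "Related documents": ("R", "R", "R", "R"),
    "Conventions": ("M", "M", "M", "R"),
    "Problem reporting": ("R", "M", "M", "R"),
    "Error conditions": ("R", "R", "R", "R"),
    "Appendixes": ("O", "O", "O", "O"),
    "Bibliography": ("M", "M", "M", "M"),
    "Glossary": ("M", "M", "M", "M"),
}

def column_index(pages, volume=None):
    """Pick the Table 1 column. `volume` is None for a single-volume
    document, or "first"/"other" for a volume in a multi-volume set."""
    if volume is None:
        return 0 if pages <= 8 else 1
    if volume not in ("first", "other"):
        raise ValueError("volume must be None, 'first', or 'other'")
    return 2 if volume == "first" else 3

def requirement(component, pages, volume=None):
    """Return "M", "R", or "O" for one component."""
    if component == "Index":
        # Table 1, note 2: in a single-volume document an index is mandatory
        # at 40 pages or more and optional below that.
        if volume is None:
            return "M" if pages >= 40 else "O"
        return "M"
    return REQUIREMENTS[component][column_index(pages, volume)]

def missing_mandatory(present, pages, volume=None, not_applicable=()):
    """List mandatory components that are absent. Components for which no
    information exists (e.g., no restrictions apply) go in `not_applicable`."""
    components = list(REQUIREMENTS) + ["Index"]
    return sorted(
        c for c in components
        if requirement(c, pages, volume) == "M"
        and c not in present
        and c not in not_applicable
    )

if __name__ == "__main__":
    # Hypothetical 200-page single-volume manual, not one of the reviewed documents.
    example = {"Title page", "Table of contents", "Purpose"}
    print(missing_mandatory(example, pages=200, not_applicable={"Restrictions"}))
```

A screen of this kind only reports whether a component is present; it says nothing about quality, which is where most of the criticisms in this review lie.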

REFERENCES

Bergsman, A. 1989 TRIM2 CPS Codebook. Project Report 3826–01. Washington, D.C.: The Urban Institute Press.

Doyle, P., ed. 1989 The MATH Codebook. Washington, D.C.: Mathematica Policy Research, Inc., and Social & Scientific Systems, Inc.

Doyle, P., and Bernhardt, J. 1983 The MATH Technical Description. Washington, D.C.: Mathematica Policy Research, Inc.

Doyle, P., Trippe, C., Huff, A., and Coffin, K., eds. 1989 MATH Technical Description: Current Services Files. Washington, D.C.: Mathematica Policy Research, Inc., and Social & Scientific Systems, Inc.

Institute of Electrical and Electronics Engineers, Inc. (IEEE) 1988 IEEE Standard for Software User Documentation. New York: Institute of Electrical and Electronics Engineers, Inc.

Lewin/ICF, Inc. 1988 The Household Income and Tax Simulation Model (HITSM): Methodology and Documentation. Washington, D.C.: Lewin/ICF, Inc.

Social & Scientific Systems, Inc. No date MATH User’s Guide. Bethesda, Md.: Social & Scientific Systems, Inc.

Webb, R., Hager, C., Murray, D., and Simon, E. 1982 TRIM2 Simulation Modules. Working Paper 3069–02. 2 vols. March 1982 plus updates. Washington, D.C.: The Urban Institute Press.

Webb, R., Bergsman, A., Hager, C., Murray, D., and Simon, E. 1986 TRIM2 Reference Manual: The Framework for Microsimulation. Working Paper 3069–01. Washington, D.C.: The Urban Institute Press.