Computing Technology and Microsimulation
Microsimulation models for social welfare policies depend on the use of digital computers to process the input data and generate the simulation results. Even the simplest microsimulation models (and most such models are large and complex) require significant computing power because they process large microlevel databases, mimic the features of complex government programs, and apply probabilistic techniques to simulate the behavior of individual decision units. The historical development of microsimulation models as a policy analysis tool is closely intertwined with the expansion of computing technology over the past 30 years.
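The probabilistic techniques referred to here can be illustrated with a deliberately minimal sketch. The population, event, and probabilities below are invented for illustration and are not drawn from any actual model: each microlevel unit is advanced one year at a time, with a random draw against a transition probability deciding whether a life event occurs.

```python
import random

# Illustrative sketch only: advance each micro-level unit one year,
# drawing against a (made-up) transition probability to decide whether
# a life event occurs for that unit in that year.

def simulate_year(population, p_event, rng):
    """Apply a probabilistic transition to every unit in the population."""
    for unit in population:
        if rng.random() < p_event.get(unit["age_group"], 0.0):
            unit["event_count"] += 1
        unit["age"] += 1
        unit["age_group"] = "young" if unit["age"] < 40 else "old"
    return population

rng = random.Random(12345)  # fixed seed so runs are reproducible
population = [{"age": a, "age_group": "young" if a < 40 else "old",
               "event_count": 0} for a in (25, 35, 55)]
p_event = {"young": 0.10, "old": 0.04}  # hypothetical annual probabilities

for _ in range(5):  # project five simulation years
    simulate_year(population, p_event, rng)
```

Real models apply many such transitions (birth, death, marriage, job change) in sequence to hundreds of thousands of records, which is what drives their demand for computing power.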
The major microsimulation models in widespread use today are designed for computing environments that are rapidly becoming obsolete and that impose significant costs for use of and access to the models. Although the hardware and software platforms of the 1980s greatly reduced computing costs per simulation run in comparison with those of the 1960s and 1970s, other costs, such as those for staff and for modifying models to respond to new policy initiatives, remain high.
In this chapter, we briefly review the history and characteristics of the computing environments for two widely used models, DYNASIM2 and TRIM2. We then consider the potential of new hardware and software technologies for improving microsimulation models. There are exciting developments on the horizon that offer the prospect of building a new generation of microsimulation models that are much more flexible and accessible to policy analysts. Of course, implementing new computer technology is never easy or without pitfalls. There are many examples of spectacular failures—involving large sums of money—from technological initiatives in both the public and the private sectors. We recommend a strategy that we hope will minimize the risks and maximize the payoffs from investment by policy analysis agencies in new computing platforms for microsimulation modeling.
THE EVOLUTION OF MICROSIMULATION COMPUTING PLATFORMS
In the late 1960s, Guy Orcutt and his colleagues obtained funding to begin work on a full-scale dynamic microsimulation model at the Urban Institute, building on his pioneering work to develop dynamic microsimulation techniques for research and policy analysis in the late 1950s.1 The first version of DYNASIM, completed in 1975, represented an ambitious effort to simulate all major demographic and economic life events, including birth, death, marriage, remarriage, unemployment, and migration. DYNASIM was intended not only for policy analysis, but also as a general social science research tool.
DYNASIM's computer software, the Microanalytic Simulation of Households (MASH) system, written in FORTRAN for a DEC system-10, qualified as state of the art. The software system was interactive and provided an extensive command language structured to allow researchers to work directly with the model, its database, and each other. The MASH system included an on-line attribute library, a machine-readable codebook, and on-line dictionaries for each system user. The population data for the model were stored in virtual memory rather than on magnetic tape, allowing nonsequential direct access to each person and family in the file.
Yet the system had serious shortcomings for policy analysis and research. The virtual memory simulator and its disk space requirements could not handle large sample sizes. Moreover, the cost and elapsed time required to process a large number of persons in an interactive time-sharing environment were excessive: it could take from 4 to 12 hours to complete a single year's projection.
In the late 1970s the Urban Institute obtained funding to write a new version of the model, DYNASIM2, which was designed with a narrower focus. DYNASIM2 includes elements of the original DYNASIM model, the PENSIM model developed by James Schulz to simulate private pension alternatives, and other features specifically geared to simulating long-range forecasts of earnings and family histories needed for analysis of retirement income policy issues. The software system for DYNASIM2 is much simpler than that for DYNASIM: it is also written in FORTRAN, but processes the population data sequentially in a binary integer format and is optimized for batch processing with a mainframe (or large mini) computer. The software includes some of the original features of the MASH system, such as an on-line time-series dictionary and a machine-readable codebook. It has a simplified command structure for the user.
The DYNASIM2 software can process large data files at much lower cost than its predecessor. Processing costs increase during simulation runs, because DYNASIM2 adds persons to the database and records new data elements for each person in each simulation year, but the costs do not become prohibitive. For example, the number of CPU (central processing unit) minutes for a run of the family and employment history module increases by about 50 percent from projections in the early 1990s to those in 2030. DYNASIM2, in contrast to the original DYNASIM, has also proved to be fairly portable across types of computers.
TRIM2 is the latest version of a cumulative, 20-year effort to model the income support and taxation systems of the United States accurately and efficiently.2 RIM (Reforms in Income Maintenance), the first model in this lineage, was designed and developed in 1969 by President Johnson's Commission on Income Maintenance. RIM was programmed in FORTRAN for CDC (Control Data Corporation) computers. It lacked any overall architecture or framework, and the virtual absence of documentation made it difficult to use and modify. The Urban Institute attempted to build a more useful system by modifying RIM, but had to abandon the effort. Instead, the Institute built an entirely new system called TRIM.3
TRIM was programmed in FORTRAN, with some use of assembly language, for running on an IBM mainframe computer. The software system was designed to enable TRIM to
- run with large microlevel databases that could support reporting simulation results for very disaggregated groups;
- simulate a wide variety of tax and income support programs through a modular structure;
- simulate programs that use different filing units in the same model run;
- model alternative versions of a single program or different program interactions in a single model run;
- support flexible specification of program characteristics to be simulated;
- facilitate making changes to the computer code to simulate new programs or features of programs that could not be modeled simply by setting one or more existing parameters;
- execute runs in a timely fashion; and
- provide adequate documentation for programmers, analysts, and users.
These characteristics were valuable, but the heavy demands placed on TRIM for policy analysis in the mid-1970s left little time for following good practice with regard to structured programming and adequate documentation, with the result that TRIM increasingly came to be viewed as difficult to understand and use. Hence, when ASPE needed to produce estimates for the Carter administration's proposed Better Jobs and Income Program, the agency decided to develop a special-purpose in-house model that could simulate a linked public jobs and cash strategy, rather than asking the Urban Institute to make the necessary modifications to TRIM. The resulting KGB model was developed in approximately five weeks by using data files derived from TRIM and simplified tax and transfer program operating characteristics. Although it was used heavily for several years, KGB again exhibited the worst features of RIM. It lacked a generalized software framework, its operating efficiency was just barely adequate, and the system could be used only by its creators.
In the late 1970s the Urban Institute received funding to redesign TRIM to improve overall efficiency, improve the printed output, and develop new data structures and processing procedures in order to make the system easier to use by analysts and easier to develop and modify by programmers. The old TRIM software is still used to preprocess each year's March CPS file. The new TRIM2 software, also written in FORTRAN for an IBM mainframe computer, then takes over to complete the processing of the input data and to run program simulations.4
NEW DEVELOPMENTS IN COMPUTING TECHNOLOGY
Both DYNASIM2 and TRIM2—as well as other major microsimulation models that are widely used today such as MATH—are optimized for batch processing with mainframe technology (or, as for DYNASIM2, a large minicomputer). Their software system developers have focused on enhancing processing efficiency and reducing computer costs through such techniques as data compression, variable transposition, and overlaying run phases in memory. They have also assigned high priority to techniques for maintaining control over the simulation process. For example, all parameter definitions and their values are predefined in the central TRIM2 directory.
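Variable transposition, one of the efficiency techniques mentioned above, can be sketched in miniature. In this hypothetical example (the field names and values are invented), the data are stored as one compact array per variable rather than one record per person, so a module that touches only a few variables scans only those arrays:

```python
from array import array

# Hypothetical person records, row-oriented (one dict per person).
persons = [
    {"age": 34, "earnings": 21000, "benefits": 0},
    {"age": 67, "earnings": 0,     "benefits": 7400},
    {"age": 45, "earnings": 38000, "benefits": 0},
]

# Transpose: one compact signed-integer array per variable.
columns = {name: array("l", (p[name] for p in persons))
           for name in ("age", "earnings", "benefits")}

# A module that needs only ages now scans a single compact array
# instead of reading every full record.
retirees = sum(1 for age in columns["age"] if age >= 65)
```

The payoff on a mainframe (or any machine) is the same: a simulation module that reads two variables out of several hundred does far less input/output against transposed storage than against full records.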
But the effort to achieve processing efficiency and centralized control has resulted in less than optimum performance on other dimensions. Total turnaround time can be slow. For example, only two or three TRIM2 production jobs can generally be run during a day, and, indeed, large jobs are typically run overnight both to take advantage of cost discounts and because the scheduling algorithm of the computer center would be unlikely to allow them to be run during the day. Although the software redesign for models such as TRIM2 and DYNASIM2 made them easier to use than their predecessors, analysts, whether on the staff of the sponsor agency or the modeling contractor, still almost always rely on programmers to prepare runs for submission. This is true for runs that involve simply setting parameter switches as well as runs that involve modifying the model code to handle new program options. The investment in learning how to work with the models directly is prohibitive for most analysts.
Just about 10 years ago, a new technology based on individual desktop personal computing burst on the scene.5 Although actual implementation has lagged somewhat behind the promises and perception, it is arguable that the widespread adoption of microcomputer technology has revolutionized the way in which people engaged in basic or applied research and development carry out their work. Analysts have been able to interact in real time with their data and analysis systems without leaving their desks and without the need for programmers or systems analysts to serve as intermediaries. Linking individual microcomputers through local and remote electronic networks has preserved flexibility for individual researchers while enabling groups of researchers to work collaboratively and share ideas and information.
Some agencies and researchers have already begun making use of microcomputers as platforms for microsimulation modeling. The CORSIM (Cornell Simulation) model, developed by Steven Caldwell, is a dynamic model based on DYNASIM2 written in the C language that is currently being used on an IBM mainframe and on MS-DOS microcomputers. In a related development, Mathematica Policy Research has developed a PC-based benefit-calculator model of the food stamp program for use by the Food and Nutrition Service (FNS), while the Urban Institute has developed a PC-based benefit-calculator model of the AFDC program for use by ASPE. Statistics Canada several years ago instituted a major microsimulation modeling effort that led to the Social Policy Simulation Database/Model (SPSD/M)—a static model of Canadian household tax and transfer programs, written in C for an MS-DOS microcomputer, that is available to the general public (Statistics Canada, 1989).
For policy analysis generally, microcomputing technology has come into widespread use. Policy analysts have become accustomed to placing heavy reliance on personal computer hardware and such software as spreadsheets to develop ad hoc, special-purpose models to respond to particular policy issues. As described in Chapter 2, CBO and ASPE analysts relied heavily on personal computer-based ad hoc models to generate many of the estimates for the debate on the Family Support Act (FSA). Analysts value highly the flexibility to develop their own models and the facility that personal computer-based models afford them for timely response to changing policy issues. In contrast, some of the changes required to enable TRIM2 to handle particular features of the FSA could not be implemented in time to provide input to the policy debate.
Many aspects of the computer implementation of microsimulation models could usefully be reviewed to determine approaches that might improve the models' cost-effectiveness. For example, the capabilities for data management, retrieval, and documentation afforded by relational database management systems (see David, 1991) may have applicability to microsimulation models. Similarly, high-level languages developed especially for simulation may have application to microsimulation. Our review of computing issues for microsimulation, however, has focused on the potential for innovations in microcomputer-based hardware and software to improve the next generation of microsimulation models.
We believe that microcomputer platforms have the potential to enable microsimulation models to provide more flexible, timely responses to policy questions; to facilitate validation of models and their outputs; and, generally, to make the models more accessible to users. Although microsimulation models (along with other formal models) offer important benefits to the policy process, we fear that they will not be competitive with special-purpose ad hoc models for many policy needs so long as they remain relatively inaccessible to the analyst. For the same reason, we fear that microsimulation models will remain less well utilized and studied by academic researchers and, hence, that there will be less useful feedback for the improvement of policy models (see Chapter 11).
Alternative Computing Platforms: SPSD/M Versus TRIM2
The panel commissioned a comparative evaluation of a microcomputer-based tax and transfer simulation model, SPSD/M, and a mainframe-based model, TRIM2 (reported in Cotton and Sadowsky, in Volume II). The study was not intended to provide a representative sample of models but rather a concrete comparison of major options. The study found advantages and disadvantages to both models' design features and computing platforms; however, the clear weight of the evaluation was in favor of the microcomputer-based platform of SPSD/M:
The interactive SPSD/M system permits users to specify interactively what a model run is to do and to interrupt the execution of the model to check intermediate results, a capability that the TRIM2 system does not support.
An analyst or programmer would probably be able to complete a research project at lower cost and in less elapsed time using SPSD/M rather than TRIM2, because of the interactive features in SPSD/M and its reasonably efficient processing (about 20 minutes of wall-clock time on a Compaq Deskpro 386/20 to run a complete simulation) coupled with the ability to run at any and all times of the day. In contrast, production runs for TRIM2, although requiring only 5 to 10 minutes of IBM System 3090 CPU time, must typically wait to be scheduled during the day or run overnight.6
The C language in which SPSD/M is written permits it to be moved to the next generation of personal workstations: for example, the current system could be ported to a UNIX workstation with less than 1 week of development effort. In contrast, moving TRIM2 away from its IBM MVS environment would involve a major software development project. Also, the SPSD/M user interface could be adapted relatively easily to a graphic user interface; however, such an adaptation is not even appropriate for the TRIM2 mainframe environment.
SPSD/M is attractive to new users because of its ability to interface with other MS-DOS software packages, such as Lotus 1-2-3 and PC SAS. TRIM2 can produce SAS input code, and there has been some experience in downloading TRIM2 data to a personal computer for use in Lotus 1-2-3; however, the interface with personal computer software packages is inconvenient to carry out.
The SPSD/M user community has grown as more individuals and organizations became interested in using the model to participate in the tax reform debate in Canada. Most of the new SPSD/M users are using the model in the "black box" mode, that is, a mode in which they do not seek to understand or modify the program code. These users receive no formal training from Statistics Canada, but they appear to be using the model successfully with the SPSD/M documentation and the model's interactive mode of operation. Several outside researchers and consultants are modifying the model's internal structure (i.e., they are using the model in the "glass box" mode) with minimal assistance from Statistics Canada. TRIM2, in contrast, is currently used almost exclusively by the technical programming staff at the Urban Institute, and no outside researchers are attempting to enhance the model by modifying the internal FORTRAN source code.
New Computing Capabilities Versus Microsimulation Requirements
Computer systems for microsimulation models must effectively perform several key functions: storing the microdata in an efficient and cost-effective manner; managing the database and associated documentation; ensuring logical consistency among individual modules and operating characteristics within the model; providing a means to link modules for particular applications; and providing an effective user interface. Good microcomputer-based solutions do not yet exist for supporting all of the necessary functions; however, current developments appear to promise considerably more effective computing environments than are now used for microsimulation modeling activities.7
The computing platform must be able to run large and complex models. By 1995, desktop workstations should be available with the capability to support microsimulation models of considerable size. Single processors are expected to range in speed between 40 and 100 million instructions per second (MIPS); hence, the speed of executing microsimulation model runs will no longer be a significant issue. To the extent that parallel processor architecture becomes available for workstations in such a way that tasks can be easily divided to run in parallel, the increase in speed will be multiples of what a single processor will be able to achieve.
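The kind of task division anticipated here can be sketched under the simplifying assumption that person records are independent within a simulation year, so the file can be partitioned and the pieces processed concurrently. This illustrative fragment uses threads purely for portability of the sketch; the point is the partitioning pattern, not the actual speedup a parallel processor would deliver:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(ages):
    # Stand-in for a simulation module applied to one partition of the
    # person file (here it merely ages each person one year).
    return [age + 1 for age in ages]

def parallel_age(ages, workers=2):
    """Partition the records, process the partitions concurrently,
    and reassemble the results in their original order."""
    size = (len(ages) + workers - 1) // workers
    chunks = [ages[i:i + size] for i in range(0, len(ages), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)  # preserves chunk order
    return [a for chunk in results for a in chunk]

ages_next = parallel_age([10, 20, 30, 40, 50])
```

Modules with cross-person dependencies (marriage matching, for example) would not partition this cleanly, which is why the text hedges on whether tasks "can be easily divided."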
For efficient processing, mainframe microsimulation models have compressed the input data in various ways. The desktop workstation environments of 1995 are likely also to require some type of compression. However, primary random access memory (RAM) of 32-64 megabytes is expected to be common, so that compressed files of the size currently processed by TRIM2 may well fit within primary memory during the course of a simulation. With respect to secondary storage devices (e.g., hard disks), current developments indicate that secondary storage on workstations in 1995 will be at least as good as secondary storage on current medium-size mainframes.
The history of electronic computing has been characterized by hardware developments leading software developments. However, the workstation software environment in 1995 is likely to feature several elements of importance to microsimulation modeling activities. For example, powerful graphical user interfaces should greatly facilitate direct experimentation by an analyst in simulating alternative policy proposals, extending model capabilities, and performing sensitivity analyses and estimates of variance. These interfaces are characterized by icons, windows, and the use of "point and click" tools that enable users to work more effectively and easily with complex models and data. Similarly, substantial advances in computer-assisted software engineering (CASE) tools—which are almost always based on a graphical user interface and embody the notion of a construction kit approach to model design—should magnify the productivity of software designers as well as users.8
In sum, there appears to be no reason why applying microcomputer technology to the needs of microsimulation modeling on a large scale should not be feasible in the near future, or why it should not bring the same kinds of important benefits that such technology has brought to researchers and analysts in other contexts. Chief among these benefits is the ability for users, as well as programmers, to interact directly with models and data in a manner that encourages experimentation and use. Arguments against moving to microcomputer technology because of the limited storage capacity of microcomputers (both immediate access and secondary storage) and slow processing times are rapidly losing relevance with the pace of change in the microcomputer world.
Another argument questions the utility for policy analysis of encouraging many people to engage in modifying and using models and thereby generating possibly inconsistent or erroneous estimates. We believe that this danger can be minimized by building extensive documentation features into new models. Moreover, we believe that the benefits from broader access to models equipped with the capabilities for ready evaluation of model estimates and ready adaptation to new policy needs outweigh the risks from a proliferation of estimates.
Although we have focused our discussion on microcomputer technology (specifically, powerful workstations), we are mindful that such platforms do not represent the only possible future for microsimulation models. The pace of change in hardware and software technology is not only rapid, but also multidirectional. It may well be that some other type of platform would be feasible and, indeed, optimal for the next generation of microsimulation models. For example, one could envision a computing environment that used graphical software tools operating on a workstation to develop and test model applications, coupled with links to a supercomputer for making production runs using large samples. Such an environment might facilitate experimentation and direct access by an analyst for many applications and also permit systems staff to run large, complex applications involving a range of capabilities, such as behavioral response or links to other models. Such an environment could also make it possible to keep track of variants of the model, thereby minimizing the problems from multiple estimates.
In our view, the overriding consideration is not the specific hardware or software technology that is adopted, but the need to design a new generation of microsimulation models that meet the design criteria we have identified (see Chapter 6). As we recommend, new models need to provide enhanced flexibility; enhanced accessibility; the ability to generate clear and complete documentation; the ability to evaluate model components as well as the overall model; and, finally, acceptable cost and time for development and use. To achieve these goals, we believe, requires soon leaving behind the current computing environment for microsimulation modeling and moving toward new technology.
FUTURE DIRECTIONS FOR COMPUTING IN MICROSIMULATION
We are excited about the prospect that powerful new hardware and software technology will make possible a new generation of microsimulation models that support increased modeling capabilities and flexibility for analysts and thereby attract a new, broader user community. In considering what microcomputer hardware is likely to be available by 1995, it seems probable that the computing power currently required for a TRIM2-equivalent simulation (including CPU, memory, and secondary storage) will be available on a workstation costing no more than $10,000. Linked microcomputer and mainframe systems that provide useful functionality may also be available.
Considering software developments that could apply to workstations (and possibly other platforms), there are promising innovations noted above such as graphical user interfaces. However, these software environments are not yet settled. In particular, no market leader or industry standard has yet emerged. It may be several years before it would be prudent to choose the software environment in which to make substantial investments in microsimulation model development.
Given the need for improved functionality of microsimulation models, while recognizing the risks that always accompany an investment in new technology, we urge policy analysis agencies to proceed both resolutely and cautiously to explore the potential of new computing technology. It is probably a mistake to plan immediately to port an entire existing model (such as TRIM2) to a new computing platform (such as that used by SPSD/M), in part because there are aspects of the SPSD/M computational design that could be improved. Moreover, some of the potentially most fruitful software developments, such as graphical user interfaces, are still in an early stage of development. Finally, to port TRIM2 to a radically different environment could entail substantial costs with little offsetting benefit. As Cotton and Sadowsky concluded (in Volume II):
The basic difficulty in extracting benefits from a desktop version of TRIM2 goes back to the…[fact that TRIM2 and other models like it] have all been optimized to run on a central mainframe computing system that relies primarily on batch processing. Moving these systems into a very different environment minimizes their operational strengths and exposes their lack of ability to exploit the new environment. The benefits of porting TRIM2 in its present form…are moderate at best and justify neither the real costs involved nor the opportunity costs of preempting investment resources that could better be used elsewhere.
In other words, if substantial resources are put into creating a personal workstation version of TRIM2 (or MATH), there is the real danger that the opportunity for making a breakthrough in microsimulation model technology will be lost and the pattern of the past 20 years—with present needs crowding out medium-term investments, whether they be in adding capability, assessing uncertainty, or improving the software design—will repeat itself once again.9
We believe that the best way to proceed at this time is by taking a series of relatively small steps, particularly until the directions of new software (and hardware) developments become clearer. First, policy analysis agencies could well make some modest investments in developing workstation-based front-end tools for helping programmers modify and manage the existing mainframe-oriented models in a more cost-effective manner. Second, it is not too soon for the agencies to consider how to translate the large volume of code in models such as TRIM2 and MATH, which embody the various social welfare program accounting schemes and rules of operation, to a new computing environment with minimum error.
Third, and most important, the agencies should invest in developing prototypes of both static and dynamic models using the best available new hardware and software technology. The prototype development could seek to mimic, in a very skeletonized form, the basic functions of a model such as MATH or DYNASIM2 (or of a component module, such as the simulation of SSI or AFDC), and then to add to these functions one or another important new capability. For example, a worthwhile prototype to develop and test would be a model that makes it possible for analysts to conduct validation studies, such as sensitivity analyses or estimates of variance, without programmer intervention. Another prototype might be one to make it possible for analysts to readily alter the aging or imputation routines that are used to construct the database. The object would be not to produce realistic simulations of policy changes, but to test new ways to enhance functionality such as increased flexibility and accessibility.
Recommendation 7-1. We recommend that policy analysis agencies invest resources in developing prototypes of static and dynamic microsimulation models that use new computer technologies to provide enhanced capabilities, such as the ability for a wider group of analysts to apply the models; conduct timely and cost-effective validation studies, including variance estimation and sensitivity analyses; and alter major components, such as the aging routines, without requiring programmer intervention.
At the same time, the agencies need to keep abreast of ongoing technological developments, particularly the kinds of software, such as graphical user interfaces, that are emerging with the potential to provide substantial benefits for microsimulation. In this regard, we urge policy analysis agencies to obtain input from computer science researchers who know in what directions software technology is moving and what advantages new technology could offer for microsimulation. We also urge the agencies to keep in touch with developments abroad. Although the details of government policies and programs vary, many countries are involved in microsimulation for a range of social welfare programs and share an interest in developing cost-effective computing platforms that support increased model functionality. Recently, the United Kingdom, under the auspices of the Working Party on Social Policy of the Organization for Economic Cooperation and Development (OECD), organized a panel of interested countries to look at the current state of the art in the use of microsimulation methodology for analysis of social policies and taxation. The activities we recommend that U.S. agencies undertake to investigate and invest in new technology for microsimulation could well be coordinated with the activities of this OECD panel.10
Once experience has been gained with prototypes (we urge the agencies to move rapidly in their development) and the software environment has settled down so that reasonable choices can be made, agencies can move ahead with efforts to develop new microsimulation models based on next-generation computing technology. We urge agencies in the United States to form a broad consortium for this work and to consider involving interested agencies from other countries that have strong commitments to microsimulation modeling as a policy analysis tool.
Recommendation 7-2. We recommend that policy analysis agencies, after experience with prototypes and reviews of developments in computer hardware and software technologies, make plans to invest in a new generation of microsimulation models that facilitate such design criteria as user accessibility and adequate documentation and evaluation of model components, as well as computational efficiency.