THE ROLE OF RESEARCH IN MEETING IT NEEDS
The federal government has been instrumental in developing the field of computing since its early days. One of the first applications of information-processing technology in government was the use of punched cards and mechanical punch-card tabulation devices, invented by Herman Hollerith at the Bureau of the Census. These devices, used to tabulate the results of the 1890 census, led to the ubiquitous use of punched cards as a medium for input, output, and storage for several decades. After World War II, with the advent of larger computers, demanding government missions, including national defense, gathering and analyzing statistical data, and operating the Social Security system, prompted substantial federal R&D on both hardware and software. The mission of preparing federal statistical data, for example, triggered a number of innovations, including the 1951 delivery of the first Univac (Remington Rand) computer to the Bureau of the Census to help tabulate census results; development of the Film Optical Scanning Device for Input to Computers (FOSDIC), which enabled 1960 census questionnaires to be transferred to microfilm and scanned into computers for processing; and the Bureau of the Census's development of the Topologically Integrated Geographic Encoding and Referencing (TIGER) digital database of geographic features covering the entire United States, which served as a foundational data set for subsequent geographic information systems. Indeed, prior to 1960, the federal government was the dominant customer for computers.1 This federal dominance—or at least heavyweight status—in the IT sector was also manifested in government's investment in efforts to establish federal data-processing standards, a substantial fraction of which was aimed at information systems security (reflecting a long-standing federal interest in this area).
As the commercial market for computing grew in the 1960s, federal support for IT research continued to reap benefits. Over time, government lost its position as the leading customer for computers, but because information technology remained critical to its various missions and because government led demand in many specific respects, federal support for IT research continued. In the 1980s, the Strategic Computing program, for example, was launched in the U.S. Department of Defense (DOD) in order to accelerate the development and transition of information technologies critical to defense applications. Expanding on Defense Advanced Research Projects Agency (DARPA) research aimed at meeting military requirements, and on U.S. Department of Energy (DOE) energy research, NASA space research, NSF basic science, and the mission research of several other agencies, the High Performance Computing and Communications (HPCC) initiative was created in the early 1990s to help address “grand challenge” applications in areas of government interest such as health, education, libraries, and crisis management, and to accelerate innovation in critical supporting information technologies.
The history of federally funded IT research shows that problems motivated by government needs, such as networking and parallel processing, when suitably framed in a well-designed research program have proved to have wide commercial application (as evidenced by the Internet, distributed transaction processing, and data mining). Broad goals were often pursued in order to infuse new thinking into the technology supply chain of vendors and technology developers for a mission agency. Examples of the success of this approach include process separation for security in operating systems (funded by DARPA in the 1970s and 1980s), computational science (NSF, DOE, and NASA in the 1980s), and custom very-large-scale integrated circuit (VLSI) chip design (DARPA in the 1970s and 1980s).
Ideas were transferred to the commercial sector through direct sponsorship, or through employment or entrepreneurship of laboratory re-
searchers. CSTB’s report Evolving the High Performance Computing and Communications Initiative to Support the Nation’s Information Infrastructure (also known as the Brooks-Sutherland report, after the committee’s co-chairs)2 examined the payoff and key lessons learned from federal investment in computing research. The study concluded that this broad and sustained federal support profoundly affected the development of computer technology and ultimately led to numerous commercially successful applications. In turn, this stimulation of the commercial sector provided the means for government to acquire IT to meet its own needs.
The importance of continued investment in foundational technologies, by players across the board, should not be underestimated. Government in particular has a unique and historic responsibility to help "raise the floor" by working in critical research areas that need stimulus. These are areas in which innovation tends to be nonappropriable—that is, entities cannot retain, or appropriate, the value from an invention; instead, the value diffuses broadly. Nonappropriable advances thus may not directly strengthen the competitive position of any individual commercial entity, though they may create increments of capability that in the long run benefit all users of a technology. An additional effect of innovation is that it can create market uncertainties—innovations can disrupt the competitive position of established players, and in unpredictable ways. For both reasons, it may not be in the self-interest of any commercial party to support innovation. (This point is discussed further in the section "Will Industry Do It?" in Chapter 4.)
Today, as government looks to continue to stimulate IT innovation and meet its own needs through IT research, three categories of research are apparent, which might be handled quite differently with respect to research management, government sponsorship, and engagement with industry. The categories are these:
1. Broad infrastructure. These "platform" technologies are important for all users of IT, whether commercial or government.

2. Governmentwide use. There are important developments that could be applied broadly across many government programs but that might not necessarily apply in commercial settings. Many of these technologies fall in the category of middleware (see the subsection "Middleware" in this chapter). DOD has made some attempts to develop agencywide technologies in this category (such as the High Level Architecture [HLA] and the Common Operating Environment [COE]). An example of an unsolved problem here is that of developing one or more authentication schemes that government agencies can use to authenticate citizens for various categories of transactions. Where government is successful in developing technologies of this sort and stimulating their uptake, wider commercial use may be an important consequence. But the investment is justified in terms of the government use because there is no established market.

3. Mission-specific. There are many technologies with narrower applicability that address more focused government technical problems. In the course of the committee's workshops examining crisis management and federal statistics, a number of opportunities were identified where the mission interests of federal agencies and the technical interests of researchers overlap. Government applications provide not only new technical research challenges but also, frequently, texture, richness, and veracity not easily created in laboratory studies. Working on this class of problems can thus represent a mutually beneficial situation for the researcher seeking to explore an idea and the government body seeking to discover new approaches to solving a particular problem. Research funded by the NSF's Digital Government program illustrates some of these points of intersection, many of them already identified by other federal agencies and computer science researchers (see Box 3.1). A number of specific applied-research opportunities identified in the committee's workshops are described in its earlier workshop reports.
All agencies with organic research programs undertake research in category 3, above. Those for which IT plays a strategic role also undertake programs in category 2. Those with a broad stake in the growth of IT (such as DOD, NASA, DOE, and others) also address challenges in category 1, along with the NSF. The NSF Digital Government program is positioned to undertake research in all three categories, forming alliances with mission users in order to identify requirements more concretely, provide access to data and subject-matter experts, and provide potential for early validation of innovative concepts.
SOME E-GOVERNMENT RESEARCH AREAS
The research areas discussed in the following subsections are drawn from the committee’s detailed studies of crisis management and federal statistics, as well as from its less-intensive explorations of other government application domains, reviews of the literature on government IT,
and interactions with experts from both within and outside government. The research topics presented in this chapter are intended to provide examples of the kinds of topics likely to be important to government information technology. It must be noted, however, that inclusion of a topic in this chapter does not mean that it is an area of sole or even primary importance to government. Many of these issues are important across the board—in government and the private sector alike—though the government context may present especially stringent requirements or particularly novel issues or may in some other way lead in demand. In addition, this compilation of topics is not a comprehensive research agenda. Because the IT requirements across government mission agencies are numerous and diverse, the absence of an area of research from this chapter should not be taken as an indication that it is unworthy of support in the context of e-government. Moreover, lists of pertinent topics will evolve over time as the state of the art advances, as requirements in government are modified or become better understood, and as other changes occur. Nor could a study of this scope and level of effort present a comprehensive analysis of requirements and gaps governmentwide.
These caveats notwithstanding, the committee intends the compilation of topics presented below to be useful to those seeking to stimulate innovative projects, solicit proposals from researchers, or engage operational government agencies in IT research.
Information Management

Government applications present significant challenges for information-management technology. Governments hold large amounts of heterogeneous data from a wide variety of sources—textual information, demographic data, geographic data, image and video data, and so forth— and in databases with many different schemas. They also present a heterogeneous computing environment, with numerous types of computer platforms, database systems, information retrieval systems, and document-management systems, just to cite a few.
This diversity of computing systems reflects the substantial scale and longevity of legacy investment that is typical of government (and often found in the private sector as well), and the large number of departments, agencies, and programs that traditionally have specified and procured systems without reference to an overall information or systems architecture. Many older systems adhere to obsolete federal information-processing standards. More recent systems adhere to prevailing commercial standards. But agreement on standards is difficult to achieve, compliance with such specifications is never total, standards do not cover the full range of design decisions (especially with respect to semantic issues),
and, perhaps most significantly, standards evolve over time. In short, standards alone are not a solution to the challenges of information sharing and integration.
Government systems also provide direct support for diverse users and applications—including, for example, commonly required transactions at local, state, and federal levels; requests for information about people and property; requests for historical information (including information retained in official archives); requests for statistical information; researcher requests for various types of information; and government-worker requests. While some systems are intended as servers or only for expert users, others are expected to provide meaningful access to a broader user base. And regardless of their role, information-management systems design must also take into account a basic tension in the government environment: providing access to as much information as possible while protecting system security and individual privacy. This goal can be
challenging to accomplish because there are interactions among system designs even when the systems themselves do not interact—the results of queries to separate systems can be combined externally. These considerations are reflected in a number of research needs, both basic and applied, for systems that access government information:
Capabilities for finding relevant information in text and extracting structure from text;
Managing unstructured and semistructured data;
Finding relevant information in other types of data, including time-dependent data, geographical databases, satellite data, images, video, and audio;
Providing effective access to heterogeneous information types (e.g., mixed text, image, audio, and geographical data);
Providing effective access to multilingual information, including ways of entering queries and searching across multiple languages (cross-lingual search); and
Representing and managing approximation, uncertainty, and inconsistency.
A second class of problems is that of metadata and interoperability among data sets and information systems. Without agreement on the format and meaning of data (or a means of reconciling different formats or semantics), it is not possible to transfer information from one system to another or to combine information from multiple systems. One set of issues concerns information about the data, or metadata. For example, metadata about a raw number provides information—such as its scale, accuracy, or unit—that permits it to be interpreted. Interoperation is easier when system designers reach consensus about the metadata. This is primarily a social process, but it involves the technical dimensions of representation and semantics as well.
Representation issues include the syntactic/lexical representation of each individual data item (e.g., a date or name), the record structure for aggregates of multiple data items (e.g., a set that includes a name, social security number, date of birth, and employment start date), and the linear byte-stream representation of that data for transmission over networks and storage on secondary media. Semantic issues include reliability, source, and other attributes related to the quality of the information; the interpretation of the information; and the consistency relationships of that information with other data.
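The layers just described can be made concrete in a short sketch. The Python example below (with entirely hypothetical field names and conventions) shows a record aggregating several data items, a fixed syntactic convention for each item (ISO 8601 dates, a hyphenated Social Security number string), and a canonical byte-stream form for transmission or storage. The semantic conventions live only in comments here, which is exactly the gap that metadata standards aim to close.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EmployeeRecord:
    # Each field's syntactic representation is fixed by convention:
    # dates as ISO 8601 "YYYY-MM-DD" strings, the SSN as a
    # hyphenated string. These conventions are illustrative only.
    name: str
    ssn: str
    date_of_birth: str
    start_date: str

def to_wire(record: EmployeeRecord) -> bytes:
    # Linear byte-stream representation for network transmission or
    # storage on secondary media: UTF-8-encoded JSON with sorted keys,
    # so the same record always serializes to the same bytes.
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")

def from_wire(data: bytes) -> EmployeeRecord:
    return EmployeeRecord(**json.loads(data.decode("utf-8")))

rec = EmployeeRecord("Ada Lovelace", "000-00-0000", "1815-12-10", "1833-06-05")
assert from_wire(to_wire(rec)) == rec   # lossless round trip
```

Note that nothing in this sketch captures semantic attributes such as reliability or source; those require agreed-upon metadata beyond the byte-level format.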
The XML “meta standard” for metadata has been widely embraced because it provides an effective approach to achieving commonalities with respect to the three representation issues just noted. XML itself only provides a language for describing data and relies, therefore, on the success of social processes to obtain consensus on representation within specific domains of common interest. With respect to semantic issues, there is less progress, though the XML standard at least enables communities to “speak” a standardized language in addressing semantic issues. Standardization of the metadata describing the format of databases can be achieved through agreement on XML DTDs (document type definitions, which are formal descriptions of what can appear in a document and how documents are structured), and while not research per se, this is an area deserving continued work. A variety of government and industry bodies are working to develop standards for various application areas.3
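To illustrate the representational role XML plays, the minimal Python sketch below parses a hypothetical record format that two agencies might agree on. Note that the standard-library parser does not validate a document against a DTD; enforcing a DTD would require additional tooling, which underscores that XML supplies the common syntax while the agreement itself remains a social process.

```python
import xml.etree.ElementTree as ET

# A hypothetical record format; the element names are assumptions,
# standing in for whatever a DTD for this domain would specify.
doc = """<employee>
  <name>Ada Lovelace</name>
  <birthDate>1815-12-10</birthDate>
  <startDate>1833-06-05</startDate>
</employee>"""

root = ET.fromstring(doc)
# Because the syntax is shared, any party can recover the fields:
record = {child.tag: child.text for child in root}
assert record["name"] == "Ada Lovelace"
```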
Research on semantic issues is taking place in the areas of knowledge representation and agent-oriented computing. While various approaches have been advanced, including mediators and wrappers that act as translators between the format and syntax of one system and another, semantic interoperability is generally viewed as an unsolved problem—or rather, one that can be solved only in increments. The social process of agreeing on metadata that describe the content or the important topics covered by information objects and databases is difficult to standardize but is one crucial element for integrating information resources. This type of metadata is often expressed using a predefined vocabulary, represented as a list of categories or more structured forms such as taxonomies and ontologies. Common ontologies facilitate semantically meaningful integration of data from diverse information sources.
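The mediator/wrapper pattern mentioned above can be sketched briefly. In the Python example below (agency schemas and field names are invented for illustration), each wrapper translates one source's local schema into a shared vocabulary, and the mediator presents a single integrated view over the wrapped sources.

```python
# Two hypothetical agency schemas describing the same kind of entity.
agency_a = {"fname": "Ada", "lname": "Lovelace", "dob": "1815-12-10"}
agency_b = {"full_name": "Grace Hopper", "birth_date": "1906-12-09"}

def wrap_a(rec):
    # Wrapper: translates agency A's local schema into the shared vocabulary.
    return {"name": f"{rec['fname']} {rec['lname']}", "birth_date": rec["dob"]}

def wrap_b(rec):
    # Agency B's fields map more directly, but still via an explicit wrapper.
    return {"name": rec["full_name"], "birth_date": rec["birth_date"]}

def mediator(sources):
    # Mediator: a single integrated view over all wrapped sources.
    return [wrapper(rec) for rec, wrapper in sources]

view = mediator([(agency_a, wrap_a), (agency_b, wrap_b)])
```

The hard, unsolved part is not the translation mechanics shown here but deciding, at scale, what the shared vocabulary should be and whether two fields really mean the same thing.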
Efforts to agree on standard data descriptions, which are more a matter of implementation than of research, should not be confused with research on developing and merging ontologies for describing content, and on tools for ontology building and sharing. Many taxonomies already exist in government agencies, and many others are being created— for example, in the areas of human resources, finance, and health care. Developing technology to support the development and merging of taxonomies, as well as the application of these taxonomies to information objects, is an important research challenge. As part of these efforts, ways must be found to cope with the evolution of metadata. Another challenge is finding ways to handle heterogeneous metadata standards, much as systems must support multiple image or document format standards.
A number of other research issues arise in the area of integrating and fusing information from diverse sources. For example, improved techniques for combining results from different systems (e.g., multiple text-search engines, database systems, and geographic information systems), and techniques for presenting those results, would be of value in a government setting.
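One simple technique for combining ranked results from multiple systems is score fusion; the sketch below implements the CombSUM rule from the information-retrieval literature, with made-up scores standing in for the outputs of, say, a text-search engine and a geographic information system.

```python
from collections import defaultdict

def combsum(result_lists):
    """Merge ranked result lists from several systems by summing
    per-system normalized scores (the CombSUM fusion rule)."""
    fused = defaultdict(float)
    for results in result_lists:
        top = max(results.values())
        for doc, score in results.items():
            fused[doc] += score / top   # normalize each system to [0, 1]
    return sorted(fused, key=fused.get, reverse=True)

# Illustrative scores from two hypothetical systems:
text_engine = {"doc1": 9.0, "doc2": 3.0}
gis_engine = {"doc2": 0.8, "doc3": 0.4}
ranking = combsum([text_engine, gis_engine])   # doc2 ranks first
```

A document returned by both systems (doc2 here) is rewarded, which is the intuition behind this family of fusion methods.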
The design of information systems themselves is an area of ongoing exploration. One recent trend has been the development of centralized data warehouses, which contain data extracted from operational transaction systems, to facilitate retrieval or analysis. Such warehouses could provide short-term benefits, such as improved understanding of the cost of programs and how agencies use resources, and long-term benefits, such as better understanding of the impact of programs and support for enhanced planning of new programs. Frequently, information is stored in multiple systems, which necessitates special techniques for locating and retrieving it. This entails developing improved algorithms for finding information resources, selecting the appropriate sources for a given query, representing their content, and merging the results.
Data-mining techniques offer capabilities for discovering important but nonobvious patterns and relationships among a wide variety of data types. Research needs include improved algorithms for mining conventional structured databases (including data warehouses) as well as techniques for mining less-structured text data and more complex multimedia data sources.4 The wealth of data in government information systems presents an attractive opportunity for developing new data-mining techniques, though it will be important to differentiate among user groups. Statisticians or social scientists seeking to explore patterns in demographic data will have a different set of needs from those of nonexperts. Interesting questions include how nonexperts might use data-mining capabilities and how these techniques could be made available to them in a usable form.
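As a small taste of what mining structured data involves, the sketch below counts frequently co-occurring pairs of items, a building block of association-rule mining. The "transactions" (program enrollments per household) are invented for illustration.

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: sets of programs used together by households.
transactions = [
    {"medicaid", "snap", "housing"},
    {"medicaid", "snap"},
    {"snap", "housing"},
    {"medicaid", "snap", "veterans"},
]

def frequent_pairs(transactions, min_support=2):
    # Count every co-occurring pair of items, then keep only the pairs
    # that meet the minimum support threshold.
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(transactions)   # e.g., ("medicaid", "snap") appears 3 times
```

Real systems add pruning (the Apriori property), confidence measures, and, in a government setting, disclosure controls, but the core pattern-counting idea is the same.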
Human-Computer Interfaces

CSTB's 1997 project examining every-citizen interfaces to the nation's information infrastructure underscored the opportunity and challenge of developing technology that could be used easily and effectively by all.5 Many e-government systems must provide information and services to a range of users: experts within the government, experts outside government, and the general public. Even when the user population is segmented by capability and interest, success in delivering services depends on the development of appropriate human-computer interfaces (HCIs). HCI issues are especially important because there is no Moore's law on human perceptual, attentional, or cognitive/problem-solving capabilities—in other words, people's abilities do not scale up at the same rapid rate that basic computing capabilities do (or, for that matter, the rate at which the total volume of information resources is growing). Thus, as Herbert Simon has observed, the scarce resource in human-computer interfaces is and will remain human attention. This is especially true in nonroutine applications such as crisis management.
HCI is an inherently multidisciplinary research area, drawing on ideas from psychology as well as computer science (and related areas such as
information management). A hallmark of the field is the use of iterative user-centered design (UCD) methods to develop useful—and especially usable—systems. Current UCD methods include an early focus on users and their tasks, ongoing empirical measurement and evaluation of the system, iterative design and testing, and an integrated, end-to-end focus that considers the larger social context in which systems are deployed. HCI professionals are involved early in the design of systems and participate throughout the later stages of system development. Existing UCD methods have been found to work well when there is a limited range of users and tasks, which means that accommodating the greater diversity of individuals and applications in government will likely require extending or refining the approach.
Government systems need to be usable by a wide range of individuals and organizations with heterogeneous needs, cognitive abilities, and hardware and software. How can we create systems that allow users, whether they be students, journalists, local community groups, government workers, or policy makers, to access the wealth of government information in a way that is useful to each of them? Providing “universal access” means building systems to work well with a diverse user population and making appropriate facilities available for populations with special requirements—such as disabled citizens, speakers of languages other than English, and those located at remote sites. Analysis of individual differences, and requirements and methods for accommodating them, will play a role. Where some users have low skills and/or little experience, for example, systems that have a high tolerance for human error are likely to be especially valuable.
One particular HCI problem is that of finding, understanding, using, and integrating information of the diverse types found in the multitude of government information sources. Formulating good queries is difficult, and tools that support this process could be improved. Developing advanced systems requires a better understanding of user requirements, information-presentation techniques, and information-access strategies, as well as the development of flexible and modular architectures. A challenge that often arises in government decision support, and in government's communications to the public, is how to represent the uncertainty associated with many sorts of data. Because uncertainty reflects sampling and systematic errors (such as those in the results of a statistical survey), known limitations in the model being used (such as a model of earthquake damage), and other factors, it often plays an important role in correctly interpreting and acting on a piece of information.
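One familiar, concrete form of such uncertainty is the margin of error of a survey estimate. The sketch below computes the standard 95 percent margin of error for an estimated proportion under simple random sampling (the normal approximation); communicating this number intuitively to lay users is the HCI challenge the text describes.

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p estimated from a simple
    random sample of size n (normal approximation; z = 1.96 for 95%)."""
    return z * math.sqrt(p * (1 - p) / n)

# If 40% of 1,000 respondents report something, the estimate is
# 40% plus or minus about 3 percentage points:
moe = margin_of_error(0.40, 1000)
```

Complex survey designs, nonresponse, and systematic error all complicate this picture, which is why expert and lay presentations of the same statistic may need to differ.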
Computer systems, including information-summarization and presentation techniques, should exploit knowledge of basic human
perceptual and cognitive skills. Because today’s interfaces rely on a limited set of input and output capabilities, researchers should continue to push the hardware and software envelopes to support new interaction styles (e.g., richer visualization, perceptual user interfaces, multimodal input, and support for a range of motor and language capabilities). One promising approach for environments in which users interact extensively with data is to provide tight coupling of user actions to displayed results and easily reversible actions. Tight coupling implies low latency, which means that careful attention must be paid to how data transfer and processing are divided between the server and the client. Advances are required at the cognitive level as well, where people’s rich but fallible memories and vast amounts of general and domain-specific knowledge often do not match well with the information required by computer systems. A richer range of interaction styles is also important to match the user’s environment. At the forefront of all design improvements should be the goal of better leveraging and augmenting natural human capabilities. One example is mixed-initiative systems that support both user-initiated (direct manipulation) and agent styles of interaction.
While there are many HCI challenges related to interactions between a single user and an information system, many of today’s systems involve more than one user interacting with local data and applications. Many different communication scenarios of interest exist, including one-to-one (such as when a citizen and a government worker interact), one-to-many (such as when a population at risk in a crisis is being alerted), and many-to-many (such as when a community explores a policy issue). Progress on each of these fronts will require new theories, computing architectures, and design methods to support collaboration, as well as better understanding of the group and organizational contexts of information use. A variety of techniques, such as collaborative filtering, could play an important role.
Particular government applications will pose their own specific HCI challenges. For example, in crisis management, it is especially important that systems be able to support users in carrying out nonroutine tasks and facilitate working in unplanned, ad hoc situations. In the case of statistical data, variability in the public’s statistical literacy poses a particular challenge to effective presentation of federal statistical data. While an expert is equipped to locate the information necessary for understanding the limitations associated with a given piece of data (such as sampling error or the implications of particular definitions used in deriving the data), a lay user may not be so equipped; he or she will benefit from HCI approaches that make these factors as intuitively understandable as possible.
Support for decision makers, including systems that help to frame,
interrogate, and anticipate the world in such a way as to effectively assist in the decision-making process, is another area for continued HCI work. A user-centric view, for example, allows a decision maker to individually establish the context, pose the questions, control the content, and mold the style of presentation. Related applications include interactive team-decision environments, systems that capture and represent domain expertise, and systems that permit intelligent real-time control.
Infrastructure

A variety of infrastructure components provide an important foundation for e-government, starting with basic network-communications capabilities. Internet technologies are in widespread use, and both government and the private sector continue to press the Internet industry to provide yet more—greater capacity, for example, and improved security and reliability. These demands continue to stimulate research and development, which has already resulted in dramatic increases in network bandwidths. But because the R&D challenges are largely common to both the private and public sectors, they are not explored in depth here except to note that in government, as elsewhere, the attributes of privacy, scalability, reliability, and accountability, among others, are highly valued.
If the communications infrastructure is to be available widely, especially for use in interacting with individual citizens, then low-cost, ubiquitous access—enabled in part by continued innovation—will be required. A range of new capabilities might be enabled through the use of the Internet, particularly high-capacity, always-connected broadband access (as opposed to the more typical low-speed, dial-up connections, which incur a delay each time an Internet connection is established). In the short term, deployment will proceed using existing technologies with incremental developments to them. Looking to the longer term, further research on wireless technologies offers the potential for improved performance; and in the area of fiber optics, research could provide designs and architectures with lower deployment costs. The complex economic, implementation, and public policy issues associated with provision of broadband Internet access to residences have constrained deployment more than the state of the technology has.6
In contrast to the more routine government functions, domains such
as crisis management change the normal demands for communications. Meeting the information requirements of a crisis depends on an infrastructure that can handle above-normal loads at just the time when large portions of it may have suffered physical damage. Whereas for some applications infrastructure can be brought in from outside a disaster area, crisis response would often benefit from a more survivable in situ infrastructure, with sufficient headroom to support a reliable response.
Crises require the communications infrastructure to adapt to changing demands by managing unusual traffic-congestion patterns, for example, and permitting priority overrides for emergency usage. Such scaling and robustness questions arise in a number of large networks that are key to public safety, such as air-traffic control; police, fire, and safety communications networks; and 911 and other emergency dispatch systems. Priority-access capabilities are already a feature of the wire-line public telephone network. But with increasing use of alternatives to the public telephone network comes increased interest in providing priority, scaling, and robustness capabilities in public wireless networks, the public Internet, and private networks based on Internet technologies.
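The priority-override idea can be sketched in a few lines. The Python example below (class and priority names are invented for illustration) uses a heap so that emergency messages are always dequeued before routine ones, regardless of arrival order, while messages within a priority class keep first-in, first-out order.

```python
import heapq

class PriorityDispatch:
    """Sketch of priority override for emergency traffic: emergency
    messages are always delivered before routine ones."""
    EMERGENCY, ROUTINE = 0, 1   # lower value = higher priority

    def __init__(self):
        self._heap = []
        self._seq = 0

    def send(self, priority, message):
        # The sequence number preserves FIFO order within a priority class.
        heapq.heappush(self._heap, (priority, self._seq, message))
        self._seq += 1

    def next_message(self):
        return heapq.heappop(self._heap)[2]

q = PriorityDispatch()
q.send(q.ROUTINE, "status report")
q.send(q.EMERGENCY, "evacuate zone 3")
first = q.next_message()   # the emergency message jumps the queue
```

In a real network this policy must be enforced at routers and admission-control points rather than a single queue, which is what makes it a research problem.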
Several networking-research questions arise from these requirements. Self-adaptive networks, for example, would be able to reconfigure themselves rapidly—say, by redeploying wireless infrastructure elements—in response to damage or changed demand during a crisis, and would be of great utility. Even in the absence of extensive self-adaptation, an infrastructure that degrades gracefully as its components are affected by a crisis is better than one that fails completely.
A research question that addresses the need to prioritize traffic is how to build networks that allow applications to interact with the infrastructure so as to incorporate capabilities such as priority overrides or the recognition and management of information surges during a crisis. Also, it would be useful to develop interfaces that allow the combined deployment of private and public infrastructure, thereby permitting crisis responders to exploit whatever infrastructure elements are available. In any case, efforts should not be confined solely to improving the infrastructure. Modifying the applications themselves can permit them to cope gracefully with less-than-optimal network performance. Designers of applications intended for use in crisis situations cannot assume that there will be large amounts of bandwidth or that connectivity will be available on a consistent basis. Strategies for coping would include adapting the frequency of updates to the available bandwidth or falling back to activities that consume less bandwidth (e.g., transmitting text instead of multimedia data).
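The fallback strategy just described amounts to a simple policy decision in the application. The sketch below chooses the richest payload the measured bandwidth can support; the thresholds are illustrative, not drawn from any standard.

```python
def choose_payload(bandwidth_kbps):
    """Degrade gracefully: pick the richest representation the measured
    bandwidth can support. Thresholds here are purely illustrative."""
    if bandwidth_kbps >= 1000:
        return "video"      # plenty of capacity: full multimedia
    if bandwidth_kbps >= 100:
        return "images"     # constrained: still images plus text
    return "text"           # severely constrained: text only

assert choose_payload(56) == "text"   # a dial-up-class link gets text only
```

A production system would measure bandwidth continuously and also adapt update frequency, but the principle of designing for the worst case rather than assuming connectivity is the same.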
Finally, there are opportunities to leverage “push” technologies in emergencies. Providing up-to-date information to large segments of the public is important because it permits people to take appropriate actions, helps prevent panic, speeds remediation efforts, and can prevent follow-on crises. But widespread broadcasts (whether by television, radio, or the Internet) are not necessarily the best approach—they provide only limited, situation-specific information and cannot supply details tailored to the needs of individuals, such as what evacuation route to use. By contrast, push technologies could deliver more focused (and presumably more accurate) warnings and more detailed advice on what actions to take and could decrease the frequency with which people receive false alarms (warnings that do not apply to them).
One approach identified as worthy of further investigation involves “reverse 911” systems, whereby the usual direction of interaction between citizens and emergency managers is reversed. “Call by location” can automatically contact all households and businesses that might be affected by a fire or flash flood, warn them of the impending danger, and instruct them on what evasive action to take. Telephone-based reverse-911 systems are already being used in a number of areas. Increasingly widespread deployment, as well as the use of always-on broadband Internet, wireless data, and similar new communications technologies, present additional opportunities for providing this kind of service.
Information Systems Security
Government applications of IT often center on the management of records about individuals and businesses. Significant savings can be obtained by removing intermediaries and allowing direct access to these records—by the properly identified parties authorized to view them. Similarly, the accessed records are subject to change only by those authorized to make changes. To ensure the security of such systems and promote trust in them by citizens, several services need to be applied: confidentiality, integrity, authentication, authorization, and audit.
Confidentiality services prevent the unauthorized disclosure of data while the data transit a network or communication link, or while they reside on disk. These services are intended to prevent an attacker, or any other unauthorized individual, from bypassing data-authorization functions. Confidentiality of data is usually provided by the use of encryption—that is, the scrambling of data so that they can only be unscrambled with the use of the correct secret information, called an encryption key.
Secure Sockets Layer (SSL) is one widely deployed confidentiality mechanism. It is integrated with Web browsers and encrypts data exchanged between the user and SSL-protected Web sites.
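The idea of key-based scrambling can be illustrated with a deliberately simplified toy cipher. This is not a secure or deployed algorithm (real systems use vetted ciphers such as those in SSL); it merely shows that only the correct key recovers the data:

```python
import hashlib
from itertools import count

# Toy illustration ONLY: derive a keystream from the secret key and XOR
# it with the data. Applying the same operation with the same key
# unscrambles the data; a wrong key yields gibberish.
def keystream(key: bytes, length: int) -> bytes:
    out = bytearray()
    for block in count():
        out.extend(hashlib.sha256(key + block.to_bytes(8, "big")).digest())
        if len(out) >= length:
            return bytes(out[:length])

def xor_cipher(key: bytes, data: bytes) -> bytes:
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

plaintext = b"census record 42"
ciphertext = xor_cipher(b"secret key", plaintext)
assert ciphertext != plaintext
assert xor_cipher(b"secret key", ciphertext) == plaintext  # same key recovers
assert xor_cipher(b"wrong key", ciphertext) != plaintext
```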
Integrity services protect data from unauthorized modification. Like confidentiality services, they often depend on encryption algorithms, augmented by calculation of checksums in order to detect changes. A digital signature is a form of integrity protection, as its validation shows that the data to which the signature was applied have not been inappropriately changed. SSL also provides data integrity for the messages sent between a Web browser and a server, preventing an attacker on the network from modifying the messages, and thus the data contained in them, without such changes being detected.
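A keyed checksum of the kind used for integrity protection can be sketched with a standard HMAC; the key and record shown are placeholders:

```python
import hashlib
import hmac

# Keyed integrity check: sender and recipient share a secret key; any
# change to the message (or use of the wrong key) makes the recomputed
# tag differ from the transmitted one.
key = b"shared-secret-key"
message = b"record: applicant 123, status approved"

tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

assert verify(key, message, tag)
assert not verify(key, b"record: applicant 123, status denied", tag)
```

A digital signature provides the same tamper-evidence without a shared secret, at the cost of the public-key machinery discussed below.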
Authentication services validate a user’s claimed identity, and several ways exist for doing so. The simplest is for the server to keep a list of passwords and require that the user enter his or her password. This approach is not particularly strong, and it is difficult to manage, as the user would have to register in advance with each individual system with which he or she might eventually communicate.
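Even the simple password approach can be made less fragile. A sketch of server-side handling that stores only salted, stretched digests rather than a plain password list (the iteration count and example password are illustrative):

```python
import hashlib
import hmac
import os

# Rather than keeping passwords themselves, the server stores a random
# salt and a stretched digest (PBKDF2), so a stolen list does not
# directly reveal passwords.
def register(password: str):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest  # store these; the password itself is never kept

def check(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = register("correct horse battery staple")
assert check("correct horse battery staple", salt, digest)
assert not check("wrong password", salt, digest)
```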
Authentication can also be accomplished through the use of an authentication protocol—sometimes based on the use of a private encryption key, known only to the named individual, together with a certificate issued by a trusted third party that associates the encryption key with the user’s identity (digital signatures are one example of this form of authentication). In other cases, authentication is based on other forms of encryption with keys that are also distributed by a trusted third party (Kerberos authentication is an example).
The use of third parties for authentication, whether these are certificate authorities in the public-key infrastructure or authentication servers in Kerberos, requires that these services be trusted, both by the government systems accepting authentication and by the users whose identities are being authenticated. This is where many of the problems associated with the establishment of public-key infrastructure have been experienced. There is no single authority trusted by everyone, and as a result, many organizations (for example, banks) require separate registration by users so that only the bank’s own servers are relied upon.
The establishment of a single government authority to authenticate users presents privacy issues, because this authenticated identity becomes a unique identifier. Letting the government rely on authentication authorities in the private sector also presents possible security problems, however, as a user might obtain a certificate fraudulently (this is not to say that users could not do so with a government certificate authority). Ultimately, it will likely be the case that multiple authorities are supported and that authorization policies will dictate which ones must be used in particular contexts. When signing up for a service, the user would specify or accept the default for which authorities may be relied upon for subsequent authentication. This leaves unanswered the problem of how the user would authenticate his or her first contact with a particular agency.
The end point of security services is authorization: controlling access to only those entitled. Other security services play supporting roles, either in helping to make the authorization decision (as is the case with authentication) or in preventing individuals from bypassing the authorization mechanisms to access or modify data through other means (prevented by confidentiality and integrity services). Even audit, described below, relates to authorization, as the goals of the audit mechanism are to assure in retrospect that appropriate policies were enforced by the authorization mechanisms and to find weaknesses.
While the basic technologies are reasonably well understood, their deployment and operation at large scale are not. One example of the sort of poorly understood issue that arises at large scale is policy management, which is one of the hardest aspects of creating secure systems. In fact, many security breaches result from either the misapplication of security policies, or the use of security policies that are not appropriate in particular contexts. Writing correct policy requires a thorough understanding of how a system is to be used and how it is not supposed to be used. While creating consistent policies in a single system is hard enough, creating them in government systems is likely to be even harder—especially when the policies are concerned with the exchange of information between agencies. This is the case because there are likely to be numerous legislated policies and procedures specifically associated with each agency, many of which might not be consistent with one another.
The whole issue of exception access is another aspect of security policy that must be considered. For example, one might want to create policies that allow access to data in certain crisis situations, such as a building fire or medical emergency, that is otherwise not allowed. If the data are stored in a computer system, then these exception policies must be part of the policy base enforced by the system, further complicating the problem of policy management. Such policies must allow exception access, but in a limited way, so that the security of the entire system is not compromised.
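One way such limited exception policies might be expressed and enforced is sketched below; the roles, resources, and emergency categories are entirely hypothetical:

```python
import time

# Sketch of authorization with a narrowly scoped emergency exception and
# an audit trail of every decision.
AUDIT_LOG = []

POLICY = {
    ("physician", "medical_record"): True,
    ("firefighter", "building_plan"): True,
}

# During a declared fire emergency, firefighters may also read occupant
# medical records -- but nothing else.
EMERGENCY_EXCEPTIONS = {
    "fire": {("firefighter", "medical_record")},
}

def authorize(role, resource, declared_emergency=None):
    granted = POLICY.get((role, resource), False)
    if not granted and declared_emergency is not None:
        granted = (role, resource) in EMERGENCY_EXCEPTIONS.get(
            declared_emergency, set())
    # Every decision, normal or exceptional, is recorded for later review.
    AUDIT_LOG.append({"time": time.time(), "role": role,
                      "resource": resource,
                      "emergency": declared_emergency, "granted": granted})
    return granted

assert authorize("physician", "medical_record")
assert not authorize("firefighter", "medical_record")       # normally denied
assert authorize("firefighter", "medical_record", "fire")   # exception access
```

Because the exception set is explicit and enumerable, an emergency does not open the whole system; it merely switches in a second, pre-approved policy.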
One of the most effective ways to defeat security is the practice of “social engineering.” For example, a perpetrator calls and explains that there is a truly unique emergency requiring urgent access to the records. This is how many hackers, spies, and thieves gain unauthorized access. If a system is constructed simply to weaken access policies in an emergency, it consequently also becomes easier to penetrate through social engineering—there is a trade-off between efficiency of operation and efficacy in crisis situations on the one hand, and security and safeguards on the other. Improve one, and the other gets worse. In the end, the weak link is always the human element—not because humans are faulty, but because the procedures put in place as safeguards often end up thwarting those who must use them on a continual basis.
To date, most support for policy management is implemented on a custom basis for each application, creating stovepipe security systems that are inflexible and difficult to understand and that present many weak links for an attacker to exploit. Work is under way, for example, on the Generic Security Services Application Programming Interface (GSS-API), to create common interfaces for authorization, but the results of such efforts will not be useful until these interfaces are integrated with a wide range of commercial systems.
Audit mechanisms are important for improving the public’s confidence in security systems because these mechanisms provide a way to record the kinds of access that were granted; they allow administrators to take action after the fact when access rights have been abused, and thereby to correct the problem. Audit mechanisms will be of particular importance for any exception policies that are used by emergency workers or by others with authority to override normal access protections. For someone with exception access, knowledge that his or her use could be scrutinized helps deter abuse of such access. To be most effective, the audit mechanism for exception access should generate reports that are submitted to the person whose data were accessed.
Audit also applies to the configuration management of systems, which is the effective monitoring of the integrity of software components, hardware devices, and supporting configuration data. Some viruses and worms, for example, exploit vulnerabilities that were created by previous infections of other viruses and worms. In addition, configuration management can detect failures in upgrading in response to security updates. Indeed, many users and systems administrators lack the tools to detect whether systems software has been compromised or even whether new (and unwanted) software applications have been installed on agency or corporate systems.
E-Commerce and Related Infrastructure Services
E-business depends heavily on a number of services that run on top of the basic communications fabric discussed above, including these:
Information dissemination services, such as Web servers and push technology;
Communication services, such as e-mail and instant messaging;
Directory services, which provide for referencing and retrieval of collections of personal information (e.g., name, organizational affiliation, and e-mail address or mailing address) as well as information such as who is authorized to perform which functions in an information system; and
Security and identification services, which permit information systems to be secured, users of those systems to be authenticated, and access to be authorized.
Some of these services, such as Web servers, e-mail, and basic directories, are relatively mature technologies; the government is likely to be able to leverage commercial-sector developments in them. One related area in which government generally needs to pay more attention to research, however, is how to build a very-large-scale authentication infrastructure (e.g., a public-key infrastructure that supports all citizens). Another is the set of back-end information-management challenges related to tying together multiple systems feeding a Web site (such as building ontologies to bridge the stovepipes in different agencies). Specific research issues with respect to these areas are addressed in other sections of this chapter.
In addition to enhancing its own transaction-support capabilities, the government has an opportunity to promote the development and use of common transaction mechanisms to widen access to government-supported services. Citizens would then be able to select a third-party intermediary acting on their behalf to aggregate information from different government services or to run software on their local computers for direct access. Such third-party-provided portals would be similar to some of the Web services that have become available recently for aggregating access to multiple e-mail accounts, online banking, brokerages, and other password-protected Web sites, but without persistent storage of users’ credentials on the portals. The local application access example is similar to applications like Quicken or Money, which pull data from multiple financial repositories and aggregate the data in local reports. Government-run Web portals could be provided in addition to the base transaction mechanism, but such portals would also access the government service through the transaction mechanism.
The transaction mechanism would be accompanied by interfaces and mechanisms for communicating with the government portal, collecting the information, and providing it to the application or third-party information portal. The data sent or retrieved through this transaction mechanism would be described by common ontologies and data definitions (e.g., based on XML) to allow this interface to be integrated with higher-level applications. When access to government services is provided through a third party (or even through a common government-provided portal), security may be provided using proxy or delegated credentials. These confer on an intermediary the authority to retrieve data or enter a transaction on behalf of a citizen, but only for specific purposes and for a short period of time.
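A short-lived, purpose-limited proxy credential of this kind might be sketched as an expiring signed token. All names, keys, and lifetimes below are illustrative, and a real deployment would use public-key certificates rather than a shared secret:

```python
import hashlib
import hmac
import json
import time

# The government service signs a token naming the intermediary, the
# permitted purpose, and an expiry time.
SERVICE_KEY = b"service-signing-key"

def issue_credential(intermediary, purpose, lifetime_s=300):
    claims = {"to": intermediary, "purpose": purpose,
              "expires": time.time() + lifetime_s}
    body = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SERVICE_KEY, body, hashlib.sha256).hexdigest()
    return body, sig

def accept(body, sig, purpose):
    expected = hmac.new(SERVICE_KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False                          # forged or tampered token
    claims = json.loads(body)
    # Honored only for its stated purpose, and only until it expires.
    return claims["purpose"] == purpose and claims["expires"] > time.time()

body, sig = issue_credential("tax-portal.example", "retrieve-records")
assert accept(body, sig, "retrieve-records")
assert not accept(body, sig, "file-return")   # outside the delegated scope
```

A token like this can be handed to the intermediary instead of the citizen’s password, so the portal never holds a long-lived secret.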
Government has historically provided data in electronic form, but often in formats that make it infeasible for all but a few specialized contractors to readily exploit the data. Application-programming interfaces (APIs) would also enable citizens and businesses to overcome this problem by using software that directly connects their own applications running on personal computers with government services. A number of technical issues are related to achieving this “lightweight” capability, however. These include protocol design, information representation and metadata, security and authentication, and digital libraries.
Crisis management and similar applications present some novel requirements with respect to e-commerce, including these:
Data escrow services. Development of technologies and standards would enable escrow sites to be established where citizens could store important information that they might need to access in a crisis, as such information might otherwise be rendered inaccessible on their home machines. These escrow sites would, in some sense, be a personal analog of the disaster-recovery services already provided commercially for businesses, government, and organizations. Escrowed data could include medical records, financial data, family contacts, and other essential records. A principal challenge is that the escrow technology must protect the user’s privacy while improving the survivability and accessibility of his or her personal information.
Circumventing normal controls on use of data that might have been collected for other purposes. The serious nature of a crisis may override the normal desire and practice of not sharing the data. For example, tax records might be used to help identify individuals who were working at companies affected by a disaster. Information such as names and home addresses would be candidates for release under extraordinary circumstances, while other information, such as income, should not be released. E-commerce systems would need technology extensions that permit controlled release under exceptional circumstances and a means to discern when an “exception state” exists. Also, privacy policies and government regulations would need to provide a means both for allowing and for limiting the extent of such emergency sharing, and for recovery of the data (i.e., forcing their removal) from temporary uses after the crisis is over. As described above, social engineering is a significant risk that calls for systematic automation of the emergency-access rules. If such rules are implemented properly, the declaration of an emergency or a particular threat condition becomes an official action determined under a fixed set of rules, rather than something achieved by convincing some operator that such a situation exists. Blanket bypass of security measures should also be avoided in favor of a system that switches to enforcing an alternative set of security measures defined in advance. Modeling tools can help administrators understand the long-lasting consequences (for example, to privacy) of potential crisis-motivated decisions to change security policies.
Enhanced point-of-contact services. The issue here is how one can extend directory services—in combination with the push technologies discussed above—to provide facilities for locating particular capabilities or individuals on an urgent basis.
The committee notes, as others have, that the government can play a leading role in promoting certain aspects of e-commerce and e-business, especially where the government is in a unique position to support the deployment of certain technologies, or where specific government services may be leveraged to improve the effectiveness of these technologies. Such opportunities include the following:
Government as a certification authority or licensor of such authorities;
Government as a leading player in deployment of other security technologies, such as smart cards; and
Government as a standards adopter.
Models and Simulation for Decision Making
Technological advances have made enormous amounts of computer processing power available to government agencies. In particular, modeling and simulation technologies being developed today can be used to approximate extremely large and complex systems, frequently with hundreds of millions of interacting components. As representations of the real world, these models are unique in their ability to illuminate the systems’ inner workings and predict the consequences of particular actions. Such models are relatively new additions to our arsenal of methods for understanding and anticipating how the world works. Making use of them for decision making imposes a substantial burden of proof on the developers and requires confidence on the part of the users. Modeling-research areas relevant to government applications include these:
Underlying mathematical theory, including mathematical system theory, sequential dynamical systems, combinatorial and dynamical graph theory, and algorithm theory;
Statistical theory and methods in areas such as statistical analysis of computer-based-simulation experimentation and modeling, statistical analysis of dynamical systems, and integration of statistical decision theory with cognitive action analyses;
Computational methods, including optimal representations for high-performance coupled system architectures and learning and adaptive systems methods; and
Because sensor data provide critical input to models and simulations, techniques for fusion of sensor data and model output, reconfigurable sensor architectures, and adaptive online data acquisition.
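As a minimal illustration of the kind of dynamical-system simulation these methods underpin, consider a discrete-time epidemic (SIR) model of the sort a crisis manager might run under varying assumptions; the parameter values are arbitrary and purely illustrative:

```python
# Discrete-time SIR model: s, i, and r are the susceptible, infected,
# and recovered shares of a population.
def sir_step(s, i, r, beta=0.3, gamma=0.1):
    new_infections = beta * s * i   # contact-driven transmission
    recoveries = gamma * i          # constant per-step recovery rate
    return (s - new_infections,
            i + new_infections - recoveries,
            r + recoveries)

s, i, r = 0.99, 0.01, 0.0
for _ in range(200):
    s, i, r = sir_step(s, i, r)

# Shares still sum to one, and the simulated epidemic has largely run
# its course after 200 steps.
assert abs((s + i + r) - 1.0) < 1e-9
assert i < 0.01
```

Even a toy model like this makes the burden-of-proof point above concrete: the decision maker must trust both the structure (the update rule) and the parameters before acting on its predictions.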
Nearly all major information technology systems in government are “software-intensive” in the sense that the principal design risks relate to the capability to produce effective and reliable software. Software is the principal building material for information systems. It is difficult to design, develop, measure, analyze, evolve, and adapt. It is fragile and undependable. The reality is that software engineering remains a largely unsystematic craft, especially for large-scale custom systems. Few government software-engineering projects are completed on time and within their initially allocated budget.7
These issues are discussed in CSTB, NRC, 2000, Making IT Better, National Academy Press, Washington, D.C. This point was underscored by the President’s Information Technology Advisory Committee in its 1999 report, which, in addition to noting the potential of information technology to transform government, highlighted software as the highest priority for research attention. See President’s Information Technology Advisory Committee (PITAC). 1999. Information Technology Research: Investing in Our Future. PITAC Report to the President. February 24. Available online at <http://www.itrd.gov/ac/report/>.
Software research includes not only development of software-engineering capability and the considerable range of associated activities, but also the design of dependable systems software components (such as operating system kernels) and the development of architectural frameworks to support the large-scale interconnection and interoperation of components (see “Middleware,” below).
The culture of the commercial software component marketplace, as evident from typical end-user license agreements, is that the maker of a software system or component offers no warranty concerning software function or quality. The persistent technical challenge of measuring quality promotes continuation of this caveat emptor situation. That is, even a considerable improvement in the technology of delivering and assuring higher levels of quality would not likely be adopted in mainstream software developments until the value of adoption could be measured and quantified. This applies also to large-scale custom engineering, in which challenges expand to include validation (conformance with actual organizational need) as well as verification (compliance with stated requirements, which may or may not be consistent with actual need).
Except for highly precedented systems, software-engineering processes in commercial industry tend to be iterative. Major vendors release periodic upgrades to components and systems, often several times per year. The vendors are thus able to respond to market demand. Large-scale government custom-development efforts often start with a “requirements elicitation” process, and the validity of the results of that process may not be understood until well into development—often, years later. For this reason, there has been considerable activity in the acquisition and regulatory community to support iterative models of development, in which successive prototypes are deployed in order to allow early feedback on particular facets of concern (“risk issues”) relating to requirements, design, or underlying infrastructure. (More discussion on this subject is presented in the section “Dimensions of Risk” in Chapter 4.)
A principal concern in the development of government IT systems is the embedding of commercial off-the-shelf (COTS) components. This includes adoption of commercial components such as mainstream commercial operating systems and office “productivity” tools, as well as of open-source components such as the Linux operating system, the Apache Web server (which is the dominant server in use today), and the Mozilla Web browser. Many COTS components can be opaque and difficult to analyze, thus thwarting acceptance tests related to reliability and security. They can also be subject to rapid evolution—that is, from the standpoint of the integration effort, they are on fundamentally uncontrollable trajectories. On the other hand, failure to adopt commercial components may not only raise development costs and risks unacceptably but can also incur increased training costs (necessitated by unfamiliar and system-specific human interfaces).
With respect to the new generation of e-government systems, a significant challenge is in the rapid prototyping of new capability so that functionalities and associated engineering issues—such as security, interoperation, and performance—may be explored. Major new software libraries, both from vendors and open-source communities, are enabling this kind of capability.
A multitude of problems—including delays, unexpected failures, and inflexibility in coping with changing needs—are associated with large-scale systems, and government offers notable examples of such systems and their problems. Such problems have occurred, for instance, at the Internal Revenue Service and the Federal Aviation Administration and in numerous systems at the state and local levels. This situation reflects not only the size and complexity of some of those systems but the chronic shortages of IT expertise in government. The problem is growing with expanded use of the Internet, which has fostered proliferating interconnected systems. More robust systems would also help reduce the costs of IT staffs currently needed to support IT systems. And the importance of the problem is growing as people come to depend more on such systems.
Research should address deep interactions among system components and intersystem dependencies, unintended and unanticipated consequences of system alteration, emergent behaviors in systems with large numbers of components and users, unstable behaviors, properties of federated systems, and other phenomena. In addition to these systems-engineering issues, research must address operational engineering issues such as how errors are corrected, how security breaches are detected and remedied, and how backups or other robustness measures are executed. A research program including case studies of particular systems and methodology research on architecture, techniques, and tools could help address the difficult technical (and nontechnical) challenges posed in realizing these systems.8
“Middleware” is software that provides common services and capabilities that “glue together” software components into larger systems. Examples of middleware include authentication, journaling, auditing, database-modeling services, ontological services, indexing, visualization, translation, search and discovery, access control, and electronic commerce services.9 In effect, middleware is software that reduces enterprise application development time. Compared to infrastructure elements such as basic networking or relational database capabilities, which are more mature, middleware has continued to evolve at a rapid pace. Its importance notwithstanding, middleware has, historically, not been an area where computer science research has focused much attention. How can the research community contribute in this area—especially given that many of the tough issues are ones of scale, heterogeneity, and integration that are difficult to address in a laboratory setting?
One role is through research that examines the properties of commercial middleware. Middleware often comes in the form of uniform “frameworks,” which embody principles for component interaction. These principles are intended to have useful scaling properties with respect to numbers of components, distribution over networks, robustness, kinds of capability that can be supported, and so on. Frameworks in widespread use include the Common Object Model family (COM, developed by Microsoft) and Enterprise Java Beans (EJB, developed by Sun and IBM). Designs for framework systems (such as those just cited) can be analyzed by researchers from a theoretical standpoint. Analyses have revealed subtle flaws and design issues in each of the frameworks mentioned above, providing useful information to program managers and system developers seeking to make choices among frameworks.10
Herbert Schorr and Salvatore J. Stolfo. 1997. Towards the Digital Government of the 21st Century (a report from the Workshop on Research and Development Opportunities in Federal Information Services), June 24. Available online at <http://www.isi.edu/nsf/prop.html>.
See, for example, Kevin J. Sullivan, 1997, “Compositionality Problems in Microsoft COM,” available online at <http://www.cs.virginia.edu/~sullivan/standards.html>; Robert J. Allen, David Garlan, and James Ivers, 1998, “Formal Modeling and Analysis of the HLA Component Integration Standard” [abstract], Proceedings of the Sixth International Symposium on the Foundations of Software Engineering, November, available online at <http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/able/www/paper_abstracts/hla-fse98.html>; and J. Sousa and D. Garlan, 1999, “Formal Modeling of the Enterprise JavaBeans Component Integration Framework,” Proceedings FM’99, pp. 1281-1300, available online at <http://link.springer.de/link/service/series/0558/papers/1709/17091281.pdf>.
Another role for researchers is in contributing to the development of middleware aimed at specific government niche applications. The Department of Defense’s High Level Architecture and Common Operating Environment are examples of efforts to meet perceived specialized government needs. Authentication services—both internally and for interactions with citizens—are another area where government leads in demand and has a role to play in middleware development.
Middleware is also significant for e-government research because it can provide an effective platform for rapid iteration in developing new system concepts, both functional and architectural. The library capabilities available, for example, in Sun’s Java Development Kit, in Microsoft’s .NET, or in the tools bundled with most major relational database systems (to cite just three examples) permit skilled developers to achieve more quickly a level of functionality that allows new ideas to be tried out.
Organizational and Social Issues
As a complement to research aimed at new technologies, research on the relationship between organizational behavior and IT can also play an important role in realizing e-government capabilities. The e-government vision outlined in Chapter 1 aims to enhance rather than simply automate government’s operations and its interactions with constituencies. Indeed, success in implementing new practices often requires simultaneous evolution of organizations and their supporting systems—it is not a matter of systems design alone. In addition, because numerous government activities bring together individuals to share information and collaboratively solve problems, a better understanding of the social and organizational aspects of IT use is critical. That is, research needs to be done to understand in a precise way the interplay of system design decisions, changes to business practices, changes in the operating environment, characteristics of the user population, and organizational outcomes. These issues become very significant when joint efforts are undertaken involving multiple organizations in order to deliver an aggregated service for a particular customer segment.
Such a broad perspective is in keeping with increasing appreciation of what CSTB’s 2000 report Making IT Better termed “social applications.” That report observed that emerging demand for “more and better use of IT in ways that affect [people’s] lives more intimately and directly than the early systems did in scientific and back-office business applications” presents “issues with which the traditional IT research community has little experience.” Successful work on the social applications of IT will require new computer science and engineering as well as research that is coupled more extensively and effectively to other perspectives—perspectives from other intellectual disciplines and from the people who use the end results, that is, the goods, services, and systems that are deployed.11 The incorporation of a “social, economic, and workforce” component into the NSF’s 1999 Information Technology Research initiative similarly reflects this new emphasis. CSTB’s 1997 report Fostering Research on the Economic and Social Impacts of Information Technology12 explores the rich research literature on social, organizational, and economic dimensions of IT use and highlights a number of important research areas.
Some specific organizational and socioeconomic research topics identified by this committee as having particular importance for e-government—research that naturally complements technology capability development—include the following:
Understanding the social and economic implications of e-government. Where and how is IT applied in government? How can we assess the extent to which it enables improved and new government services, operations, and interactions with citizens? The goal of this research would be to understand general principles rather than to evaluate specific government agencies and operations.
Understanding how to use e-government strategically as part of overall government service delivery. Research could help shed light on how specific technology capabilities relate to a broader strategy for how people interact with government.