The second workshop of the Chemical Sciences Roundtable (CSR), "Impact of Advances in Computing and Communications Technologies on Chemical Science and Technology," was held in Washington, D.C., on November 1-2, 1998. The presentations and discussion at the workshop considered benefits and opportunities for chemical science and technology stemming from ongoing dramatic advances in computing and communications, as well as challenges to be met in using these technologies effectively for research and in applications addressing pressing national problems. This volume presents the results of that workshop.
Whither Computing and Communications Technologies in Chemical Science and Technology?
Paul Messina of the California Institute of Technology emphasized the importance of ensuring the continuing evolution of computing power, unprecedented levels of which are required to meet such daunting needs as ensuring the safety and reliability of the nation's nuclear arsenal. In particular, he described the Department of Energy's Accelerated Strategic Computing Initiative (ASCI), whose goal is to simulate the effects of aging on the U.S. nuclear stockpile and to assess new weapon designs without further underground nuclear testing. To achieve a computational simulation-based approach to testing will require computer systems capable of trillions to quadrillions of arithmetic operations per second. ASCI is working closely with the computing industry to ensure that this high-end computing capability is fielded in the next 5 to 10 years.
In addition to investing in accelerated development of a new generation of massively parallel computer systems, ASCI is making major investments in computer systems software and scientific simulation software. Rapid progress in both computing and simulation capability is required to make these systems usable for addressing the targeted problems. This massive undertaking—which involves the academic community and the U.S. computer industry in addition to applications scientists and engineers at Los Alamos, Lawrence Livermore, and Sandia national laboratories—is attempting to build
a balanced system in which computing speed and memory, archival storage capacity, and network speed and throughput are combined to dramatically increase the performance of simulations. The approach of using commercially available components to the extent possible will facilitate transfer of the new technologies for use in a number of scientific and engineering pursuits without duplicating ASCI's costs.
Messina asserted that the advances in computing power that will become available in the next 5 to 10 years will be so great that they will change the very manner in which we pursue advances in science and technology.
Peter R. Taylor, San Diego Supercomputer Center and University of California, San Diego, spoke on the state of the art in computational chemistry and the extent to which it meets the requirements of the chemical science community. Computational chemistry—whose major activities can be classified as molecular electronic structure (often referred to as quantum chemistry), reaction and molecular dynamics, and statistical mechanics—is one of the great scientific success stories of the past three decades, as evidenced by the award of the 1998 Nobel Prize in chemistry to John Pople and Walter Kohn. Taylor described computational chemistry as a mature and very successful field that nevertheless requires continuing effort to improve theories, methods, algorithms, and implementation. He also pointed to a need for training students in these areas.
More powerful computers will allow current methods to be extended to larger molecules, but new methodologies will be needed to address many of the problems of interest to chemists. Taylor stated that the chemical sciences community needs to encourage the implementation of existing methods on new hardware, as well as the development and implementation of new methods. As new methods are developed, possible advantages offered by new computer architectures can be considered; e.g., approaches previously precluded because of requirements for enormous memory might be perfectly feasible on ASCI-class machines. Use of modern software engineering practices and modern computer languages in implementations can increase ease of maintenance. New methods and implementations can also take advantage of modern storage, retrieval, and data management technologies as well as interactive environments in which users can steer simulations and visualize their data.
Susan L. Graham, University of California, Berkeley, started by noting that high-performance computing is difficult. She elaborated on the technical issues that must be addressed if we are to take advantage of the exciting opportunities offered by the ongoing revolutionary increases in computing power. She indicated that one way to get more out of computing is by using parallelism—it reduces the elapsed time required for the most demanding computations, keeps the calculation moving along when delays arise in sequential computation, and overcomes fundamental limitations bounding the speed of sequential computation, such as the speed of light. However, advances from parallelism won't come for free. Issues that must be addressed in improving end-to-end performance of a calculation include identifying the work that can be done in parallel, correctly partitioning that work across the processors, and arranging the data so that it resides close to where it is needed (because of communication delays). Even then, at a lower level in the system, the system software (or a programmer) has to describe the details of how the work is actually done. Graham also mentioned issues in addition to performance that are going to become increasingly problematic, such as security and fault tolerance.
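Graham's observation that parallel speedups do not come for free can be illustrated with Amdahl's law, the standard textbook bound on parallel speedup. The sketch below is an illustration added here, not code from the talk; it shows how quickly the remaining sequential fraction of a computation caps the benefit of adding processors:

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: overall speedup when only part of a
    computation can be spread across processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# Even a computation that is 95% parallelizable gains less than
# 20x on 1,024 processors, because the 5% sequential remainder
# comes to dominate the elapsed time.
print(amdahl_speedup(0.95, 1024))  # ~19.6
print(amdahl_speedup(0.95, 8))     # ~5.9
```

This is why identifying and partitioning the parallel work, as Graham emphasized, matters so much for end-to-end performance: shrinking the serial fraction helps far more than adding hardware.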
Among the nontechnical issues mentioned were concerns about having enough people with the deep knowledge of both chemistry and information technology required for developing workable problem-solving strategies. In addition, Graham pointed out that the scientific community will have to become
involved in developing software for which vendors do not see a large market and in dealing with issues of access to very high performance systems.
Graham closed by mentioning how implementation of the recommendations from the President's Information Technology Advisory Committee (PITAC), on which she serves, could help address such important issues as the need for investment in long-term information technology R&D.
Computational Modeling and Simulation in Chemical Science and Technology
John A. Pople of Northwestern University discussed the importance of having reliable data on the thermochemistry of molecules—knowledge of which is vital in the chemical sciences and essential to many technologies. Because experimental measurements yielding thermochemical data are difficult and time-consuming, it is highly desirable to have computational methods that can make reliable predictions. Since the early 1970s when ab initio molecular orbital calculations became routine, one of the major goals of modern quantum chemistry has been the prediction of molecular thermochemical data to chemical accuracy (1 kcal/mol). The Gaussian-n series, with its latest version, Gaussian-3 (G3) theory, achieves that accuracy on average and is computationally feasible for molecules containing up to about eight non-hydrogen atoms; Pople asserted that it represents a great success of quantum chemistry.
Ideally, a method for computation of thermochemical data should be applicable to any molecular system in an unambiguous manner. The method needs to be computationally efficient so that it can be widely applied, should reproduce known experimental data to a prescribed accuracy, and should be similarly accurate when applied to species having larger uncertainty or for which data are not available. The Gaussian-n methods were developed with these objectives in mind. Despite the successes, Pople argued that much remains to be done. Among the challenges will be extension of the methods to larger molecules, increased accuracy in predictions, and extension to heavier elements. The increased computing power obtainable from new generations of computers, such as those with massively parallel architectures, will play an important role in meeting these challenges.
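The "chemical accuracy" benchmark Pople described is typically assessed as a mean absolute deviation from experiment over a test set of molecules. The following toy sketch illustrates that bookkeeping only; the heats of formation below are invented for illustration and are not actual G3 results:

```python
CHEMICAL_ACCURACY = 1.0  # kcal/mol, the target accuracy Pople cited

def mean_absolute_deviation(computed, experimental):
    """Average unsigned error between computed and measured
    values, in the same units (here kcal/mol)."""
    errors = [abs(c - e) for c, e in zip(computed, experimental)]
    return sum(errors) / len(errors)

# Hypothetical heats of formation (kcal/mol) for three molecules.
computed = [-57.1, 12.5, -94.3]
experimental = [-57.8, 12.0, -94.1]

mad = mean_absolute_deviation(computed, experimental)
print(f"MAD = {mad:.2f} kcal/mol")
print("within chemical accuracy:", mad <= CHEMICAL_ACCURACY)
```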
Jeffrey Skolnick of the Scripps Research Institute discussed the role of computational molecular biology in the genomics revolution. Various genome-sequencing projects are providing a plethora of protein sequence information, but with no information about protein structure or function. Making the results of the genome revolution applicable to understanding biological processes requires knowledge of protein structure and function as encoded in the genome. One means of sifting useful proteins out of the genomic databases is the computer prediction of protein function. To extend the level of molecular function annotation to a broader class of protein sequences, a novel method for identification of protein function based directly on the sequence-to-structure-to-function paradigm has been developed. The idea is to predict the native structure first and then to identify the molecular or biochemical function by matching the active site in the predicted structure to that in a protein of known function. Skolnick believes that the next 5 to 10 years are likely to see the development of improved computational tools for genomic screening.
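The final step of the sequence-to-structure-to-function paradigm can be caricatured in a few lines. The sketch below is purely illustrative (the residue names, coordinates, and tolerance are invented, and real function-annotation methods use far more sophisticated structural comparison): it matches a known active-site "template" against a candidate site in a predicted structure by residue identity and rough geometry.

```python
from math import dist

def pairwise_distances(site):
    """Sorted pairwise distances between the residues of a site.
    A site is a list of (residue_name, x, y, z) tuples in angstroms."""
    coords = [r[1:] for r in site]
    return sorted(dist(a, b) for i, a in enumerate(coords)
                  for b in coords[i + 1:])

def site_matches(template, candidate, tol=1.0):
    """True if the residue identities agree and every sorted
    pairwise distance agrees within tol angstroms."""
    if sorted(r[0] for r in template) != sorted(r[0] for r in candidate):
        return False
    return all(abs(a - b) <= tol
               for a, b in zip(pairwise_distances(template),
                               pairwise_distances(candidate)))

# Serine-protease-like catalytic triad with invented coordinates.
template  = [("SER", 0.0, 0.0, 0.0), ("HIS", 3.0, 0.0, 0.0), ("ASP", 3.0, 4.0, 0.0)]
candidate = [("HIS", 1.0, 1.0, 0.0), ("SER", 4.1, 1.0, 0.0), ("ASP", 4.0, 5.0, 0.0)]
print(site_matches(template, candidate))  # True: same residues, similar geometry
```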
W. David Smith, Jr. of E.I. DuPont described needs and new directions in computing for the chemical process industries. Changing needs for process and enterprise modeling, together with the capabilities of computers and software, have reached a critical point at which vendors, academia, and industry must cooperate to develop the next generation of tools for the process engineer. The potential for the European CAPE OPEN project to bring the technologies and the players together to provide this set of
tools was discussed. Smith also explored how this technology may change the ways in which companies like DuPont perform engineering and process development in the future.
It is not enough to develop computational methods for modeling chemical processes; one must also facilitate the use of these techniques by scientists. This was the underlying theme of the presentation by Gregory J. McRae of the Massachusetts Institute of Technology, who provided an overview of a problem-solving environment called the Chemical Engineering Workbench. The Workbench is currently being developed by a research team associated with the National Computational Science Alliance. When completed, it will provide an integrated software environment supporting a broad range of computational tools for modeling chemical and engineering processes, extending from the molecular level to that of full chemical plants. Quantum chemistry and other tools at the molecular level are being coupled with higher-level chemical process modeling, chemical process reaction modeling, plant process design, and process control. The team's initial effort is to develop an advanced reactor design model that can be incorporated into the Workbench. Reactors are the focal points of chemical plants, and mathematically modeling these complex systems places great demands on high-performance computing. Design considerations from the plant level will feed down to the quantum level. Treating chemistry as a design parameter may allow development of innovative reaction systems that minimize environmental problems.
Thomas F. Edgar, University of Texas, David A. Dixon, Pacific Northwest National Laboratory, and Gintaris V. Reklaitis, Purdue University, gave a multi-author perspective on the computational needs of the chemical industry. The current forces driving the U.S. chemical industry, such as globalization, requirements for minimization of environmental impact, and the need for improved return on investment, require the expanded use and application of new computational technologies. Forecasted future improvements in process modeling, control, instrumentation, and operations are a major component in the recently completed report Technology Vision 2020: Report of the U.S. Chemical Industry, which presents a road map for the next 25 years for the chemical and allied industries. Detailed R&D road maps on specific areas of chemical technology summarized in this presentation were prepared in 1997 and 1998. The areas covered were instrumentation, control, operations, and computational chemistry.
Remote Collaboration and Instruments Online
Raymond A. Bair, Pacific Northwest National Laboratory, emphasized that new technologies for computing and communications offer opportunities to revolutionize not only the scope but also the process of scientific investigation. Bair predicted that a significant contribution of collaboratories will be support for creating and sustaining scientific communities that can interact and share information rapidly to address challenging research problems. Moreover, as the chemical applications and capabilities provided by collaboratories become more familiar, researchers will move significantly beyond current practice to exciting new paradigms for scientific work.
Bair detailed some of the requirements for future success, including development of interdisciplinary partnerships of chemists and computer scientists; flexible and extensible frameworks for collaboratories; means to deploy, support, and evaluate collaboratories in the field; input from the nation's research community to development of the next generation of Internet standards; and higher-performance networks, more scalable and capable network standards, and better network management capabilities.
In concluding, he pointed out the opportunity for competitive advantage provided by collaboratories that give access to expertise, data, experiments, or computations that would not otherwise be available to explore a research question or solve a problem. He also noted the positive impact that collaboratories can have on the complexity and scale of chemical problems considered, as well as the crucial part collaboratories can play in projects that depend on having access to large user facilities and/or multidisciplinary research teams. By removing many barriers of time and distance, collaboratories can enhance the exchange of information in the sciences and can also contribute to the management of costs for travel and equipment use.
Bridget Carragher and Clinton S. Potter, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, discussed their experience with the development of remote and automated access to imaging instrumentation in the World Wide Laboratory (WWL) project. They proposed several ways of using remote-access technology in practice, including for service, collaboration, education and training, remote research, and automated and intelligent control of functions usually performed manually by a local operator. Among the advantages they described for remote-access technology—which is one component of a collaboratory—were opportunities for consultation with experts located anywhere, access to a network of distributed expertise, and unprecedented opportunities for education, training, and access for users at institutions lacking the means to support expensive and unique instruments.
Specific examples they reported on involved remote work with a transmission electron microscope, nuclear magnetic resonance imaging spectrometers, and a video light microscope—all of which are accessible in the WWL through Web-browser-based user interfaces. One K-12 education project, Chickscope, showed that very complex remote-access technology could be used effectively by students at all grade levels and also demonstrated all of the components defined for a working collaboratory.
They concluded by noting that wider acceptance of collaboratories in the general scientific community would require demonstration of their impact in the scientific research environment as well as a systematic evaluation of their contribution to productivity.
Thomas A. Finholt, University of Michigan, started by noting that despite the tremendous growth in the knowledge and practical application of chemical principles, the practice of chemistry research and teaching has remained relatively unchanged. The use of the Internet as a worldwide mechanism for scientific communication challenges this status quo. Innovations such as collaboratories that remove constraints of distance and time on scientific collaboration and increase access to scarce instruments will accelerate the flow of information and place new demands on senior scientists as mentors. Finholt pointed out the need to anticipate and influence the development of emerging Internet technologies that will affect how research is done. He discussed the challenges posed by new ways of conducting research in chemistry in terms of our transition from the past, through the present, and into an uncertain future.
In reviewing "where we come from," Finholt discussed three particularly important innovations: the creation and elaboration of the research laboratory, the use of laboratory classes in chemical education, and the use of lecture demonstrations to illuminate and clarify chemical principles. Briefly describing "what we are," he considered opportunities that are now available to chemists through the expansion of
computer and Internet technologies and that can be thought of in terms of the raw performance of computer processors, the capacity of communication networks, the scope of networks, and the evolution of software. Finally, in exploring "where we are going," he used examples from the present to project possible new modes of teaching and doing research, including the collaboratory, that would enable researchers to perform their work without regard to geographical location—interacting with colleagues, using instruments remotely, sharing data and computational resources, and accessing information in digital libraries. He concluded by cautioning that while Web-based tools may alter the landscape of practice and pedagogy, simply "surfing" for information will not replace learning. To master and understand key concepts, students and researchers must absorb and reflect on ideas and avoid the temptation to browse endlessly among an ever-widening array of online resources.
David R. McLaughlin, Eastman Kodak Company, described how Kodak has used computers and information technology to enhance operations in its research laboratories. The effort has focused on creating an electronic or computerized laboratory and delivering information to the scientist's desktop. To illustrate the impact of Kodak's "wired laboratory," he described four ways that advances in computing technology have helped to increase the efficiency of the analytical chemistry laboratory: the first through automation and simplification of some of the tasks associated with analysis and synthesis, the second in management of information and knowledge, the third in the generation and maintenance of data in electronic (digital) form, and the fourth through data analysis and chemometrics. Examples given of components of the wired laboratory included QUANTUM, an integrated spectroscopy information system; a walk-up spectroscopy laboratory with instruments online; increased capabilities for electronic access to information and analytical data; and WIMS, a Web-based information management system. He also provided information on an electronic laboratory notebook that has been developed to assist with the management of experiments, projects, and programs.
McLaughlin characterized the wired laboratory of the future as one in which all scientists would use an intelligent electronic laboratory notebook linked to all of the data-generating equipment, and in which evolving analytical technology in combination with data analysis techniques would reduce the time required for sample preparation and data interpretation. All of these capabilities would be provided through a common Web interface. He indicated that a challenge to the analytical community is to devise real-time measurements that, when displayed in virtual reality systems, will enable researchers to "see" results and thus better understand them. He also noted that high-quality, reliable software available at reasonable cost is one of the most critical needs for the future.
Chemical Information Online
Gary Mallard, National Institute of Standards and Technology, discussed a variety of issues in the management of chemical data. He suggested that the large growth in publication of information on the Internet is being driven by a reduction in traditional data resources, demand for faster access to data, and increased needs for data for modeling and simulation. The last factor has a particularly strong influence because it has also changed the nature of the data needed. He stressed the importance of quality assurance in data generation and management, including basic data storage and archiving. So that these data will be useful and reliable, critical evaluation must aim to detect errors and inconsistencies in the information that can originate from incomplete data sets, uncertainty in the data, and errors introduced during the compilation process. Mallard stressed the importance of preventing and correcting errors as the scientific community's demands for data continue to increase.
Richard E. Lucier, University of California, described work directed at developing a "digital library" to serve the entire university system in California. He argued that the conventional library—with its archives of traditional research journals—is evolving toward new forms of scholarly communication. In his view traditional libraries, with the services we have come to expect from them, will not continue to be sustainable. Given the information explosion, they simply cost too much. Costs will be controlled by forming large consortia of libraries to leverage buying power. He proposed that comprehensive access to information will replace comprehensive ownership of information, and that world-class libraries will consist of complementary paper and digital holdings. In addressing possible new approaches to scholarly communication, he pointed out the conflicting goals of the current system, which uses publications both as a means to disseminate knowledge and as a mechanism for evaluating the performance of research scholars. New approaches to electronic publication may allow these to be decoupled. Pursuing new approaches will require examination of current copyright policies and practices such as the assignment of rights to publishers.
Lorrin R. Garson, American Chemical Society, presented an analysis of issues related to the emergence of electronic publishing, using the perspective of a scientific society that is already a major publisher in the print medium. He characterized scientific publishing as a field with high costs, diminishing resources for those purchasing publications, competition among publishers, and increasing pressure to publish more material. He noted that both commercial and not-for-profit publishers must strive to operate on a "not-for-loss" basis. He presented arguments that "first-copy" costs account for approximately 80 percent of all publishing costs, regardless of whether paper or electronic distribution is the final result. Consequently, the financial challenges are unlikely to disappear with a move toward electronic publishing.
Garson identified a range of important problems and challenges associated with electronic publishing, including the need for improvements in technology and funding of that investment, assumption of responsibility and costs for archiving of electronic information, terms for and constraints on use of electronic information, and costs of individual subscriptions. He was optimistic about progress in overcoming the technical barriers but indicated that the financial and sociological obstacles are formidable.