

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 29
4 Interfaces: Cooperation and Collaboration Across Disciplines

The research and technology enterprise of chemistry and chemical engineering involves a dazzling multitude of intellectual opportunities and technological applications. The community includes academia, industry, and government and independent laboratories, and it embraces the disciplines of chemistry, chemical engineering, and many others that interact in complex ways. Working at the interfaces between disciplinary domains offers reinforcing synergies and beneficial blurring of boundaries, but may also be accompanied by barriers that hinder the effectiveness of information flow. Information technology (IT) assets are rapidly accepted in the chemical sciences, and they are used today mainly to increase the speed of what we already know how to do. The strategic use of computing resources requires deeper integration of chemical engineering and chemistry with IT, a process that is in its early stages.

Finding: Computation and information technology provide a key enabling force for lowering barriers among the disciplines that comprise the chemical enterprise and closely related fields.

Identification of mutual interests among disciplines and removal of the barriers to successful communication among constituencies are essential for increasing the overall effectiveness of the system. The processes of identification and removal are still in their infancy.

Cooperation and collaboration across disciplines expand the intellectual freedom that people enjoy: the freedom to access the tools and capabilities to do one's work. The interface between information technology and the activities of chemists, chemical engineers, and allied disciplines is still in its infancy. The opportunity to build that architecture, which will revolutionize the future of the

chemical enterprise, is exciting. It is essential that advanced information systems draw on and reinforce the intuitions and creative intellectual patterns that have been so successful in research, discovery, and technology innovation in the chemical enterprise. Moreover, it is critically important to ensure the security of transactions that involve sharing of tools, data, and resources in a manner that provides responsibility for privacy, intellectual property, economic growth, and the public trust.

Intuition based on physical understanding of the system is an essential component for integrating computation and information technologies with the chemical sciences. The essential resource driving the interface of IT and the chemical sciences is human ingenuity, and the supply of this resource is unbounded. Indeed, throughout this report, the importance of "people issues" emerges. Cooperation and collaboration depend on the attitudes of the individuals who participate. An important first step toward cooperation and collaboration is training that nurtures a creative, problem-solving approach. The cross-disciplinary interface between IT and the chemical sciences should be capable of rejecting preconceptions about the way things "ought to be," while also providing stable tools that support advances in research and technology. People, not software or data repositories, will recognize the revolutionary breakthrough and how to apply it to the right problem. People will recognize how a suite of interlocking technologies can be combined with revolutionary impact.

Finally, it is important to have realistic expectations for success. This is a difficult challenge because what one measures and rewards is what one gets. There are many ways to impact applications and technology with any level of sophistication in a simulation. Some of the important ways lend themselves only to intangible measures, but oftentimes these may be the most important. Again quoting Einstein, "Not everything that counts can be counted, and not everything that can be counted counts."
Ellen Stechel (Appendix D)

Research discoveries and innovative technological advances invariably accompany breakthroughs in computation and information technology. The advances to date in these areas make possible an even faster pace of discovery and innovation. Virtually every one of the discovery and application opportunities mentioned in the previous chapter depends on sustained advances in information technology.

The opportunities at the chemical science-IT interface are enormous. Advances in each of the IT areas discussed here will benefit virtually all of the

discovery and application areas described. Profound advances will result when developments in the IT area are brought to bear on discovery and applications areas. It is critically important to learn how to build teams that bring the component disciplinary skills to bear. It is equally important to maintain depth and disciplinary skills in the fundamentals. Clearly the impact gained by solving a key IT challenge in one particular area can be significantly multiplied if it is carried out in an informed manner that benefits related areas.

The process of bringing IT developments to bear on discovery areas can serve also to guide new research within the IT discipline. The close interaction of IT experts with science and engineering application experts requires a management approach that is in part maintaining the value systems that cross multiple disciplines, and in part "enlightened self-interest." The major challenge is to set clear investment priorities, and to maximize the benefits across many applications.

The computation-information infrastructure within chemical sciences is variegated and effective. New directions, based on both IT advances and the growth of interdisciplinary understanding, offer striking possibilities for research development, learning, and education. One specific new endeavor, the establishment of integrated, multicomponent Grid-based Collaborative Modeling-Data Environments (CMDEs), is a promising opportunity for federal investment.

OVERARCHING THEMES

Four key themes emerge repeatedly at the heart of discussions on integrating information technology with the chemical sciences. These themes provide a framework for relating research areas with the enabling IT technologies.
They are (1) targeted design and open-ended discovery: two routes to solving complex problems; (2) flow of information between people within and among disciplines: people issues and barriers to solving complex problems; (3) multiscale simulation: one of the key IT-centric technical approaches to solving complex problems; and (4) collaborative environments: integrating IT methodology and tools for doing the work. Appropriate educational initiatives linked with these themes are a major need.

Targeted Design and Open-Ended Discovery

Targeted Design. Targeted design involves working backward from the desired function or specifications of a molecule or product and determining the underlying structure and the process conditions by which it can be fabricated. The general features of the approach have been used successfully in cases for which the need was exceptional and the target was clear: for example, synthetic rubber, AIDS treatment, and replacements for chlorofluorocarbon refrigerants. Extraordinary recent advances in computer speed, computational chemistry, process simulation and control, molecular biology, materials, visualization, magnetic storage, bioinformatics, and IT infrastructure make possible a wholly new level of implementation. By building on past investments, targeted design could pay enormous dividends by increasing the efficiency of the discovery effort and speeding up the innovation process. The process can also provide fundamental researchers with intuitive insights that can lead to more discoveries. However, a number of questions will need to be addressed as this approach is used more widely:

• Structure-Function Relationships: Given the molecular structure, how can its properties be estimated? Given the properties, how can they be related to the desired structure?
• Combinatorial Search Algorithms: How do we evaluate many alternatives to find the ones that meet specifications?
• Target Identification: What are the criteria for specifying the target?
• Roles of Participants: How does the technical problem translate into what the participants must do to address the issue successfully?
• Collaborative Problem Solving: What tools are needed to connect individuals from different disciplines, and how can these be made to reflect the way people work together?
• Trust and Confidence: Appropriate security is essential if collaborations involve proprietary information or methodology. How can the key elements of trust and confidence be assured?

Open-ended Research and Discovery. Open-ended research and discovery has a long tradition and spectacular record of success in the chemical sciences. Curiosity-based discovery has led to research discoveries (such as oxygen, nuclear magnetic resonance, DNA structure, Teflon, and dimensionally stable anodes) that have truly changed the world.
The broad-based research investment of the past half century in curiosity-driven research is still paying great dividends in technological developments in such areas as nanoscience and nanotechnology, microelectromechanical systems (MEMS), environmental chemistry, catalysis, and the chemical-biological interface. Such an approach requires:

• access to information in a way that informs intuition;
• the flexibility for the individual investigator to modify the approach (an avenue of research in which open-source software can be of tremendous value); and
• data assimilation and visualization (where better and more efficient algorithms for mining data are needed, and where the chemical sciences must learn from the information sciences).

Targeted design builds on the foundation of previous curiosity-driven discoveries, but its progress also drives the need for new fundamental understanding. To view the technological developments that grow from open-ended research

as a series of coincidences is to miss the point of the past investment strategy of many federal funding agencies. Both are essential and both benefit from advances in information technology. To take full advantage of the discoveries in such areas as chemical biology, nanoscience, environmental chemistry, and catalysis requires both approaches. It is important to distinguish between them:

• Targeted design does not work if we do not know the target; it is not open ended.
• Curiosity-driven research rarely hits a technology target on its own without guidance; the rate of innovation can be very slow.

The common ground between these approaches can be expanded and enriched by IT advances.

The targeted design approach is beginning to be used in the IT community. Examples are available that are built on several decades of real success in computer science and in pure and applied mathematics:

• Very advanced special-purpose computers, such as the Earth Simulator in Japan; its raw speed is high, as is effectiveness of delivered performance. The machine takes advantage of memory-usage patterns in science and engineering applications by way of fast data paths between the central processing unit and main memory (cf. T. Dunning, Appendix D).
• Algorithmic advances: Today's supercomputers use commercially oriented commodity microprocessors and memory that are inexpensive but have slow communications between microprocessor and memory. They try to minimize slow access to main memory by placing a fast-cache memory between processor and main memory. This works well if algorithms can make effective use of cache, but many chemical science and engineering algorithms do not. Cache-friendly algorithms are needed to take full advantage of the new generation of supercomputers.
• Algorithm development based on understanding of the physical system as well as the computer architecture.
• Development of "model" sciences to obtain consistent results without unnecessary computation: This approach is illustrated by the Gaussian-2 (G2) model chemistry developed by Pople's group.¹

The use of targeted design to bring IT resources to bear on problems in chemical science and technology is likely to become more widespread. That approach would provide progress for virtually every point identified earlier in this chapter.

Many of the applications require capacity computing linked with robust, reliable, and accessible problem-solving tools that gain leverage from rapid advances in network bandwidth, storage, and algorithm efficiency. The need for increased capability high-performance computing will also continue to be critically important for the long term, as larger, more complicated, and more accurate simulations become needed. In most cases, increases in performance are a result of adroit algorithms based on physical understanding of the application in addition to raw processor speed.

¹ Curtiss, L. A.; Raghavachari, K.; Pople, J. A. J. Chem. Phys. 1995, 103, 4192.

Appropriate Computational Levels

I have a comment about your discussion of the convergence with respect to basis set in the computation or calculation of molecular energies. Probably the most notable feature of that is it became very expensive as you moved to the larger and larger basis sets but the last order of magnitude in computing that you in fact spent gave you almost no information, a result differing very little from the previous numbers, and in fact it seems to me that one is in some danger of exaggerating the need for very large computation to achieve chemical accuracy. Many of the results that we have been able to find, we have found can be achieved by the judicious use of small empirical corrections or simple extrapolation schemes. I think that was already evident from your presentation. So I feel there is a slight danger in saying that you can only achieve this high a level of accuracy by using massive resources.
John A. Pople, Northwestern University (comments following presentation by Thom Dunning, Appendix D)

Flow of Information Between People Within and Among Disciplines

Advances in communication and information technology have helped make it possible to address complex systems with unprecedented success. Such systems exhibit nonlinear relationships, large experimental domains, and multiple interactions between many components.
Moreover, for most complex systems, the underlying model is generally unknown although some components may be well-characterized. It is therefore necessary to employ various kinds of knowledge, theory, insight, experimental data, numerical codes, and other tools. In general, the flow of information passes among communities that may be organized by skill set (experimentalist, theorist, modeler, programmer, etc.); institution (academia, industry); discipline (chemistry, chemical engineering, computer science); or other such broad groupings.

FIGURE 4-1 Flow of information between science and technology in multiscale, multiphenomena investigations. [Figure not reproduced. Box labels recoverable from the scan: Noncontinuum Phenomena; Noncontinuum Model (Monte Carlo, Molecular Dynamics); Physical System; Hypotheses and Experimental Data; Additional Experiments That Test and Exploit Hypotheses; Multiscale Multiphenomena Simulations; Design, Control, and Optimization; Devices, Products, and Processes; Continuum; Continuum Model (Finite Difference, Finite Element).]

Figure 4-1 describes some aspects of information flow in the analysis of complicated problems. The boxes in the figure represent specialized methods and tools that contribute to the investigation. These include, for example, experimental techniques, computer codes, numerical methods, mathematical models of well-characterized fundamental scientific systems, and numerical simulations of complex components of technological systems. Several general observations can be made:

• The tools used in one box often are not easily used by specialists working in another box.
• Robust, reliable methods of remote access are now beginning to emerge: shared equipment (e.g., NMR), data (e.g., bioinformatics), and codes (e.g., open source). Such beginnings strengthen the flow of information between some of the boxes.

• Between other boxes, however, barriers still hinder the effectiveness of the information flow as noted in the next paragraphs.

For the purpose of discussion, consider a specific starting point: Ensuring product quality at the molecular level requires a new generation of science and engineering tools. This statement applies to many of the areas discussed in this report, from nanoscience to biotechnology to sensors and microelectronic devices. The following material considers how this statement translates into needs and barriers between two groups: the technical experts (who create the methods and tools inside the boxes) and the users (who use the methods and tools to solve problems).

First consider the roles of different kinds of technical experts. It is abundantly clear that the following characterizations are incomplete and overly simplified, but they nevertheless serve to illustrate some of the salient needs of, and barriers between, different communities of technical experts:

• Experimentalists want to produce reliable data. To interpret and validate the data with sophisticated computational codes, they need modeling expertise. To test the data and to make it accessible to others, they need computer science expertise in database structures.
• Modelers want to create simulation codes that compile and run efficiently. From experimentalists, they need up-to-date mechanisms and data. From computer scientists and engineers, they need to understand the language, advancing programming methods, protocols, data management schemes, and computing architectures, to name a few.
• Computer scientists and engineers want to push the envelope of IT capabilities, and create structures for efficient workflow and resource utilization. They need metadata from experimentalists in order to characterize the data and facilitate use of the data by others.
They need an understanding of the physical basis of modeling goals, which are invariably the key to improving efficiencies.

Figure 4-2 provides a schematic representation in which the three kinds of technical expertise are at the corners of a triangle, and the needs that they have from each other are summarized along the sides. Barriers are shown that block information flow between the experts.

Once the barriers are removed, each group will want to change the way it does things in order to take full and strategic advantage of the others. In keeping with a focus on the statement that ensuring product quality at the molecular level requires a new generation of science and engineering tools, the following examples serve as illustrations:

• Experimentalists will produce data at small scales on well-characterized systems to validate numerical simulations, and also provide measurements at

multiple scales on entire complex systems to characterize sources of uncertainty. They will document error bars on datasets including images so that others can use them with confidence. They will be able to suggest simulations to test or resolve multiple reasonable hypotheses of mechanisms and use up-to-date data from others in those simulations.
• Modelers will produce portable object-oriented codes so others (including nonexperts) can modify the original hypotheses and assumptions. They will combine theories at different scales with metadata schema to link continuum and noncontinuum codes so that multiscale simulations can be done routinely. They will be able to draw on different experimental methods and data for improved simulation including parameter sensitivity and estimation, hypothesis selection, lifetime probability, and risk assessment. They will tune code performance based on understanding of computer architecture and algorithm strategies.
• Computer scientists and engineers will develop robust, reliable, user-friendly Collaborative Modeling-Data Environments for linking codes, computers, research data, field-sensors, and other inputs for use by scientists and engineers. They will provide semiautomatic generation of graphical user interfaces so that older codes will continue to be useful for design and synthesis as new computing resources become available. In addition, improved security, authentication, authorization, and other measures will be developed that protect privacy and intellectual property associated with commercial applications. The use of web services will be crucial for the utility of this work.

FIGURE 4-2 Needs and barriers for interactions among technical experts. [Figure not reproduced. A triangle with Experimentalists, Modelers, and Computer scientists and engineers at the corners, with barriers marked between them.]
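The request that experimentalists document error bars so that others can reuse data with confidence can be made concrete with a small sketch. The Python below attaches a standard error, units, and free-form metadata to each value and combines independent measurements by inverse-variance weighting. The `Measurement` class, its field names, and the `weighted_mean` helper are hypothetical names introduced here for illustration; they are not taken from the report or from any existing library, and the weighting assumes independent, normally distributed errors.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measurement:
    """A value with its error bar and provenance (hypothetical record layout)."""
    value: float
    std_error: float          # one-sigma uncertainty, same units as value
    units: str
    metadata: dict = field(default_factory=dict)  # e.g. instrument, date, method


def weighted_mean(measurements):
    """Combine independent measurements by inverse-variance weighting."""
    units = {m.units for m in measurements}
    if len(units) != 1:
        raise ValueError("measurements must share the same units")
    weights = [1.0 / m.std_error ** 2 for m in measurements]
    total = sum(weights)
    mean = sum(w * m.value for w, m in zip(weights, measurements)) / total
    return Measurement(
        value=mean,
        std_error=math.sqrt(1.0 / total),
        units=units.pop(),
        metadata={"method": "inverse-variance weighted mean",
                  "n": len(measurements)},
    )


if __name__ == "__main__":
    a = Measurement(10.0, 1.0, "kJ/mol", {"lab": "A"})
    b = Measurement(20.0, 2.0, "kJ/mol", {"lab": "B"})
    print(weighted_mean([a, b]))
```

Carrying the uncertainty and units with the number, rather than in a separate note, is one way a modeler could check compatibility before mixing data from different sources.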

In order to reduce barriers, several steps must be taken:

• Promote and ensure decentralized autonomy of creative technical experts in the chemical sciences, while facilitating a multidisciplinary team approach to targeted design.
• Continue the integration of IT with the chemical sciences, which have been quick to accept IT and use it throughout the discipline.
• Increase multidisciplinary training and provide financial mechanisms to support IT.
• Develop modular, portable, and extensible programming methods, including object-oriented component architectures for portable modeling (i.e., software that can be reused in multiple applications).
• Expand use of open-source software for community codes where it is appropriate for many individuals to participate in code development.
• Strengthen the scientific method of testing hypotheses by experiment by lowering barriers to flow of information among modelers, experimentalists, and computer scientists.
• Optimize the research infrastructure for multidisciplinary team approaches to problems.
• Develop a permanent, ongoing interaction between the chemical sciences and IT communities to guide investments in infrastructure for the chemical sciences.

Next consider the roles of users who use the methods and tools developed by technical experts to solve problems.

• Users may include other technical experts, industrial technicians and trainees, students, and learners.
• Most users will want to input their own variables, which may be confidential, and will want to return to their work leaving no fingerprints for others to view.
• Such users may have limited interest in technical details, but will need a clear understanding of assumptions and limitations on use as well as robust, reliable, user-friendly access.
• Users will require a feedback mechanism so they can direct the experts to areas where improvements are needed.
Finding: Addressing critical challenges at the interfaces with other scientific and engineering disciplines will enable chemistry and chemical engineering to contribute even more effectively to the nation's technological and economic progress.

The most important challenge involves people. Advances in IT that facilitate

self-organization of problem-solving groups with common interests across disciplinary boundaries will impact strongly both understanding-based and application-driven projects. The essential resource driving the interface of IT and the chemical sciences is human ingenuity.

Multiscale Simulation

After a century of success in understanding the fundamental building blocks and processes that underlie our material world, we now face another, even greater, challenge: assembling information in a way that makes it possible to predict the behavior of complex, real-world systems. We want to predict properties of physical, environmental, and biological systems over extended length and time scales; understand how component processes interact; fill gaps in experimental data with reliable computational predictions of molecular structure and thermodynamic properties; and develop reliable methods for engineering process design and quality control. Here is a scientific frontier for the twenty-first century. With virtually any of the examples emphasized throughout this report, multiscale simulation will play a critical role.

Figure 4-3 shows the broad range of time and length scales (some 15 orders of magnitude or more) that are encountered for one particular field of application involving electrochemical processing for chip manufacture. The figure is divided into three vertical bands in which different kinds of numerical simulation tools are used: noncontinuum, continuum, and manufacturing scale. Each of the topics indicated in the figure is placed in the approximate position to which it corresponds. For some of the topics, the space and/or time axis may be more or less approximate and appropriate. There is today a sophisticated understanding of each of these topics. Multiscale simulation involves linking the pieces in order to understand interactions within an entire system.
Other topics such as drug development and manufacture, environmental remediation, or coatings technology show similar multiscale aspects. A few examples serve to illustrate the very broad range of activities for which multiscale simulations are important:

• The internal combustion engine involves the engine itself, as well as the fluid dynamics of the fuel-air mixture as it flows through the combustion region and the chemical process of ignition and burning of the fuel, which can involve hundreds of chemical species and thousands of reactions, as well as particulate matter dynamics and the ultimate fate of combustion products through atmospheric chemistry.
• Creating a virtual simulation of a periodic living cell involves a substantial challenge. The organelles in such a cell include about 4 million ribosomes. There are about 5 billion proteins drawn from 5000-10,000 different species, a meter of DNA with several billion bases, 60 million transfer RNAs, and vast

numbers of chemical pathways that are tightly coupled through various feedback loops in which the proteins are turning the genes on and off in the regulation network. All of the relevant detail would have to be captured.
• The design and degradation of materials involves multiple scales from atoms to processes, as well as stress and environmental degradation from corrosion, wear, fatigue, crack propagation, and failure.

FIGURE 4-3 Schematic of the time and length scales encountered in multiscale simulations involving electrochemical processing for chip manufacture. [Figure not reproduced. Axis recoverable from the scan: length scale (m). Bands: Noncontinuum Models, Continuum Models, Manufacturing. Topics: Double Layer, Interconnects, Surface Films, Additives, Monolayers, Electron Transfer, Boundary Layers, Fab Facilities, Deposition Tools, Potential Field, Current Distribution, Wafer.]

Multiscale simulation involves the use of distinct methods appropriate for different length and time scales that are applied simultaneously to achieve a comprehensive description of a system. Figure 4-4 illustrates some of the computational methods that have been developed over many decades in order to deal with phenomena at different time and length scales to compute properties and model phenomena. These include quantum mechanics for accurate calculation of small

and fast phenomena, statistical mechanics and semiempirical modeling for mechanistic understanding, the mesoscopic scale where new methods are now beginning to emerge for multiscale modeling, continuum mechanics for macroscopic reaction and transport modeling, process simulation for characterizing and optimizing entire process units and their interconnection, and supply chain modeling that integrates over multiple plants, customers, and global logistics. In many cases, modeling and simulation techniques that were originally developed in other communities, such as computational physics and materials science, are now being used in the chemical sciences, and vice versa. Specific examples can be found in Appendix D in presentations by de Pablo, Dunning, Maroudas, and Petzold.

Microscale Design Issues

The major development that is driving change, at least in my world, is the revolution at the micro scale. Many people are working in this area now, and many think it will be as big as the computer revolution was. Of particular importance, the behavior of fluid flow near the walls and the boundaries becomes critical in such small devices, many of which are built for biological applications. We have large molecules moving through small spaces, which amounts to moving discrete molecules through devices. The models will often be discrete or stochastic, rather than continuous and deterministic, a fundamental change in the kind of mathematics and the kind of software that must be developed to handle those problems. For the engineering side, interaction with the macro scale world is always going to be important, and it will drive the multiscale issues. Important phenomena occurring at the micro scale determine the behavior of devices, but at the same time, we have to find a way to interact with those devices in a meaningful way. All that must happen in a simulation.
Linda Petzold (Appendix D)
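Petzold's remark that microscale models will often be discrete or stochastic rather than continuous and deterministic can be illustrated with a small sketch. The report does not prescribe a method; the example below assumes one widely used choice, the Gillespie direct method, applied to a single first-order reaction A → B, and compares it with the deterministic rate-law solution n0·exp(−kt). The function names, the reaction, and the parameter values are all illustrative assumptions.

```python
import math
import random


def gillespie_decay(n0, k, t_end, rng):
    """Stochastic simulation of A -> B (rate constant k) with discrete molecules.

    Returns the event times and the number of A molecules after each event.
    """
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        propensity = k * n                 # total reaction rate right now
        t += rng.expovariate(propensity)   # exponential waiting time to next event
        if t > t_end:
            break
        n -= 1                             # one molecule of A reacts
        times.append(t)
        counts.append(n)
    return times, counts


def deterministic_decay(n0, k, t):
    """Continuum counterpart: solution of dn/dt = -k n."""
    return n0 * math.exp(-k * t)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n0 in (10, 1000):
        _, counts = gillespie_decay(n0, k=1.0, t_end=1.0, rng=rng)
        print(n0, counts[-1], round(deterministic_decay(n0, 1.0, 1.0), 1))
```

With only a handful of molecules the stochastic count can differ noticeably from the continuum value from run to run, whereas for large populations the two tend to agree closely, which is the regime change the sidebar describes.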
Ultimately what one would like to do from an engineering viewpoint would be to use all of these methodologies to explore vast regions of parameter space, identify key phenomena, promote these phenomena that lead to good behavior, avoid the phenomena that lead to failure, and so on. The range of scales may extend from quantum chemistry to atomistic and molecular modeling to mesoscopic to continuum macroscopic mechanics to large-scale integration of processes. Progress over the past decade at each of these scales suggests that it may now be possible to combine methods from different scales to achieve a comprehensive description of a system. Such techniques will address important issues

that include high-throughput, combinatorial modeling and high-dimensional design space.

FIGURE 4-4 Modeling elements and core capabilities. [Figure not reproduced. Axes: length and time. Levels recoverable from the scan, from largest to smallest scale:
• Supply Chain Modeling: planning and scheduling; mixed integer linear programming (global logistics)
• Process Simulation: equation-based models; control and optimization (processing units/facilities)
• Continuum Mechanics: finite-element and finite-difference methods; boundary-integral methods (macroscopic modeling)
• Mesoscopic Scale: coarse-grained quantum and statistical mechanics; mixed/coupled atomistic-continuum methods (mesoscale modeling)
• Statistical Mechanics: semi-empirical Hamiltonians; molecular statics, lattice dynamics, molecular dynamics, Monte Carlo (modeling for mechanistic understanding)
• Quantum Mechanics: ab initio, electronic structure; density functional theory; first-principles molecular dynamics (accurate calculations of materials properties)]

The simulation of systems that involve such broad ranges of length and time scales naturally requires a multidisciplinary approach. The current generation of multiscale modeling activity represents the beginnings of a new approach for assembling information from many sources to describe the behavior of complicated systems. The current methodology will need steady improvement over the next decade. For any multiscale model, what is missing in the early stages of development is less important. What is important is what is not missing. Multiscale modeling is a kind of targeted design activity, and it requires physical intuition to decide what does not matter.

These examples point to several critical areas where advances are needed to provide improved multiscale capabilities:

• Computing Power: The underlying progress of the past decade has resulted in part from the increasing power of computers and from important theoretical and algorithmic developments.
Bringing these together for commodity computing places significant demands on the information infrastructure. Computers: Multiple architectures will be needed for different types of calculations. Major gains in the chemical sciences will require accuracy and an understanding of the error bounds on the calculations. High-end computing will increasingly serve to validate simpler approaches.

• Formalisms: Actual multiscale models are nearly always based on approximate separations of length and time scales (e.g., Langevin equations for coarse graining in time, or quantum mechanics-molecular mechanics algorithms for large systems). Formal approaches are needed for understanding and improving such approximations.
• Software: Investments in software are needed, especially in the area of automated programs for software development and maintenance. For multiscale simulations, component-based methods will be needed to develop simulations of entire systems. The funding portfolio for algorithm development should include application-driven projects that can establish the scientific and technological foundations necessary for a software component industry.
• Interoperability: Protocols and data structures are needed to achieve interoperability of applications that run concurrently.
• Web Services: Programs and associated data that can be accessed by other computers.
• Computational Steering: The modeler should be able to interact with a simulation in real time to steer it towards a particular goal. This will require both algorithmic and hardware advances.
• Uncertainty and Reliability: Methods are needed for assessing sources of error in components and their aggregation into multiscale systems, for example, systems that combine computational results with experimental data and images.
• Accessibility: Object-oriented or other portable component architecture is needed for models that can be modified and reused by others.
• Standardization: Experts need certification standards for producing robust, reliable components that have a clear description of approximations and limits so that nonexperts can use them with confidence.
• People: A robust supply of workers with multidisciplinary training will be needed in response to increasing demand.
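As one illustration of the "Uncertainty and Reliability" item in the list above, the following minimal Python sketch aggregates component errors through a model using first-order (linearized) propagation. It assumes the input errors are independent and small enough for linearization to hold. The function names, the Arrhenius rate expression used as the example model, and every numerical value are hypothetical choices made for illustration and are not taken from the report.

```python
import math

R_GAS = 8.314  # gas constant in J/(mol K)


def arrhenius(prefactor, activation_energy, temperature):
    """Hypothetical example model: rate constant k = A * exp(-Ea / (R * T))."""
    return prefactor * math.exp(-activation_energy / (R_GAS * temperature))


def propagate_uncertainty(model, values, sigmas, rel_step=1e-6):
    """First-order error propagation, assuming independent input errors.

    Returns (model value, standard deviation), where the variance is
    sum_i (d model / d x_i)^2 * sigma_i^2, with each derivative estimated
    by a central finite difference.
    """
    variance = 0.0
    for i, (value, sigma) in enumerate(zip(values, sigmas)):
        step = rel_step * max(abs(value), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = value + step
        lower[i] = value - step
        sensitivity = (model(*upper) - model(*lower)) / (2.0 * step)
        variance += (sensitivity * sigma) ** 2
    return model(*values), math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative inputs only: prefactor (1/s), activation energy (J/mol),
    # temperature (K), each with an assumed standard deviation.
    values = [1.0e13, 80.0e3, 500.0]
    sigmas = [2.0e12, 4.0e3, 0.0]
    rate, rate_sigma = propagate_uncertainty(arrhenius, values, sigmas)
    print("relative uncertainty of the rate:", rate_sigma / rate)
```

With these made-up inputs the activation energy, which might come from a quantum-scale calculation, dominates the uncertainty of the rate passed to a larger-scale model, because it enters through an exponential. When the relative error is this large, the linearized estimate is itself only a rough guide.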
Many of the processes of interest in chemistry and chemical engineering occur on much longer time scales (e.g., minutes or hours); it is unlikely that the several orders of magnitude that now separate our needs from what is possible with atomistic-level methods will be bridged by the availability of faster computers. It is therefore necessary to develop theoretical and computational methods to establish a systematic connection between atomistic and macroscopic time scales. These techniques are often referred to as multiscale methods or coarse-graining methods.
Juan de Pablo (Appendix D)
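The Langevin equation mentioned under "Formalisms" is one common way of coarse graining in time: fast degrees of freedom, such as solvent collisions, are replaced by a friction term and a random force. The minimal Python sketch below integrates the overdamped form, gamma * dx/dt = -k * x + noise, for a single coordinate in a harmonic well using the Euler-Maruyama scheme. The function names, reduced units, and parameter values are assumptions chosen for illustration; this is not a method prescribed by the report.

```python
import math
import random


def overdamped_langevin(n_steps, dt, k_spring=1.0, gamma=1.0, kT=1.0,
                        x0=0.0, seed=0):
    """Euler-Maruyama integration of gamma * dx/dt = -k_spring * x + noise.

    The random force has variance 2 * gamma * kT per unit time, so each
    step adds a Gaussian displacement with standard deviation
    sqrt(2 * kT * dt / gamma).
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * kT * dt / gamma)
    x = x0
    trajectory = []
    for _ in range(n_steps):
        force = -k_spring * x
        x += (force / gamma) * dt + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory


def variance(samples):
    """Population variance of a list of numbers."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


if __name__ == "__main__":
    traj = overdamped_langevin(n_steps=200_000, dt=0.01)
    # For a harmonic well the stationary variance should be close to
    # kT / k_spring (1.0 in these reduced units), up to sampling noise
    # and a small time-step bias.
    print("sampled variance:", variance(traj))
```

Because the solvent is not simulated explicitly, the time step can be far larger than an atomistic simulation would allow. The price is that the friction coefficient and temperature must come from elsewhere, which is where the formal justification of such approximations becomes important.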

Collaborative Environments

The most difficult challenges in advancing human welfare through knowledge of the chemical sciences involve a level of complexity that can only now begin to be addressed through information technology. Such efforts are inherently multidisciplinary and involve working in teams to assemble information and solve problems. New methods for collaborative discovery and problem solving in chemical science and technology are needed so that individual researchers can participate in distributed collective action and communicate effectively beyond their discipline. These methods should be a significant component of education in the chemical sciences.

Prototype collaborative environments are emerging, with examples such as simple web-based browsers that provide access to applications. The environments extend to more sophisticated demonstrations of Grid-based services that utilize advanced middleware to couple, manage, and access simulation codes, experimental data, and advanced tools, including remote computers. Current early prototype examples include the DOE Science Grid, EuroGrid (European Union), UNICORE (German Federal Ministry for Education and Research), Information Power Grid (NASA), TeraGrid Alliance Portal (NSF/NCSA), PUNCH (Purdue), and many other projects, applications, and libraries including some commercial activities. Such environments promise an integrating structure for composing substantial applications that execute on the Grid. They offer great promise for sharing codes, data, and expertise, as well as for linking the pieces and keeping them up to date. Many of the prototype environments utilize complex Grid-based protocols that are still in early stages of development or consist of demonstrations that are not easily reused or modified.
If these prototypes are to fulfill their promise for effective problem solving, pioneering chemical scientists and engineers must be brought into close working relationship with computer scientists and engineers to develop and tune the architecture. Infrastructure expenditures in information technology will be investments only if they can eventually be used for solving problems faced by society.

To support growing demand for and dependence on information infrastructure for applications in chemical science and technology, advances are needed in many different dimensions:

• Software: For collaborations, there will be increasing needs for user-friendly and robust software that provides assured standards for quality, reliability, access, interoperability, and sustained maintenance. Standards are needed to facilitate collaborative interactions between applications personnel who are not experts in computer science. Use of software to upgrade legacy codes (or dusty decks), that is, codes that have been around a while (10-15 years), were usually written by others who are no longer available, run on current-generation computers and systems for which they were not optimized, and are therefore difficult to modify and debug, will be increasingly important for sustaining the effective lifetime of collaborative tools so as to recover a larger portion of the initial investment.
• Computers: Because the range of applications and collaborative dimensions is large, access to a wide range of computing platforms is essential for the chemical sciences. In the past, federal investments in high-end computing and networking have provided a successful path for pioneering and eventually developing capabilities for a very broad-based user community. As the pace of change quickens in the future, it will become increasingly important to maintain even closer ties between pioneering efforts and their rapid dispersal in broad user communities.
• Simulation Codes: Today, the portfolio of commercial, open-source, and other specialty simulation codes constitutes a rich but unconnected set of tools. While some commercial and open-source codes are user-friendly, many specialty codes are idiosyncratic and excessively fragile. Advanced techniques, such as object-oriented programming, are needed to facilitate code reuse by individuals other than the creators. Working in collaborative groups, experimental scientists will want to modify codes (to test alternative hypotheses, assumptions, parameters, etc.) without the need to recompile. In addition, technologists will want to explore operating conditions and design variables, as well as scientific hypotheses.
• Data and Knowledge Management: Hypotheses often continue to influence decisions even after being proven wrong. Legacy simulations too often operate on the basis of disproved or speculative hypotheses rather than on recent definitive data. The chemical enterprise has created extensive data crypts of flat files (such as Chemical Abstracts). It now needs to develop data-rich environments (such as fully normalized databases) to allow new cross-disciplinary research and understanding. Within certain industries, such capabilities already exist, at least in early form. An example is offered by the pharmaceutical industry, where informatics hubs integrate biological, chemical, and clinical data in a single available environment. The science and technology enterprise needs a way to manage data such that it is relatively easy to determine what the community does and does not know, as well as to identify the assumptions underlying current knowledge.
• Uncertainty and Risk: In collaborative efforts it is important to assess sources of error in both experimental and numerical data in order to establish confidence in the conclusions reached. Managing the uncertainty with which data are known will reduce the risk of reaching unwarranted conclusions in complicated systems. Improved, easy-to-use risk assessment tools will assist in identifying the weakest link in collaborative work and help steer efforts toward developing improvements where they are most needed.
• Multiscale Simulation: Many collaborative efforts involve multiscale simulations, and the comments about uncertainty and risk apply here as well.
• Quality: Models and data inform intuition but rarely persuade individuals that they are wrong. Careful design of Collaborative Modeling-Data Environments to facilitate appropriate access to codes, data, visualization tools, computers, and other information resources (including optimal use of web services) will enable participants to test each other's contributions.
• Security: Industrial users in the chemical sciences require security in various ways, such as avoiding unwanted fingerprints on database searches, preventing inappropriate access through firewalls, and providing confidence that codes work and that data are accurate. Collaborative environments must facilitate open inquiry about fundamental understanding, as well as protect intellectual property and privacy while reducing risk and liability. Economic models differ among various user and creator communities for software and middleware, from open-source codes to large commercial codes, legacy codes behind firewalls, and fleet-footed and accurate specialty codes that may, however, be excessively fragile. The security requirements of the industrial chemical technologies thus represent both a challenge and an opportunity for developing or marketing software and codes. In multidisciplinary collaborations, values, including dogmatic viewpoints established in one community, often do not map onto other disciplines and cultures.
• People: Collaborations are inherently multidisciplinary, and the point has been made throughout this report that the pool of trained individuals is insufficient to support the growing opportunities.

Collaborations bring many points of view to problem solving and offer the promise of identifying barriers rapidly and finding new unanticipated solutions quickly. Anticipated benefits include the following:
• bringing in simulation at the beginning of complex experiments to help generate hypotheses, eliminate dead ends, and avoid repeated failures from a trial-and-error approach;
• arriving at approximate answers quickly with a realistic quantification of uncertainty so that one has immediate impact, deferring highly accurate answers to a later time when they are needed;
• bringing in data at the beginning of simulation efforts to allow simple calculations to help frame questions correctly and assure that subsequent detailed experiments are unambiguous; and
• attaining new capability for dealing with the realistic, not overly idealized, problems at various levels of sophistication; some problems lend themselves only to intangible measures, but often these may be the most important approaches.

Federal support for one or more Collaborative Modeling-Data Environments (CMDEs) could tremendously impact the value of IT for the chemical community. These would be ideal structures for advancing learning, research, insight, and development on major issues confronting both the chemical community and the larger society. The CMDEs can offer advantages of scale and integration for solving complicated problems from reaction modeling to corrosion. These are also uniquely promising for computational tools and methods development, for developing integrated Grid-based modeling capabilities, for building appropriate metaprograms for general use, and for providing powerful computational/understanding structures for chemical professionals, students, and the larger community.

EDUCATION AND TRAINING

For the chemical sciences to be able to train and enable people in optimal fashion, we must understand and employ information and communications in a far more sophisticated and integrated fashion than we have done to date. This use of information and communications as an enabler of people, as a developer of human resources, is a great challenge to the chemical sciences in the area of information and communications.

In broad terms, information and communications methods should permit the chemical scientist to create, devise, think, test, and develop in entirely new ways. The interactions and cross-fertilization provided by these methods will lead to new materials and products obtained through molecular science and engineering. They will lead to understanding and control of such complex issues as combustion chemistry and clean air, aqueous chemistry and the water resources of the world, protein chemistry and drug design, and photovoltaics and hydrogen fuel cells for clean energy and reduced petroleum dependence. They can help to design and build structures from microelectronic devices to greener chemical plants and from artificial cells and organs to a sustainable environment.

Finding: The capability to explore in the virtual world will enable society to become better educated and informed about the chemical sciences.

Conveying the intellectual depth, centrality, societal benefits, and creative challenges of molecular systems will be greatly facilitated by the use of modeling, visualization, data manipulation, and real-time responses.
All of these new capabilities will provide unparalleled environments for learning, understanding, and creating new knowledge.

Finding: The growing dependence of the chemical enterprise on use of information technology requires that chemical professionals have extensive education and training in modern IT methods. This training should include data structures, software design, and graphics.

Because data and its use comprise such important aspects of chemistry and chemical engineering, and because appropriate use of IT resources can empower unprecedented advances in the chemical arena, it is crucial that the appropriate training, at all levels, be a part of chemical education.

The Collaborative Modeling-Data Environments could both provide these capabilities and educate people from 8 to 88. The proposed CMDEs would allow unique learning by permitting students to use different parts of the metaprogram's capabilities as their understandings and abilities advance. This could give students specific problem-solving skills and knowledge ("I can use that utility") that are portable, valuable, and shared. All of these new capabilities would provide unparalleled environments for learning, understanding, and creating.