

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 49
5 Infrastructure: Capabilities and Goals

Remarkable advances in information technologies (computer speed, algorithm power, data storage, and network bandwidth) have led to a new era of capabilities that range from computational models of molecular processes to remote use of one-of-a-kind instruments, shared data repositories, and distributed collaborations. At the current pace of change, an order-of-magnitude increase in computing and communications capability will occur every five years. Advances in information technology (IT) allow us to carry out tasks better and faster; in addition, these sustained rapid advances create revolutionary opportunities. We are still at the early stages of taking strategic advantage of the full potential offered by scientific computing and information technology in ways that benefit both academic science and industry. Investments in improving chemical-based understanding and decision making will have a high impact because chemical science and engineering are at the foundation of a broad spectrum of technological and biological processes. If the United States is to maintain and strengthen its position as a world leader, chemical science and technology will have to aggressively pursue the opportunities offered by the advances in information and communication technologies.

At the intersection of information technology and the chemical sciences there are infrastructural challenges and opportunities. There are needs for infrastructure improvements that could enable chemical scientists and engineers to attain wholly new levels of computing-related research and education and demonstrate the value of these activities to society. These needs extend from research and teaching in the chemical sciences to issues associated with codes, software, data and storage, and networking and bandwidth.

Some things are currently working very well at the interface of computing

50 INFORMATION AND COMMUNICATION
and the chemical sciences. Networking and Internet high-speed connectivity have been integrated into the chemical sciences, changing the landscape of these fields and of computational chemistry. Commercial computational chemistry software companies and some academic centers provide and maintain computational and modeling codes that are widely used to solve problems in industry and academia. However, these companies and centers do not, and probably cannot, provide the infrastructure required for the development of new scientific approaches and codes for a research market that is deeply segmented. The development of new codes and applications by academia represents a mechanism for continuous innovation that drives the field and helps to direct the choice of application areas on which the power of computational chemistry and simulation is brought to bear. Modern algorithms and programming tools have speeded new code development and eased prototyping worries, but creating the complicated codes typical of chemical science and engineering applications remains an exceedingly difficult and time-consuming task. Defining new codes and applications is potentially a growth area of high value but one that faces major infrastructure implications if it is to be sustained.

Successful collaborations between chemists and chemical engineers, as well as broadly structured interdisciplinary groups in general, have grown rapidly during the past decade. These have created the demand for infrastructure development to solve important problems and new applications in ways never before envisioned. The current infrastructure must be improved if it is to be used effectively in interdisciplinary team efforts, especially for realizing the major potential impact of multiscale simulations.
Infrastructure developments that support improved multidisciplinary interactions include resources for code development, assessment, and life-cycle maintenance; computers designed for science and engineering applications; and software for data collection, information management, visualization, and analysis. Such issues must be addressed broadly in the way that funding investments are made in infrastructure, as well as in cross-disciplinary education and in the academic reward structure.

The overarching infrastructure challenge is to provide at all times the needed accessibility, standardization, and integration across platforms while also providing the fluidity needed to adapt to new concurrent advances in a time of rapid innovation.

RESEARCH

Significant gains in understanding and predictive ability are envisioned to result from the development of multiscale simulation methods for the investigation of complicated systems that encompass behavior over wide ranges of time and length scales. Such systems usually require a multidisciplinary approach. Often, multiscale simulations involve multiple phenomena that occur simultaneously with complex, subtle interactions that can confound intuition. While much

is known about simulating aspects of behavior at individual scales (e.g., ab initio, stochastic, continuum, and supply-chain calculations), integration across scales is essential for understanding the behavior of entire systems.

A critical component in achieving the benefit implied by multiscale modeling will be funding for interdisciplinary research for which effective, collaborative web-based tools are required. The integration of computational results with experimental information is often necessary to solve multiscale problems. In some instances, creating opportunities to access shared equipment will be as critical as access to shared computers or software. Especially important is the ability to represent and understand the uncertainties, not only in the underlying scientific understanding, but also in experimental data that may come from extremely heterogeneous sources. The infrastructure to achieve these research goals must include definition of standard test cases for software and experiments.

Basic infrastructure needs include high-bandwidth access to high-performance computational facilities, further increased network and bus speed, diverse computer architectures, shared instruments, software, federated databases, storage, analysis, and visualization. Computers designed with a knowledge of the memory usage patterns of science and engineering problems will be useful, as will algorithms that take full advantage of the new generation of supercomputers. Continuation of the successful trend towards clusters of commodity computers may result in further opportunities for improved computational efficiency and cost effectiveness. Software should be characterized by interoperability and portability so that codes and computers can talk to each other and can be moved in a seamless manner to new systems when they become available.
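
The point above about representing uncertainties in data from heterogeneous sources can be made concrete with a small sketch. The report does not prescribe a method; the example below assumes independent measurements with Gaussian errors and uses the textbook inverse-variance weighted mean. The `Measurement` type, the `combine` function, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Measurement:
    """One reported value with its standard uncertainty (hypothetical record type)."""
    source: str
    value: float
    sigma: float  # standard uncertainty, in the same units as value


def combine(measurements):
    """Return the inverse-variance weighted mean and its standard uncertainty.

    Assumes independent measurements with Gaussian errors. This is an
    illustrative estimator, not a method taken from the report.
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    if any(m.sigma <= 0 for m in measurements):
        raise ValueError("sigma must be positive")
    # More precise measurements (smaller sigma) receive larger weights.
    weights = [1.0 / m.sigma ** 2 for m in measurements]
    total = sum(weights)
    mean = sum(w * m.value for w, m in zip(weights, measurements)) / total
    return mean, sqrt(1.0 / total)


if __name__ == "__main__":
    # Made-up values from two hypothetical sources.
    data = [
        Measurement("experiment (made-up)", 10.0, 1.0),
        Measurement("simulation (made-up)", 12.0, 2.0),
    ]
    mean, sigma = combine(data)
    print(f"{mean:.2f} +/- {sigma:.2f}")
```

Carrying the uncertainty alongside each value, rather than storing bare numbers, is what lets a downstream user weigh a precise measurement against a rough one; correlated errors or non-Gaussian sources would need a different treatment.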
EDUCATION

Student learning in basic mathematics at the intersection of computing and the chemical sciences is essential because it provides the foundation for computational chemistry, modeling, and simulation as well as associated software engineering. Although many entry-level students in the chemical sciences are familiar with the use of computers and programs, they often have little or no understanding of the concepts and design principles underlying their use. The integration of these topics in interdisciplinary courses is essential for the development of a skilled workforce.1 Educational activities will require the investment of time and resources to develop new content in the curriculum for chemists and chemical engineers. New pedagogical approaches at both the undergraduate and graduate levels will be needed to address subjects at the interface of disciplines linked by scientific data, programming, and applications areas. Training students to adopt a problem-solving approach is critically important for good

1 Building a Workforce for the Information Economy, National Research Council, National Academy Press, Washington, D.C., 2001.

software engineering and especially for producing codes and data structures that are useful to other people. A national community of educational open-source software would help speed development of training tools.

Just as training in mathematics and physics has been needed for work in chemical sciences and engineering, so will specific education in the use of modern IT tools, software design, and data structures be needed by the chemical professional of the twenty-first century. Such education will help in the rapid development of new approaches, cross-disciplinary integration, and integrated data handling and utilization.

Interdisciplinary research and development at the IT-chemical science interface are areas of great excitement and opportunity. Nevertheless, people trained to carry out such projects are in short supply. The continued capability of individuals requires both deep competence and the ability to interact across disciplines. The emphasis in graduate training therefore must be balanced between specialization within a discipline and cross-disciplinary collaboration and teamwork. Transfer of information between fields remains difficult when evaluating performance, particularly for tenure and promotion of faculty who focus on interdisciplinary projects or hold joint appointments in multiple departments. Such evaluation of scholarship will require attentive administrative coordination to resolve cultural differences. Creating high-quality educational programs to train people to work at interdisciplinary interfaces is currently a rate-limiting step in the growth of the field. Recognizing and rewarding the success of interdisciplinary scientists at different stages in their careers is becoming critically important for the sustained development of the field.

Computational chemistry and simulation methods should be incorporated into a broad range of educational programs to provide better understanding of the scope and limitations of various methods, as well as to facilitate their application over the full range of interdisciplinary problems to which they apply. Both science and engineering applications have to be addressed, since these can have different goals and methods of pursuit with widely differing levels of sophistication. These include simple applications that can be helpful in early stages, complicated applications that require greater skill, and applications to truly complex nonlinear systems that represent the current focus of many experts in the field. Such training will benefit industry, where there is a need for computational specialists who understand the goals and objectives of a broad interdisciplinary problem and know how and when computational chemistry and systems-level modeling can provide added value. In academia, infrastructure support to facilitate better communication and interaction between chemists and chemical engineers will enhance the training of computational experts of the future. The field will be well served by establishing commonality in understanding and language between the creators and users of codes as well as the skilled computer science and engineering nonusers who develop the IT methods.

An increasingly important part of the infrastructure will be the skilled workers who maintain codes, software, and databases over their life cycle. The wide variety of tasks that require sustained management may necessitate a combination of local (funded through research grants) and national (funded through center grants) support to address the overall portfolio of needs.

Advances in the chemical sciences have permitted major advances in medicine, life science, earth science, physics and engineering, and environmental science, to name a few. Advances in productivity, quality of life, security, and economic vitality of global and American society have flowed directly from the efforts of people who work in these fields. Looking to the future, we need to build on these advances so that computational discovery and design can become standard components of broad education and training goals in our society. In this way, the human resources will be available to create, as well as to realize and embrace, the capabilities, challenges, and opportunities provided by the chemical sciences through advanced information technology.

Information and communication, data and informatics, and modeling and computing must become primary training goals for researchers in chemical science. These skills have to be accessible to effectively serve others in the society, from doctors to druggists, ecologists to farmers, and journalists to decision makers, who need an awareness of chemical phenomena to work effectively and to make wise decisions. Such skills provide liberating capabilities that enable interactions among people and facilitate new modes of thought, totally new capabilities for problem-solving, and new ways to extend the vision of the chemical profession and of the society it serves.

CODES, SOFTWARE, DATA AND BANDWIDTH

A critical issue for codes, software, and databases is maintenance over the life cycle of use.
In the academic world, software with much potential utility can be lost with the graduation of students who develop the codes. Moreover, as codes become more complicated, the educational value of writing one's own code must be balanced against the nontrivial effort to move from a complicated idea, to algorithm, and then to code. Increasing fractions of student researchers are tending to develop skills with simpler practice codes, and then to work with and modify legacy codes that are passed down. Yet at the same time, working in a big coding environment with codes written by people who have long gone is difficult and often frustrating. Development of software that uses source-code generation to automatically fix dusty decks will be increasingly important for decreasing the time and effort associated with extending the useful life of codes. Also, the development of semiautomatic methods for generation of improved graphical user interfaces will reduce a significant barrier to sustaining the use of older code. Although the open-source approach works well for communities where thousands

of eyes help remove bugs, it is unable to accommodate certain applications, for example, when proprietary information is involved.

Growth in multiscale simulation may be expected to drive development of improved tools for integration of different software systems, integration of different hardware architectures, and use of shared code by distributed collaborators. The need for improved interoperability and portability, and for users to be able to modify codes created by others, will steadily increase. Advances in object-oriented programming and component technology will help. Examples such as the Portable, Extensible Toolkit for Scientific Computing (PETSc) Library at Argonne National Laboratory represent the kind of infrastructure that will support growth in strategic directions.

Central to the vision of a continuously evolving code resource for new applications is the ability to build on existing concepts and codes that have been extensively developed. However, at present, academic code sharing and support mechanisms can at best be described as poor, sometimes as a result of perceived commercialization potential or competitive advantage. Moreover, code development and support are not explicitly supported by most research grants, nor is maintenance of legacy codes. Consequently, adapting academic codes from elsewhere may generate a risk that the code will become unsupported during its useful life cycle. Avoiding this risk results in continual duplication of effort to produce trivial codes that could be better served by open-source toolkits and libraries maintained as part of the infrastructure.

The assurance of code verification, reliability, standardization, availability, maintenance, and security represents an infrastructure issue with broad implications.
Sometimes commercial software has established a strong technical base, excellent interfaces, and user-friendly approaches that attract a wide range of users. Commercial software can be valuable when market forces result in continuous improvements that are introduced in a seamless manner, but generally, commercial code development is not well matched to the needs of small groups of research experts nor to many large potential markets of nonexperts. Therefore a critical need exists to establish standards and responsibilities for code.

The rapid growth of data storage per unit cost has been accompanied by equally significant increases in the demand for data, with the result that there is rapid increase in emphasis on data issues across chemical science and engineering. Bioinformatics and pharmaceutical database mining represent areas in which sophisticated methods have been effective in extracting useful knowledge from data. Newly emerging applications include scientific measurements, sensors in the environment, process-engineering data, manufacturing execution, and supply-chain systems. The overall goal is to build data repositories that can be accessed easily by remote computers to facilitate the use of shared data among creative laboratory scientists, plant engineers, process-control systems, business managers, and decision makers. Achieving this requires improved procedures that provide interoperability and data-exchange standards.
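
As a rough illustration of what a data-exchange convention buys, the sketch below round-trips one property record through JSON with explicit units, uncertainty, and provenance, so that a remote program can read it without knowing how the producing laboratory stores its data. The `PropertyRecord` type, its field names, and the sample values are hypothetical placeholders rather than any standard named in the report; only the Python standard library is used.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PropertyRecord:
    """A self-describing property value (hypothetical schema for illustration)."""
    substance: str
    property_name: str
    value: float
    unit: str
    uncertainty: float  # standard uncertainty, in the same unit as value
    source: str         # provenance: where the value came from


def to_json(record):
    """Serialize a record to a JSON string with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text):
    """Rebuild a record from JSON; missing or unexpected fields raise TypeError."""
    return PropertyRecord(**json.loads(text))


if __name__ == "__main__":
    # Placeholder values, not real measurements.
    record = PropertyRecord(
        substance="example-substance",
        property_name="example-property",
        value=1.23,
        unit="kJ/mol",
        uncertainty=0.05,
        source="hypothetical-lab-notebook",
    )
    wire = to_json(record)
    print(wire)
    print(from_json(wire) == record)
```

Because the unit, uncertainty, and source travel with the value, the receiving side can reject incomplete records instead of guessing; a real exchange standard would also fix controlled vocabularies and versioning, which this sketch leaves out.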

The integration of federated databases with predictive modeling and simulation tools represents an important opportunity for major advances in the effective use of massive amounts of data. The framework will need to include computational tools, evaluated experimental data, active databases, and knowledge-based software guides for generating chemical and physical property data on demand with quantitative measures of uncertainty. The approach has to provide validated, predictive simulation methods for complicated systems with seamless multiscale and multidisciplinary integration to predict properties and to model physical phenomena and processes. The results must be in a form that can be visualized and used by even a nonexpert.

In addition to the insightful use of existing data, the acquisition of new chemical and physical property data continues to grow in importance, as does the need to retrieve data for future needs. Such efforts require careful experimental measurements as well as skilled evaluation of related data from multiple sources. It will be necessary to assess confidence with robust uncertainty estimates; validate data with experimental or calculated benchmark data of known accuracy; and document the metadata needed for interpretation.

There is a need to advance IT systems to provide scientific data and available bandwidth in the public arena. High-quality data represent the foundation upon which public and proprietary institutions can develop their knowledge-management and predictive modeling systems. It is appropriate that federal agencies participate in the growing number of data issues that are facing the chemical science and engineering community, including policy issues associated with access to data.
Improved access to data not only will benefit research and technology but will provide policy and decision makers with superior insights on chemical data-centric matters such as environmental policy, natural resource utilization, and management of unnatural substances. Expanded bandwidth is crucial for collaborations, data flow and management, and shared computing resources.

You might ask "What is the twenty-first century Grid infrastructure that is emerging?" I would answer that it is this tightly optically coupled set of data clusters for computing and visualization tied together in a collaborative middle layer.... So, if you thought you had seen an explosion on the Internet, you really haven't seen anything yet.
Larry Smarr (Appendix D)

ANTICIPATED BENEFITS OF INVESTMENT IN INFRASTRUCTURE

Chemical science and engineering serve major sectors that, in turn, have a wide range of expectations from infrastructure investments. At the heart of these

is the development and informed use of data and simulation tools. The use of information technology to facilitate multidisciplinary teams that collaborate on large problems is in its infancy. Sustained investment in information technologies that facilitate the process of discovery and technological innovation holds truly significant promise, and the chemical sciences provide a large number of potential testbeds for the development of such capabilities.

In science and engineering research, the complex areas identified in Chapter 4 are clear points of entry for computer science, engineering, and applied mathematics along with chemical science and engineering. One of the great values of simulation is the insight it gives into the inner relationships of complicated systems as well as the influence this insight has on the resulting outcome. The key enabling infrastructure elements are those that enhance the new intuitions and insights that are the first steps toward discovery.

The advances being made in Grid technologies and virtual laboratories will enhance our ability to access and use computers, chemical data, and first-of-a-kind or one-of-a-kind instruments to advance chemical science and technology. Grid technologies will substantially reduce the barrier to using computational models to investigate chemical phenomena and to integrating data from various sources into the models or investigations. Virtual laboratories have already proven to be an effective means of dealing with the rising costs of forefront instruments for chemical research by providing capabilities needed by researchers not co-located with the instruments; all we need is a sponsor willing to push this technology forward on behalf of the user community. The twenty-first century will indeed be an exciting time for chemical science and technology.
Thom Dunning (Appendix D)

In industrial applications, tools are needed that speed targeted design and impact business outcomes through efficient movement from discovery to technological application. Valuing IT infrastructure tools requires recognizing how they enhance productivity, facilitate teamwork, and speed time-consuming experimental work.

Finding: Federal research support for individual investigators and for curiosity-driven research is crucial for advances in basic theory, formalisms, methods, applications, and understanding. History shows that the investment in long-term, high-risk research in the chemical sciences must be maintained to ensure continued R&D progress that provides the nation's technological and economic well-being. Large-scale, large-group efforts are complementary to individual investigator

projects; both are crucial, and both are critically dependent on next-generation IT infrastructure.

Computer-Aided Design of Pharmaceuticals

Computer-aided molecular design in the pharmaceutical industry is an application area that has evolved over the past several decades. Documentation of success in the pharmaceutical discovery process now transcends reports of individual applications of various techniques that have been used in a specific drug discovery program. The chemistry concepts of molecular size, shape, and properties and their influence on molecular recognition by receptors of complementary size, shape, and properties are central unifying concepts for the industry. These concepts and computational chemistry visualization tools are now used at will and without hesitation by virtually all participants, regardless of their core discipline (chemistry, biology, marketing, business, management). Such ubiquitous use of simple chemical concepts is an exceedingly reliable indicator of their influence and acceptance within an industry. The concepts that unify thinking in the pharmaceutical discovery field seemingly derive little from the complexity and rigor of the underlying computational chemistry techniques. Nevertheless, there is little reason to assume that these simple concepts could ever have assumed a central role without the support of computational chemistry foundations. In other words, having a good idea in science, or in industry, does not mean that anyone will agree with you (much less act on it) unless there is a foundation upon which to build.
Organizing Committee

Finding: A strong infrastructure at the intersection with information technology will be critical for the success of the nation's research investment in chemical science and technology. The infrastructure includes hardware, computing facilities, research support, communications links, and educational structures.
Infrastructure enhancements will provide substantial advantages in the pursuit of teaching, research, and development. Chemists and chemical engineers will need to be ready to take full advantage of capabilities that are increasing exponentially. To accomplish this we must do the following:

• Recognize that significant investments in infrastructure will be necessary for progress.
• Enhance training throughout the educational system (elementary through

postgraduate) for computational approaches to the physical world. Assuring that chemists and chemical engineers have adequate training in information technology is crucial. Programming languages have been the traditional focus of such education; data structures, graphics, and software design are at least as important and should be an integral component (along with such traditional fundamental enablers as mathematics, physics, and biology) of the education of all workers in chemistry and chemical engineering.
• Maintain national computing laboratories with staff to support research users in a manner analogous to that for other user facilities.2
• Develop a mechanism to establish standards and responsibilities for verification, standardization, availability, maintenance, and security of codes.
• Define appropriate roles for developers of academic or commercial software throughout its life cycle.
• Provide universal availability of reliable and verified software.

The findings and recommendations outlined here and in previous chapters show that the intersection of chemistry and chemical engineering with computing and information technology is a sector that is ripe with opportunity. Important accomplishments have already been realized, and major technical progress should be expected if new and existing resources are optimized in support of research, education, and infrastructure. While this report identifies many needs and opportunities, the path forward is not yet fully defined and will require additional analysis.

Recommendation: Federal agencies, in cooperation with the chemical sciences and information technology communities, will need to carry out a comprehensive assessment of the chemical sciences-information technology infrastructure portfolio.
The information provided by such an assessment will provide federal funding agencies with a sound basis for planning their future investments in both disciplinary and cross-disciplinary research. The following are among the actions that need to be taken:

• Identify criteria and appropriate indicators for setting priorities for infrastructure investments that promote healthy science and facilitate the rapid movement of concepts into well-engineered technological applications.
• Address the issue of standardization and accuracy of codes and databases, including the possibility of a specific structure or mechanism (e.g., within a federal laboratory) to provide responsibility for standards evaluation.

2 Cooperative Stewardship: Managing the Nation's Multidisciplinary User Facilities for Research with Synchrotron Radiation, Neutrons, and High Magnetic Fields, National Research Council, National Academy Press, Washington, D.C., 1999.

• Develop a strategy for involving the user community in testing and adopting new tools, integration, and standards development. Federal investment in IT architecture, standards, and applications is expected to scale with growth of the user base, but the user market is deeply segregated, and there may not yet be a defined user base for any specific investment.
• Determine how to optimize incentives within peer-reviewed grant programs for creation of high-quality cross-disciplinary software.

During the next 10 years, chemical science and engineering will be participating in a broad trend in the United States and across the world: we are moving toward a distributed cyberinfrastructure. The goal will be to provide a collaborative framework for individual investigators who want to work with each other or with industry on larger-scale projects that would be impossible for individual investigators working alone.
Larry Smarr (Appendix D)

Recommendation: In order to take full advantage of the emerging Grid-based IT infrastructure, federal agencies in cooperation with the chemical sciences and information technology communities should consider establishing several collaborative data-modeling environments.

By integrating software, interpretation, data, visualization, networking, and commodity computing, and using web services to ensure universal access, these collaborative environments could impact tremendously the value of IT for the chemical community. They are ideal structures for distributed learning, research, insight, and development on major issues confronting both the chemical community and the larger society.

Collaborative Modeling-Data Environments should be funded on a multiyear basis; should be organized to provide integrated, efficient, standardized, state-of-the-art software packages, commodity computing, and interpretative schemes; and should provide open-source approaches (where appropriate), while maintaining security and privacy assurance.

This report should be seen in the context of the larger initiative, Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering,3 as well as in the six accompanying reports on societal needs (of which this report is

3 Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering, National Research Council, The National Academies Press, Washington, D.C., 2003.

one).4,5,6,7,8 This component on Information and Communications examines perhaps the most dynamically growing capability in the chemical sciences. The findings reported in the Executive Summary and in greater depth in the body of the text constitute what the committee believes to be viable and important guidance to help the chemical sciences community to take full advantage of growing IT capabilities for the advancement of the chemical sciences and technology and thereby for the betterment of our society and our world.

4 Challenges for the Chemical Sciences in the 21st Century: National Security & Homeland Defense, National Research Council, The National Academies Press, Washington, D.C., 2002.
5 Challenges for the Chemical Sciences in the 21st Century: Materials Science and Technology, National Research Council, The National Academies Press, Washington, D.C., 2003.
6 Challenges for the Chemical Sciences in the 21st Century: Energy and Transportation, National Research Council, The National Academies Press, Washington, D.C., 2003 (in preparation).
7 Challenges for the Chemical Sciences in the 21st Century: The Environment, National Research Council, The National Academies Press, Washington, D.C., 2003.
8 Challenges for the Chemical Sciences in the 21st Century: Health and Medicine, National Research Council, The National Academies Press, Washington, D.C., 2003 (in preparation).