D

Workshop Presentations

QUANTUM INFORMATION

Charles H. Bennett
IBM Research

This paper discussed how some fundamental ideas from the dawn of the last century have changed our understanding of the nature of information. The foundations of information processing were really laid in the middle of the twentieth century, and only recently have we become aware that they were constructed a bit wrong: we should have gone back to the quantum theory of the early twentieth century for a more complete picture.

The word calculation comes from the Latin word meaning a pebble; we no longer think of pebbles when we think of computations. Information can be separated from any particular physical object and treated as a mathematical entity. We can then reduce all information to bits, and we can deal with processing these bits to reveal implicit truths that are present in the information. This notion is best thought of separately from any particular physical embodiment.

In microscopic systems, it is not always possible (and generally it is not possible) to observe a system without disturbing it. The phenomenon of entanglement also occurs, in which separated bodies can be correlated in a way that cannot be explained by traditional classical communication. The notion of quantum information can be abstracted, in much the same way as the notions of classical information. There are actually more things that can be done with information if it is regarded in this quantum fashion.

The analogy between quantum and classical information is actually straightforward: classical information is like information in a book or on a stone tablet, but quantum information is like information in a dream. We try to recall the dream and describe it, and each description resembles the original dream less than the previous one did.

The first, and so far the most practical, application of quantum information theory is quantum cryptography. Here one encodes messages and passes them on. If one thinks of photon polarization, one can distinguish vertical and horizontal polarization through calcite crystals, but diagonal photons are in principle not reliably distinguishable. They should be thought of as a superposition of vertical and horizontal polarization, and they actually propagate containing aspects of both of these parent states. This is an entangled structure, and the entangled structure contains more information than either of the pure states of which it is composed.

The next step beyond quantum cryptography, the one that made quantum information theory a byword, is the discovery of fast quantum algorithms for solving certain problems. For quantum computing, unlike simple cryptography, it is necessary to consider not only the preparation and measurement of quantum states but also the interaction of quantum data along a stream. This is technically a more difficult issue, but it gives rise to quite exciting basic science involving quantum computing.

The entanglement of different state structures leads to so-called Einstein-Podolsky-Rosen states, which are largely responsible for the unusual properties of quantum information.

The most remarkable advance in the field, the one that made the field famous, is the fast factoring algorithm discovered by Shor at Bell Laboratories. It demonstrates that exponential speedup can be obtained using a quantum computer to factor large numbers into their prime components. Effectively, this quantum factorization algorithm works because it is no more difficult, using a quantum computer, to factor a large number into its prime factors than it is to multiply the prime factors to produce the large number. It is this condition that renders a quantum computer exponentially better than a classical computer on problems of this type.

The above considerations deal with information in a disembodied form. If one actually wants to build a quantum computer, there are all sorts of fabrication, interaction, decoherence, and interference considerations. This is a very rich area of experimental science, and many different avenues have been attempted. Nuclear magnetic resonance, ion traps, molecular vibrational states, and solid-state implementations have all been used in attempts to produce actual quantum computers. Although an effective experimental example of a quantum computer is still incomplete, many major theoretical advances have suggested that some of the obvious difficulties can in fact be overcome.
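
The factoring speedup rests on reducing factoring to period finding: for a random a coprime to N, the order r of a modulo N usually yields the factors through gcd(a^(r/2) +/- 1, N), and the quantum computer's role is to find r exponentially faster than any known classical method. A minimal classical sketch of that reduction, with the order found by brute force (exactly the step the quantum algorithm replaces) and illustrative numbers:

```python
from math import gcd

def order(a, n):
    """Brute-force multiplicative order of a mod n (the step Shor's algorithm speeds up)."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def shor_classical_part(n, a):
    """Given n and a trial base a, try to recover nontrivial factors of n from the order of a."""
    if gcd(a, n) != 1:
        return gcd(a, n), n // gcd(a, n)   # lucky: a already shares a factor with n
    r = order(a, n)                        # quantum period finding would replace this line
    if r % 2:
        return None                        # odd order: pick another a
    y = pow(a, r // 2, n)
    if y == n - 1:
        return None                        # trivial square root: pick another a
    p, q = gcd(y - 1, n), gcd(y + 1, n)
    return (p, q) if 1 < p < n else None

print(shor_classical_part(15, 7))   # (3, 5)
```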

Several discoveries indicate that the effects of decoherence can be prevented, in principle, by including quantum error-correcting codes, entanglement distillation, and quantum fault-tolerant circuits. Good quantum computing hardware does not yet exist. The existence of these methodologies means that the hardware does not have to be perfectly efficient or perfectly reliable, because these programming techniques can make arbitrarily good quantum computers possible, even with physical equipment that suffers from decoherence issues.

Although most of the focus has been on quantum cryptography, quantum processing provides an important example of the fact that quantum computers not only can do certain tasks better than ordinary computers, but also can do different tasks that would not be imagined in the context of ordinary information processing. For example, entanglement can enhance the communication of classical messages by augmenting the capacity of a quantum channel for sending messages.

To summarize, quantum information obeys laws that subtly extend those governing classical information. The way in which these laws are extended is reminiscent of the transition from real to complex numbers. Real numbers can be viewed as an interesting subset of complex numbers, and some questions that might be asked about real numbers can be most easily understood by utilization of the complex plane. Similarly, some computations involving real input or real output (by "real" I mean classical) are carried out most rapidly using quantum intermediate states in quantum computers. When I'm feeling especially healthy, I say that quantum computers will probably be practical within my lifetime; strange phenomena involving quantum information are continually being discovered.

HOW SCIENTIFIC COMPUTING, KNOWLEDGE MANAGEMENT, AND DATABASES CAN ENABLE ADVANCES AND NEW INSIGHTS IN CHEMICAL TECHNOLOGY

Anne M. Chaka
National Institute of Standards and Technology

The focus of this paper is on how scientific computing and information technology (IT) can enable technical decision making in the chemical industry. The paper contains a current assessment of scientific computing and IT, a vision of where we need to be, and a roadmap of how to get there. The information and perspectives presented here come from a wide variety of sources.

A general perspective from the chemical industry is found in the Council on Chemical Research's Vision 2020 Technology Partnership,1 several workshops sponsored by NSF, NIST,2 and DOE, and the WTEC report on industrial applications of molecular and materials modeling, which contains detailed reports on 91 institutions, including over 75 chemical companies, plus additional data from 55 U.S. chemical companies and 256 worldwide institutions (industry, academia, and government).3 My own industrial perspective comes from my position as co-leader of the Lubrizol R&D IT Vision team for two years, and ten years as the head of computational chemistry and physics prior to coming to NIST. It should be noted that although this paper focuses primarily on the chemical industry, many of the same issues apply to the biotechnology and materials industries.

There are many factors driving the need for scientific computing and knowledge management in the chemical industry. Global competition is forcing U.S. industry to reduce R&D costs and the time to develop new products in the chemical and materials sectors. Discovery and process optimization are currently limited by a lack of property data and insight into the mechanisms that determine performance. Thirty years ago there was a shortage of chemicals, and customers would pay premium prices for any chemical that worked at all. Trial and error was used with success to develop new chemistry. Today, however, the trend has shifted due to increased competition from an abundance of chemicals on the market that work, customer consolidation, and global competition that is driving commodity pricing even for high-performance and fine chemicals. Trial and error has become too costly, and the probability of success is too low. Hence it is becoming widely recognized that companies need to develop and fine-tune chemicals and formulations by design in order to remain competitive, and to screen chemicals prior to a long and costly synthesis and testing process.

In addition, the chemicals produced today must be manufactured in a way that minimizes pollution and energy costs. Higher throughput is being achieved by shifting from batch processing to continuous feed streams, but this practice necessitates a greater understanding of the reaction kinetics, and hence the mechanism, to optimize feed stream rates. Simulation models are needed with sufficient accuracy to predict what upstream changes in process variables are required to maintain the downstream products within specifications, as it may take several hours for upstream changes to affect downstream quality. Lastly, corporate downsizing is driving the need to capture technical knowledge in a form that can be queried and augmented in the future.

1Chemical Industry of the Future: Technology Roadmap for Computational Chemistry; Thompson, T. B., Ed.; Council for Chemical Research: Washington, DC, 1999; http://www.ccrhq.org/vision/index/.
2NIST Workshop on Predicting Thermophysical Properties of Industrial Fluids by Molecular Simulations (June 2001), Gaithersburg, MD; 1st International Conference on Foundations of Molecular Modeling and Simulation 2000: Applications for Industry (July 2000), Keystone, CO; Workshop on Polarizability in Force Fields for Biological Simulations (December 13-15, 2001), Snowbird, UT.
3Applying Molecular and Materials Modeling; Westmoreland, P. R.; Kollman, P. A.; Chaka, A. M.; Cummings, P. T.; Morokuma, K.; Neurock, M.; Stechel, E. B.; Vashishta, P., Eds.; Kluwer Academic Publishers: Dordrecht, 2002; http://www.wtec.org/.

Data and property information are most likely to be available for commodity materials, but industrial competition requires fast and flexible means to obtain data on novel materials, mixtures, and formulations under a wide range of conditions. For the vast majority of applications, particularly those involving mixtures and complex systems (such as drug-protein interactions or polymer nanocomposites), evaluated property data simply do not exist and are difficult, time consuming, or expensive to obtain. For example, measuring the density of a pure liquid to 0.01% accuracy requires a dual-sinker apparatus costing $500,000, and approximately $10,000 per sample. Commercial laboratory rates for measuring vapor-liquid equilibria for two state points of a binary mixture are on the order of $30,000 to $40,000. Hence industry is looking for a way to supply massive amounts of data with reliable uncertainty limits on demand. Predictive modeling and simulation have the potential to help meet this demand.

Scientific computing and information technology, however, have the potential to offer much more than simply calculating properties or storing data. They are essential to the organization and transformation of data into wisdom that enables better technical decision making. The information management pyramid can be assigned four levels, defined as follows:

1. Data: a disorganized, isolated set of facts
2. Information: organized data that leads to insights regarding relationships (knowing what works)
3. Knowledge: knowing why something works
4. Wisdom: having sufficient understanding of the factors governing performance to reliably predict what will happen (knowing what will work)

To illustrate how scientific computing and knowledge management convert data and information into knowledge and wisdom, a real example is taken from lubricant chemistry. Polysulfides, R-Sn-R, are added to lubricants to prevent wear of ferrous metal components under high pressure. The length of the polysulfide chain, n, is typically between 2 and 6. A significant performance problem is that some polysulfide formulations also cause corrosion of copper-containing components such as bronze or brass. To address this problem, a researcher assembles data from a wide variety of sources such as analytical results regarding composition, corrosion and antiwear performance tests, and field testing. IT makes it easy for the right facts to be gathered, visualized, and interpreted. After analysis of the data, the researcher comes to the realization that long-chain polysulfides (n = 4 or greater) corrode copper, but shorter chains (n = 2 to 3) do not. This is knowing what happens. Scientific computing and modeling can then be used to determine why something happens. In this case, quantum mechanics enabled us to understand that the difference in reactivity of these sulfur chains could be explained by significant stabilization of the thiyl radical, delocalized over two adjacent sulfur atoms, after homolytic cleavage of the S-S bond: R-SS-SS-R → 2 R-SS•.

The monosulfur thiyl radical R-S• was significantly higher in energy and therefore is much less likely to form. Hence copper corrosion is consistent with the formation of stable thiyl radicals. This insight led to a generalization that almost any sulfur radical with a low energy of formation will likely corrode copper, and we were able to reliably predict copper corrosion performance from the chemical structure of a sulfur-containing species prior to testing. This understanding also led to improvements in the manufacturing process and other applications of sulfur chemistry, and is an example of what is meant by wisdom (i.e., reliably predicting what will happen in novel applications due to a fundamental understanding of the underlying chemistry and physics).

What is the current status of scientific computing and knowledge management with respect to enabling better technical decisions? For the near term, databases, knowledge management, and scientific computing are currently most effective when they enable human insight. We are a long way from hitting a carriage return and obtaining answers to tough problems automatically, if ever. Wetware (i.e., human insight) is currently the best link between the levels of data, information, knowledge, and wisdom. There is no substitute for critical, scientific thinking. We can, however, currently expect an idea to be tested via experiment or calculation. First-principles calculations, if feasible on the system, improve the robustness of the predictions and can provide a link between legacy data and novel chemistry applications. Computational and IT methods can be used to generate a set of possibilities combinatorially, analyze the results for trends, and visualize the data in a manner that enables scientific insight. Developing these systems is resource intensive and very application specific. Companies will invest in their development for only the highest-priority applications, and the knowledge gained will be proprietary. Access to data is critical for academics in the QSPR-QSAR method development community, but is problematic due to intellectual property issues in the commercial sector.4 Hence there is a need to advance the science and the IT systems in the public arena to develop the fundamental foundation and building blocks upon which public and proprietary institutions can develop their own knowledge management and predictive modeling systems.

What is the current status of chemical and physical property data? Published evaluated chemical and physical property data double every 10 years, yet this is woefully inadequate to keep up with demand. Obtaining these data requires meticulous experimental measurements and/or thorough evaluations of related data from multiple sources. In addition, data acquisition processes are time- and resource-consuming and therefore must be initiated well in advance of an anticipated need within an industrial or scientific application.

4Comment by Professor Curt Breneman, Rensselaer Polytechnic Institute.

Unfortunately, a significant part of the existing data infrastructure is not directly used in any meaningful application because data requirements often shift between the initiation and completion of a data project. Analysis and fitting, such as for equation-of-state models, must be reinitiated when significant new data become available.

One vision that has been developed in consultation with the chemical and materials industries can be described as a "Universal Data and Simulation Engine." This engine is a framework of computational tools, evaluated experimental data, active databases, and knowledge-based software guides for generating chemical and physical property data on demand with quantitative measures of uncertainty. The engine provides validated, predictive simulation methods for complex systems with seamless multiscale and multidisciplinary integration to predict properties and model physical phenomena and processes. The results are then visualized in a form useful for scientific interpretation, sometimes by a nonexpert. Examples of high-priority challenges cited by industry in the WTEC report to be ultimately addressed by the Universal Data and Simulation Engine are discussed below.5

How do we achieve this vision of a Universal Data and Simulation Engine? Toward this end, NIST has been exploring the concepts of dynamic data evaluation, virtual measurements of chemical and physical properties, and predictive simulations of physical phenomena and processes. In dynamic data evaluation, all available experimental data within a technical area are collected routinely and continuously, and evaluations are conducted dynamically using an automated system when information is required. The value of data is directly related to its uncertainty, so "recommended" data must include a robust uncertainty estimate. Metadata are also collected (i.e., auxiliary information required to interpret the data, such as the experimental method). Achieving this requires interoperability and data exchange standards. Ideally the dynamic data evaluation is supplemented by calculated data based on validated predictive methods (virtual measurements) and coupled with a carefully considered experimental program to generate benchmark data.

Both virtual measurements and the simulation engine have the potential to meet a growing fraction of this need by supplementing experiment and providing data in a timely manner at lower cost. Here we define "virtual measurements" specifically as predictive modeling tools that yield property data with quantified uncertainties analogous to observable quantities measured by experiment (e.g., rate constants, solubility, density, and vapor-liquid equilibria).

5These include liquid-liquid interfaces (micelles and emulsions); liquid-solid interfaces (corrosion, bonding, surface wetting, transfer of electrons and atoms from one phase to another); chemical and physical vapor deposition (semiconductor industry, coatings); the influence of chemistry on the thermomechanical properties of materials, particularly defect dislocation in metal alloys; complex reactions in multiple phases over multiple time scales; solution properties of complex solvents and mixtures (suspending asphaltenes or soot in oil, polyelectrolytes, free energy of solvation, rheology); composites (nonlinear mechanics, fracture mechanics); metal alloys; and ceramics.
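
As a minimal illustration of the dynamic data evaluation idea described above, an automated evaluator might combine all available measurements of a property, weight each by its reported uncertainty, and return a recommended value with a propagated uncertainty whenever new data arrive. The property, numbers, and uncertainties below are hypothetical placeholders, and the weighting scheme is a deliberately simplified stand-in for a real evaluation system:

```python
import math

def evaluate(measurements):
    """Inverse-variance weighted combination of (value, standard uncertainty) pairs.

    Returns a recommended value and its propagated standard uncertainty; a very
    simplified stand-in for the evaluation step of a dynamic data system.
    """
    weights = [1.0 / u**2 for _, u in measurements]
    value = sum(w * v for w, (v, _) in zip(weights, measurements)) / sum(weights)
    uncertainty = math.sqrt(1.0 / sum(weights))
    return value, uncertainty

# Hypothetical vapor-pressure measurements (kPa) from three sources with stated uncertainties.
data = [(101.2, 0.5), (100.9, 0.3), (101.5, 0.8)]
print(evaluate(data))  # recommended value and uncertainty, recomputed whenever new data arrive
```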

By "simulation" we mean validated modeling of processes or phenomena that provides insight into mechanisms of action and performance with atomic resolution that is not directly accessible by experiment but is essential to guide technical decision making in product design and problem solving. This is particularly crucial for condensed-phase processes, where laboratory measurements are often the average of myriad atomistic processes and local conditions that cannot be individually resolved and analyzed by experimental techniques. It is analogous to gas-phase kinetics in the 1920s, prior to modern spectroscopy, when total pressure was the only measurement possible. The foundation for virtual measurements and simulations is experimental data and mathematical models that capture the underlying physics at the required accuracy of a given application. Validation of theoretical methods is vitally important.

The Council for Chemical Research's Vision 2020 states that the desired target characteristics for a virtual measurement system for chemical and physical properties are as follows: problem setup requires less than two hours, completion time is less than two days, cost including labor is less than $1,000 per simulation, and it is usable by a nonspecialist (i.e., someone who cannot make a full-time career out of molecular simulation). Unfortunately, we are a long way from meeting this target, particularly in the area of molecular simulations. Quantum chemistry methods have achieved the greatest level of robustness and, coupled with advances in computational speed, have enabled widespread success in areas such as predicting gas-phase, small-molecule thermochemistry and providing insight into reaction mechanisms. Current challenges for quantum chemistry are accurate predictions of rate constants and reaction barriers, condensed-phase thermochemistry and kinetics, van der Waals forces, searching a complex reaction space, transition metal and inorganic systems, and the performance of alloys and materials dependent upon the chemical composition.

A measure of the current value of quantum mechanics to the scientific community is found in the usage of the NIST Computational Chemistry Comparison and Benchmark Database (CCCBDB), http://srdata.nist.gov/cccbdb. The CCCBDB was established in 1997 as a result of an American Chemical Society (ACS) workshop to answer the question, How good is that ab initio calculation? The purpose is to expand the applicability of computational thermochemistry by providing benchmark data for evaluating theoretical methods and assigning uncertainties to computational predictions. The database contains over 65,000 calculations on 615 chemical species for which there are evaluated thermochemical data. In addition to thermochemistry, the database also contains results on structure, dipole moments, polarizability, transition states, barriers to internal rotation, atomic charges, etc. Tutorials are provided to aid the user in interpreting data and evaluating methodologies. Since the CCCBDB's inception, usage has doubled every year up to the current sustained average of 18,000 web pages served per month, with a peak of over 50,000 pages per month. Last year over 10,000 separate sites accessed the CCCBDB. There are over 400 requests per month for new chemical species not contained in the database.
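
A sketch of the kind of comparison such a benchmark database supports: given computed and evaluated experimental values for a set of species, simple summary error statistics can be used to assign an empirical uncertainty to a theoretical method. The numbers below are hypothetical placeholders, not CCCBDB entries, and the RMS-error-as-uncertainty choice is one common convention rather than the database's prescribed procedure:

```python
import math

def method_uncertainty(pairs):
    """Mean signed error, mean absolute error, and RMS error of computed vs. reference values.

    The RMS error is one simple way to assign an uncertainty to future predictions
    made with the same level of theory.
    """
    errors = [calc - ref for calc, ref in pairs]
    n = len(errors)
    mse = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mse, mae, rmse

# Hypothetical enthalpies of formation (kJ/mol): (computed, evaluated experimental reference).
benchmark = [(-74.9, -74.5), (-84.9, -83.8), (-104.2, -104.7), (20.9, 20.0)]
print(method_uncertainty(benchmark))
```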

The CCCBDB is currently the only computational chemistry or physics database of its kind. This is due to the maturity of quantum mechanics in reliably predicting gas-phase thermochemistry for small (20 non-hydrogen atoms or fewer), primarily organic molecules, plus the availability of standard-reference-quality experimental data. For gas-phase kinetics, however, only in the past two years have high-quality (<2% precision) rate-constant data become available for H• and OH transfer reactions to begin quantifying uncertainty for the quantum mechanical calculation of reaction barriers and tunneling.6 There is a critical need for comparable-quality rate data and theoretical validation for a broader class of gas-phase reactions, as well as for the solution phase for chemical processing and life science, and for surface chemistry.

One of the highest-priority challenges for scientific computing for the chemical industry is the reliable prediction of fluid properties such as density, vapor-liquid equilibria, critical points, viscosity, and solubility for process design. Empirical models used in industry have been very useful for interpolating experimental data within very narrow ranges of conditions, but they cannot be extended to new systems or to conditions for which they were not developed. Models based exclusively on first principles are flexible and extensible, but can be applied only to very small systems and must be "coarse-grained" (approximated by averaging over larger regions) for the time and length scales required in industrial applications. Making the connection between quantum calculations of binary interactions or small clusters and the properties of bulk systems (particularly systems that exhibit high-entropy or long-range correlated behavior) requires significant breakthroughs and expertise from multiple disciplines. The outcome of the First Industrial Fluid Properties Simulation Challenge7 (sponsored by AIChE's Computational Molecular Science and Engineering Forum and administered by NIST) underscored these difficulties and illustrated how fragmented current approaches are.

In industry, there have been several successes in applying molecular simulations, particularly in understanding polymer properties and certain direct phase equilibrium calculations. Predicting fluid properties via molecular simulation, however, remains an art form rather than a tool. For example, there are currently over a dozen popular models for water, but models that are parameterized to give good structure for the liquid phase give poor results for ice.

6Louis, F.; Gonzalez, C.; Huie, R. E.; Kurylo, M. J. J. Phys. Chem. A 2001, 105, 1599-1604.
7The goals of this challenge were to (a) obtain an in-depth and objective assessment of our current abilities and inabilities to predict thermophysical properties of industrially challenging fluids using computer simulations, and (b) drive development of molecular simulation methodology toward a closer alignment with the needs of the chemical industry. The Challenge was administered by NIST and sponsored by the Computational Molecular Science and Engineering Forum (AIChE). Industry awarded cash prizes to the champions of each of the three problems (vapor-liquid equilibrium, density, and viscosity). Results were announced at the AIChE annual meeting in Indianapolis, IN, November 3, 2002.

Others, parameterized for solvation and biological applications, fail when applied to vapor-liquid equilibrium properties for process engineering. The lack of "transferability" of water models indicates that the underlying physics of the intermolecular interactions is not adequately incorporated. The tools and systematic protocols to customize and validate potentials for given properties with specified accuracy and uncertainty do not currently exist and need to be developed.

In conclusion, we are still at the early stages of taking advantage of the full potential offered by scientific computing and information technology to benefit both academic science and industry. A significant investment is required to advance the science and the associated computational algorithms and technology. The impact and value of improving chemistry-based insight and decision making are high, however, because chemistry is at the foundation of a broad spectrum of technologies and biological processes, such as how a drug binds to an enzyme, the manufacture of semiconductors, the chemical reactions occurring inside a plastic that make it burn faster than others, and how defects migrate under stress in a steel I-beam.

A virtual measurement system can serve as a framework for coordinating and catalyzing academic and government laboratory science in a form useful for solving technical problems and obtaining properties. There are many barriers to obtaining the required datasets that must be overcome, however. Corporate data are largely proprietary, and in academia, generating data is "perceived as dull so it doesn't get funded."8 According to Dr. Carol Handwerker (chief, Metallurgy Division, Materials Science and Engineering Laboratory, NIST), even at NIST just gathering datasets in general is not well supported at the moment, because it is difficult for NIST to track the impact that the work will have on the people who are using the datasets. One possible way to overcome this barrier may be to develop a series of standard test problems in important application areas where the value can be clearly seen. The experimental datasets would be collected, and theoretical and scientific computing algorithms would be developed, integrated, and focused in a sustained manner to move all the way through the test problems. The data collection and scientific theory and algorithm development then clearly become means to an end.

8Comment by Professor John Tully, Yale University.

ON THE STRUCTURE OF MOLECULAR MODELING: MONTE CARLO METHODS AND MULTISCALE MODELING

Juan J. de Pablo
University of Wisconsin, Madison

The theoretical and computational modeling of fluids and materials can be broadly classified into three categories, namely atomistic or molecular, mesoscopic, and continuum or macroscopic.1 At the atomistic or molecular level, detailed models of the system are employed in molecular simulations to predict the structural, thermodynamic, and dynamic behavior of a system. The range of application of these methods is on the order of angstroms to nanometers. Examples of this type of work are the prediction of reaction pathways using electronic structure methods, the study of protein structure using molecular dynamics or Monte Carlo techniques, and the study of phase transitions in liquids and solids from knowledge of intermolecular forces.2 At the mesoscopic level, coarse-grained models and mean-field treatments are used to predict structure and properties at length scales ranging from tens of nanometers to microns. Examples of this type of research are the calculation of morphology in self-assembling systems (e.g., block copolymers and surfactants) and the study of macromolecular configuration (e.g., DNA) in microfluidic devices.3,4,5 At the continuum or macroscopic level, one is interested in predicting the behavior of fluids and materials on laboratory scales (microns to centimeters), and this is usually achieved by numerical solution of the relevant conservation equations (e.g., Navier-Stokes, in the case of fluids).6

Over the last decade considerable progress has been achieved in the three categories described above. It is now possible to think about "multiscale modeling" approaches, in which distinct methods appropriate for different length scales are combined or applied simultaneously to achieve a comprehensive description of a system. This progress has been partly due to the ever-increasing power of computers but, to a large extent, it has been the result of important theoretical and algorithmic developments in the area of computational materials and fluids modeling.

Much of the interest in multiscale modeling methods is based on the premise that, one day, the behavior of entirely new materials or complex fluids will be

1De Pablo, J. J.; Escobedo, F. A. AIChE Journal 2002, 48, 2716-2721.
2Greeley, J.; Norskov, J. K.; Mavrikakis, M. Ann. Rev. Phys. Chem. 2002, 53, 319-348.
3Fredrickson, G. H.; Ganesan, V.; Drolet, F. Macromol. 2002, 35, 16-39.
4Hur, J. S.; Shaqfeh, E. S. G.; Larson, R. A. J. Rheol. 2000, 44, 713-742.
5Jendrejack, R. M.; de Pablo, J. J.; Graham, M. D. J. Chem. Phys. 2002, 116, 7752-7759.
6Bird, R. B.; Stewart, W. E.; Lightfoot, E. N. Transport Phenomena, 2nd Ed.; John Wiley: New York, NY, 2002.
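
As an illustration of the Monte Carlo techniques mentioned above, the following is a minimal Metropolis Monte Carlo sketch for a small Lennard-Jones fluid in reduced units. The particle count, density, temperature, step size, and absence of a potential cutoff are simplifying assumptions chosen for brevity, not the settings of a production simulation:

```python
import math
import random

# Reduced-unit parameters (illustrative choices).
N, RHO, T = 27, 0.8, 1.2          # particles, number density, temperature
L = (N / RHO) ** (1.0 / 3.0)      # cubic box length
BETA, MAX_DISP = 1.0 / T, 0.15

def pair_energy(r2):
    """Lennard-Jones energy for a squared separation (no cutoff corrections)."""
    inv6 = 1.0 / r2 ** 3
    return 4.0 * (inv6 * inv6 - inv6)

def energy_of(i, pos):
    """Interaction energy of particle i with all others, minimum-image convention."""
    e = 0.0
    xi, yi, zi = pos[i]
    for j, (xj, yj, zj) in enumerate(pos):
        if j == i:
            continue
        dx = (xi - xj) - L * round((xi - xj) / L)
        dy = (yi - yj) - L * round((yi - yj) / L)
        dz = (zi - zj) - L * round((zi - zj) / L)
        e += pair_energy(dx * dx + dy * dy + dz * dz)
    return e

random.seed(0)
positions = [[random.uniform(0, L) for _ in range(3)] for _ in range(N)]

accepted = 0
for step in range(20000):
    i = random.randrange(N)
    old = positions[i][:]
    e_old = energy_of(i, positions)
    positions[i] = [(c + random.uniform(-MAX_DISP, MAX_DISP)) % L for c in old]
    e_new = energy_of(i, positions)
    # Metropolis criterion: accept downhill moves always, uphill moves with Boltzmann probability.
    if e_new > e_old and random.random() >= math.exp(-BETA * (e_new - e_old)):
        positions[i] = old            # reject: restore the old position
    else:
        accepted += 1

print("acceptance ratio:", accepted / 20000)
```

The same accept/reject skeleton underlies the more elaborate Monte Carlo moves used for phase equilibria and polymers; only the trial-move generation and energy model change.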

FIGURE 4 First-principles calculations can provide input to models downstream.

The aluminum alloy material has roughly 11 components (as a recycled material, the alloy actually has an unknown number of components that depend on the recycling stream), and some of the concentrations are quite small. Some components have more impact than others do. Impurities can have enormous effects on the microstructure and thereby affect the properties. Precipitates form during heat treatment, and manufacturers have the opportunity to optimize the aluminum heat treatment to achieve an optimized structure. First-principles calculations and multiscale calculations are able to elucidate opportunities, including overturning 100 years of metallurgical conventional wisdom (Figure 4).8

Several key challenges must be overcome in this area. In the chemical sciences, methodologies are needed to acquire quantitative kinetic information on real industrial materials without resorting to experiment. It is also necessary to determine kinetic prefactors and barriers that have enough accuracy to be useful. Another key challenge is the need for seamless multiscale modeling, including uncertainty quantification. Properties such as ductility and tensile strength are still very difficult to calculate. Finally, working effectively across the disciplines still remains a considerable barrier.

8Wolverton, C.; Ozolins, V. Phys. Rev. Lett. 2001, 86, 5518; Wolverton, C.; Yan, X.-Y.; Vijayaraghavan, R.; Ozolins, V. Acta Mater. 2002, 50, 2187.
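
The kinetic prefactors and barriers mentioned above feed directly into downstream rate estimates through an Arrhenius expression, k = A exp(-Ea/kT). A minimal sketch of that use, with an illustrative attempt frequency and a hypothetical migration barrier rather than values for any real alloy process:

```python
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_rate(prefactor_hz, barrier_ev, temperature_k):
    """Rate = A * exp(-Ea / kT); A and Ea would come from first-principles calculations."""
    return prefactor_hz * math.exp(-barrier_ev / (K_B * temperature_k))

# Illustrative values only: a typical attempt frequency and a hypothetical migration barrier.
A, EA = 1.0e13, 1.2
for t_celsius in (200, 300, 400, 500):
    t_kelvin = t_celsius + 273.15
    print(f"{t_celsius} C: {arrhenius_rate(A, EA, t_kelvin):.3e} events/s")
```

The exponential dependence on the barrier is why the accuracy requirement on first-principles barriers is so demanding: a modest error in Ea changes the predicted rate by orders of magnitude.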

Catalysis of Exhaust Gas Aftertreatment

Exhaust aftertreatment catalysts provide a second, less-integrated example of the use of chemical sciences modeling and simulation to approach an application-related problem. Despite large investments in catalysis, both academia and funding agencies seem to have little interest in catalysis for stoichiometric exhaust aftertreatment. Perhaps the perception of a majority of the scientific community is that stoichiometric exhaust aftertreatment is a solved problem. Although there is a large empirical knowledge base and a cursory understanding of the principles, from my vantage point the depth of understanding is far from adequate, especially considering the stringency of Tier 2 tailpipe emissions regulations. This is another challenging and intellectually stimulating research area with critical issues that span the range from the most fundamental to the most applied.

After the discovery in the 1960s that tailpipe emissions from vehicles were adversely affecting air quality, there were some significant technological breakthroughs, and simply moving along a normal evolutionary path, we now have three-way catalysts that reduce NOx and oxidize hydrocarbons and CO at high levels of efficiency with a single, integrated, supported-catalyst system. Unfortunately, the three-way catalyst works in a rather narrow range of air-fuel ratio, approximately the stoichiometric ratio. This precludes some of the opportunities to improve fuel economy; for example, a leaner air-fuel ratio can yield better fuel economy but, with current technology, only at the expense of greater NOx emissions. Also, most exhaust pollution comes out of the tailpipe in the first 50 seconds of vehicle operation, because the catalyst has not yet reached a temperature range in which it is fully active.

The three-way catalyst is composed of precious metals on supported alumina with ceria-based oxygen storage, coated on a high-surface-area ceramic or metallic substrate with open channels for the exhaust to pass through and over the catalyst. As regulations become increasingly stringent for tailpipe emissions, the industry is transitioning to higher cell densities (smaller channels) and thinner walls between channels. This simultaneously increases the surface area, decreases the thermal mass, and reduces the hydraulic diameter of the channels; all three effects are key enablers for higher-efficiency catalysts. The high-surface-area nanostructured washcoat is another key enabler, but it is necessary to maintain the high surface area of the coating and the highly dispersed, catalytically active precious metals for the life of the vehicle despite high-temperature operation and exposure to exhaust gas and other chemical contaminants. In other words, the material must have adequate thermal durability and resist sintering and chemical poisoning from exhaust gas components. Neither thermal degradation nor chemical degradation is particularly well understood beyond some general principles; i.e., modeling is difficult without a lot of empirical input. The industry also must design for end of life, which is a very large design space. Simulation currently uses empirically derived, aged-catalyst-state input.
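
To make the geometric trade-off described above concrete, the following sketch uses the standard square-channel monolith relations linking cell density and wall thickness to open frontal area, geometric surface area, and hydraulic diameter. The two substrate specifications compared are illustrative examples, not data for any particular product:

```python
import math

def monolith_geometry(cpsi, wall_mil):
    """Square-channel monolith relations.

    cpsi: cell density in cells per square inch; wall_mil: wall thickness in mils (0.001 in).
    Returns open frontal area fraction, geometric surface area (m^2 per liter of substrate),
    and hydraulic diameter (mm).
    """
    pitch_in = 1.0 / math.sqrt(cpsi)            # center-to-center channel spacing
    channel_in = pitch_in - wall_mil / 1000.0   # open channel width
    open_area = (channel_in / pitch_in) ** 2
    gsa_per_in = 4.0 * channel_in / pitch_in ** 2   # wall area per substrate volume, 1/in
    gsa_m2_per_l = gsa_per_in / 0.0254 / 1000.0     # convert 1/in to m^2 per liter
    d_h_mm = channel_in * 25.4                  # hydraulic diameter of a square channel = its width
    return open_area, gsa_m2_per_l, d_h_mm

# Illustrative comparison: a 400 cpsi / 6.5 mil substrate vs. a 900 cpsi / 2.5 mil substrate.
for cpsi, wall in ((400, 6.5), (900, 2.5)):
    oa, gsa, dh = monolith_geometry(cpsi, wall)
    print(f"{cpsi} cpsi / {wall} mil: open area {oa:.2f}, GSA {gsa:.2f} m^2/L, d_h {dh:.2f} mm")
```

Running this shows the trend described in the text: the higher cell density with thinner walls raises the geometric surface area and open area while shrinking the hydraulic diameter of each channel.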

What the industry really needs is a predictive capability. The industry also has to minimize the cost and the amount of the catalytically active platinum, palladium, and rhodium used, since precious metals are a rare and limited commodity.

Again, a key chemical science challenge is kinetics. How does one obtain quantitative kinetic information for real industrial, heterogeneous catalysts without resorting to time-consuming experiments? Simulations currently use empirically derived, simplified kinetics. The science of accelerated aging has generally been ignored. The automotive industry must be able to predict the aging of catalysts and sensors without running many vehicles to 150,000 miles. The industry does currently take several vehicles and accumulate miles, but driving vehicles that many miles is a particularly slow way to get answers and does not provide good statistics. The industry does utilize accelerated aging, but it is done somewhat empirically without the benefit of a solid foundation of basic science.

Homogeneous Charge Compression Ignition

The third and final example arises from the Ford Motor Company-MIT Alliance, a partnership that funds mutually beneficial collaborative research. The principal investigators, Bill Green and Bill Kaiser, are from MIT and Ford, respectively. The project spans a range from the most fundamental chemistry and chemical engineering to the most applied. In contrast to the first two examples, this project focuses on a technology in development, as opposed to a technology already in practice.

Homogeneous charge compression ignition (HCCI) is similar in concept to a diesel engine. It is highly efficient and runs lean. It is compression ignited, as opposed to spark ignited, but it produces no soot or NOx because it runs much cooler than a diesel. It is similar to a gasoline engine in that it uses premixed, volatile fuels like gasoline, and it has similar hydrocarbon emissions. But an HCCI engine has a much higher efficiency and much lower NOx emissions than a gasoline engine, which could eliminate the need for the costly three-way precious metal catalyst.

However, HCCI is difficult to control. There is no simple timing mechanism that can control ignition, as exists for a spark or diesel fuel-injection engine. HCCI operates by chemistry and consequently is much more sensitive to the fuel chemistry than either spark-ignition or diesel engines, whose combustion largely ignores the details of the fuel. HCCI looks very promising, but researchers do not yet know what the full operating range is or how to reliably control the combustion. With computation, Yelvington and Green have already demonstrated that HCCI can work well over a broad range of conditions, demonstrating the promise that computation and simulation can and must play a continuing and large role in resolving the HCCI challenges.9

9Yelvington, P. E.; Green, W. H. SAE Technical Paper 2003, 2003-01-1092.
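
Because HCCI ignition is governed by chemistry rather than by a spark or injection event, one common computational shortcut for estimating ignition timing is a knock-integral (Livengood-Wu) approach: integrate the reciprocal of an Arrhenius-type ignition-delay correlation over the compression history until the integral reaches one. This is offered only as an illustration of the idea; the correlation constants and the temperature-pressure history below are placeholders, not a validated model for any fuel or engine:

```python
import math

def ignition_delay(T, p):
    """Arrhenius-type ignition-delay correlation, tau = A * p**-n * exp(B / T), in seconds.

    A, n, and B are illustrative constants; a real application would fit them to
    detailed chemical kinetics or to engine data for a specific fuel.
    """
    A, n, B = 2.0e-6, 1.0, 7500.0
    return A * p ** -n * math.exp(B / T)

def livengood_wu(history, dt):
    """Integrate dt / tau over a (T, p) compression history; ignition when the sum reaches 1."""
    total = 0.0
    for step, (T, p) in enumerate(history):
        total += dt / ignition_delay(T, p)
        if total >= 1.0:
            return step * dt   # predicted ignition time (s)
    return None                # no ignition within the simulated history

# Illustrative compression history: temperature (K) and pressure (bar) rising over 5 ms.
dt = 1.0e-5
history = [(700 + 60e3 * t, 10 + 4e3 * t) for t in (i * dt for i in range(500))]
print("predicted ignition at", livengood_wu(history, dt), "s")
```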

Final Words

The complexity of industrial design problems requires that one be able to deal with the realistic, not overly idealized, system. The complexity demands a multidisciplinary approach and working in teams. Access to and understanding of the full range of methods are generally necessary if the researcher is going to have impact by solving relevant problems, and individual industrial researchers must be able to interact and communicate effectively beyond their disciplines. To achieve integration from fundamental science to real-world applications is truly a challenge of coordination, integration, and communication across disciplines, approaches, organizations, and research and development sectors.

In general, approximate answers or solutions with a realistic quantification of uncertainty, if arrived at quickly, have greater impact than highly accurate answers or solutions arrived at too late to impact critical decisions. Often, there is no need for the increased level of accuracy. To quote Einstein, "Things should be made as simple as possible, but not any simpler." Sometimes researchers in industry, just out of necessity, oversimplify. That is when the automotive development engineer might lose sleep, because it could mean that vehicles might have unexpected reliability issues in the field, if the oversimplification resulted in a wrong decision.

Simulations with varying degrees of empiricism and predictive capability should be aligned closely with extensive experimental capabilities. It is also important to bring simulation in at the beginning of a new project. Too often, experimentalists turn to a computational scientist only after repeated experimental failures. Many of these failures would have been avoided had the consultation occurred at an early stage. The computational expert could help generate hypotheses even without doing calculations or by doing some simple calculations, and the perspective of the computational researcher can frequently eliminate dead ends. In addition, framing questions correctly so that the experiments yield unambiguous results, or reasonably unambiguous results, is crucial. Obtaining reasonably definitive answers in a timely manner is equally crucial, but too often, there does not seem to be a time scale driving the needed urgency.

Data and knowledge management offer additional overlooked opportunities. Hypotheses that have been proven wrong often continue to influence decisions. Decision makers too often operate on the basis of disproved or speculative conjectures rather than on definitive data. The science and technology enterprise needs a way to manage data such that it is relatively easy to know what the community does and does not know, as well as what the assumptions are that underlie current knowledge.

Finally, it is important to have realistic expectations for success. This is a difficult challenge because what one measures and rewards is what one gets. There are many ways to impact applications and technology with any level of sophistication in a simulation.

Some of the important ways lend themselves only to intangible measures, but oftentimes these may be the most important. Again quoting Einstein, "Not everything that counts can be counted, and not everything that can be counted counts."

DRUG DISCOVERY: A GAME OF TWENTY QUESTIONS

Dennis J. Underwood
Infinity Pharmaceuticals, Inc.

Introduction

There is a revolution taking place in the pharmaceutical industry. An era in which nearly continuous growth and profitability were taken for granted is coming to an end. Many of the major pharmaceutical companies have been in the news lately with a plethora of concerns, ranging from the future of the biotechnology sector as a whole to concerns over the availability of drugs to economically disenfranchised groups in the developed and the developing world. The issues for the pharmaceutical sector are enormous and will likely result in changes in the health-care system, including the drug discovery and development enterprises. Central challenges include the impact that drugs coming off patent have had on the recent financial security of the pharmaceutical sector and the need for improved protection of intellectual property rights. Historically, generic competition has slowly eroded a company's market share. There is a constant battle between the pace of discovering new drugs and having old drugs go off patent, giving generic competitors opportunities to invade their market share. Recent patent expirations have displayed a much sharper decline in market share, making new drugs even more critical.

The 1990s were a decade in which the pharmaceutical giants believed they could sustain growth indefinitely by dramatically increasing the rate of bringing new medicines to market, simply by increasing R&D spending and continuing to utilize the same research philosophies that had worked in the past. It is clear from the rapid rise in R&D expenditure and the resultant cost of discovering new drugs that the "old equation" is becoming less favorable. There is a clear need to become more efficient in the face of withering pipelines and larger and more complex clinical trials. For large pharmaceutical companies to survive, they must maintain an income stream capable of supporting their current infrastructure as well as funding R&D for the future. The cost of drug development and the low probability of technical success call for improved efficiency of drug discovery and development and further investment in innovative technologies and processes that improve the chances of bringing a compound to market as a drug. Already there has been quite a change in the way in which drugs are discovered.

Large pharmaceutical companies are diversifying their drug discovery and development processes: they are relying more on the inventiveness of smaller biotechnology companies and licensing technology, compounds, and biologicals at a faster, more competitive rate. To meet critical time lines they are outsourcing components of research and development to contract research organizations, enabling greater efficiencies by providing added expertise or resources and decreasing development time lines. The trend toward mergers and acquisitions, consolidating pipelines, and attempting to achieve economies of scale is an attempt by large pharmaceutical companies to build competitive organizations. Although this may help short-term security, future ongoing success may not be ensured solely by this strategy.

One of the most valuable assets of a pharmaceutical company is its experience in drug discovery and development. Of particular importance are the data, information, and knowledge generated in medicinal chemistry, pharmacology, and in vivo studies accumulated over years of research in many therapeutic areas. This knowledge is based on hundreds of person-years of research and development; yet most companies are unable to effectively capture, store, and search this experience. This intellectual property is enormously valuable. As with the other aspects of drug discovery and development, the methods and approaches used in data-, information-, and knowledge-base generation and searching are undergoing evolutionary improvements and, at times, revolutionary changes. It is imperative for all data- and information-driven organizations to take full advantage of the information they are generating. We assert that those companies that are able to do this effectively will be able to gain and sustain an advantage in a highly complex, highly technical, and highly competitive domain.

The aim of this overview is to highlight the important role informatics plays in pharmaceutical research, the approaches that are currently being pursued and their limitations, and the challenges that remain in reaping the benefit of advances. We are using the term "informatics" in a general way to describe the processes whereby information is generated from data and knowledge is derived as our understanding builds. Informatics also refers to the management and transformation of data and information, and the assimilation of knowledge into the processes of discovery and development. There has been much time, money, and effort spent in attempting to reduce the time it takes to find and optimize new chemical entities. It has proven difficult to reduce the time it takes to develop a drug, but the application of new technologies holds hope for dramatic improvements. The promise of informatics is to reduce development times by becoming more efficient in managing the large amounts of data generated during a long drug discovery program. Further, with managed access to all of the data, information, and experience, discoveries are more likely, and the expectation is that the probability of technical success will increase.

Why is the probability of technical success of drug discovery and development so low?

What are the issues in moving compounds through the drug discovery and development pipeline? It is clear from existing data that the primary bottlenecks are pharmacokinetic problems and lack of efficacy in man. In addition there are problems of toxicity in animal models and the discovery of adverse effects in man. Clearly there are significant opportunities to improve the probability of technical success and, perhaps, to shorten the time line for development. Current strategies bring absorption, distribution, metabolism, excretion, and toxicity (ADME-Tox) studies earlier into the process (late discovery) and use programs based on disease clusters rather than a single target.1

Drug discovery and development is a difficult business because biology, and the interaction with biology, is complicated and, indeed, may be classifiable as a complex system. Complexity is due partly to an incomplete knowledge of the biological components of pathways; the manner in which the components interact and are compartmentalized; and the way they are modulated, controlled, and regulated in response to intrinsic and environmental factors over time. Mostly, biological interactions are important and not accounted for in screening and assaying strategies. Often model systems are lacking in some way and do not properly represent the target organism. The response of the cell to drugs is very dependent on initial conditions, which is to say that the response of the cell is very dependent on its state and the condition of many subcellular components. Typically, the behavior of a complex system is different from that of its components, which is to say that the processes occurring simultaneously at different scales (protein, nucleic acid, macromolecular assembly, membrane, organelle, tissue, organism) are important and the intricate behavior of the entire system depends on these processes in a nontrivial way.2,3 If this is true, application of the tools of reductionism may not provide us with an understanding of the responses of an organism to a drug. Are we approaching a "causality catastrophe" whenever we expect the results of in vitro assays to translate to clinical data?

What is an appropriate way to deal with such complexity? The traditional approach has been to generate large amounts of replicate data, to use statistics to provide confidence, and to move cautiously, stepwise, toward higher complexity: from in vitro to cell-based to tissue-based to in vivo in model animals and then to man. In a practical sense, the drug discovery and development odds have been improved by a number of simple strategies: Start with chemically diverse leads, optimize them in parallel in both discovery and later development, back them up with other compounds when they reach the clinic, and follow on with new, structurally different compounds after release. Approach diseases by focusing, in parallel, on clusters rather than on a single therapeutic target, unless the target has proven to be clinically effective.

1Kennedy, T. Drug Discovery Today 1997, 2 (10), 436-444.
2Vicsek, T. Nature 2001, 411, 421.
3Glass, L. Nature 2001, 410, 277-284.

Generate more and better-quality data, focusing on replicates, different conditions, different cell types in different states, and different model organisms. Develop good model systems, such as for ADME-Tox, and use them earlier in the discovery process to help guide the optimization process away from difficulties earlier (fail early). The increase in the amount and diversity of the data leads to a "data catastrophe" in which our ability to fully extract relationships and information from the data is diminished. The challenge is to provide informatics methods to manage and transform data and information and to assimilate knowledge into the processes of discovery and development; the issue is to be able to integrate the data and information from a variety of sources into consistent hypotheses rich in information.

The more information, the better. For example, structure-based design has had many successes and has guided the optimization of leads through a detailed understanding of the target and the way in which compounds interact. This has become a commonly accepted approach, and many companies have large, active structural biology groups participating in drug discovery teams. One of the reasons that this is an accepted approach is that there is richness in data and information, and there is a wealth of methodology available, both experimental and computational, that enables these approaches. The limitations in this area are concerned primarily with the challenges facing structural biology, such as appropriate expression systems, obtaining sufficient amounts of pure protein, and the ability to crystallize the protein. These limitations can be severe and can prevent a structural approach for many targets, especially membrane-bound protein complexes. There is also a question of relevancy: Does a static, highly ordered crystalline form sufficiently represent dynamic biological events?

There are approaches that can be used in the absence of detailed structural information. Further, it is often instructive to use these approaches in conjunction with structural approaches, with the aim of providing a coherent picture of the biological events using very different approaches. However, application of these methods is extremely challenging, primarily due to the lack of detailed data and information on the system under study. Pharmacophore mapping is one such approach. A complexity that is always present is that there are multiple ways in which compounds can interact with a target; there are always slight changes in orientation due to differences in chemical functionality, and there are always distinctly different binding modes that are possible. Current methods of pharmacophore mapping find it difficult to detect these alternates. Further, the approach often relies on high-throughput screening approaches that are inherently "noisy," making the establishment of consistent structure-activity relationships difficult. In addition, the methodology developed in this area is often limited to only parts of the entire dataset. There has been much effort directed to deriving two-, three-, and higher-dimensional pharmacophore models, and the use of these models in lead discovery and development is well known.4

4Agrafiotis, D. K.; Lobanov, V. S.; Salemme, F. R. Nat. Rev. Drug Discov. 2002, 1, 337-346.

The key issue in these methods is the manner in which compounds, with their structure, features, character, and conformational flexibility, are represented. There are many ways in which this can be achieved, but in all cases the abstraction of the chemistry for computational convenience is an approximation. Each compound, in a practical sense, is a question that is being asked of a complex biological system. The answer to a single question provides very little information; however, in a manner analogous to the game of "twenty questions," the combined result from a well-chosen collection of compounds (questions) can provide an understanding of the biological system under study.5 The manner in which chemistry is represented is essential to the success of such a process. It is akin to posing a well-informed question that, together with other well-formed questions, is powerful in answering or giving guidance to the issues arising from drug discovery efforts. Our approach is to use the principles of molecular recognition to simplify the representation of the chemistry: atoms are binned into simple types such as cation, anion, aromatic, hydrogen-bond acceptor and donor, and so forth. In addition, the conformational flexibility of each molecule is represented. The result is a matrix in which the rows are compounds and the columns are bits of a very large string (tens of millions of bits long) that mark the presence of features. Each block of features can be the simple presence of a functional group such as phenyl, chlorophenyl, piperidyl, or aminoethyl, or it can be as complex as three- or four-point pharmacophoric features that combine atom types and the distances between them. The richness of this representation, along with a measure of the biological response of these compounds, enables methods that use Shannon's maximum-entropy, information-based approach to discover ensembles of patterns of features that describe activity and inactivity. These methods are novel and have been shown to capture the essence of the effects of the SAR in a manner that can be used in the design of information-based libraries and the virtual screening of databases of known or realizable compounds.6

The power of these methods lies in their ability to discern relationships in data that are inherently "noisy." The data are treated as categorical rather than continuous: active (yes), inactive (no), and a variable category of unassigned data (maybe). These methods are deterministic and, as such, derive all relationships between all compounds at all levels of support. The relationships or patterns are scored based on their information content. Unlike methods such as recursive partitioning, pattern discovery is not "greedy" and is complete.

5Underwood, D. J. Biophys. J. 1995, 69, 2183-2184.
6Beroza, P.; Bradley, E. K.; Eksterowicz, J. E.; Feinstein, R.; Greene, J.; Grootenhuis, P. D.; Henne, R. M.; Mount, J.; Shirley, W. A.; Smellie, A.; Stanton, R. V.; Spellmeyer, D. C. J. Mol. Graph. Model. 2002, 18, 335-342.
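
A toy sketch of the representation and scoring just described, using a handful of made-up feature labels and activity calls rather than real fingerprints or assay data: compounds become sets of recognition features (equivalently, bits in a long feature string), and a candidate pattern (a combination of features) is scored by how much knowing whether a compound matches it reduces the Shannon entropy of the activity labels. This is a simplified illustration of the information-based idea, not the authors' algorithm:

```python
import math
from itertools import combinations

def entropy(labels):
    """Shannon entropy (bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(compounds, activities, pattern):
    """How much splitting on 'contains all features in pattern' reduces activity entropy."""
    match = [a for feats, a in zip(compounds, activities) if pattern <= feats]
    rest = [a for feats, a in zip(compounds, activities) if not pattern <= feats]
    if not match or not rest:
        return 0.0
    n = len(activities)
    split = (len(match) / n) * entropy(match) + (len(rest) / n) * entropy(rest)
    return entropy(activities) - split

# Hypothetical compounds as sets of recognition features, with categorical activity calls.
compounds = [{"aromatic", "hbond_donor", "cation"},
             {"aromatic", "hbond_acceptor"},
             {"anion", "hbond_acceptor"},
             {"aromatic", "cation"},
             {"anion", "hbond_donor"}]
activities = ["active", "active", "inactive", "active", "inactive"]

# Exhaustively score all one- and two-feature patterns (the spirit of a non-greedy search).
features = set().union(*compounds)
patterns = [set(c) for r in (1, 2) for c in combinations(sorted(features), r)]
best = max(patterns, key=lambda p: information_gain(compounds, activities, p))
print(best, information_gain(compounds, activities, best))
```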

These approaches have been used in the study of G-protein-coupled receptors7,8 and in ADME-Tox studies.9,10 These methods have also been used in the identification of compounds that are able to present the right shape and character to a defined active site of a protein.11,12 In cases where the protein structure is known and the potential binding sites are recognized, the binding site can be translated into a bit-string that is in the same representational space as described above for the compounds. This translation is done using methods that predict the character of the space available for binding compounds. The ability to represent both the binding site(s) and the compounds in the same way provides the mechanism to discriminate between compounds that are likely to bind to the protein. These approaches have been used for serine proteases, kinases, and phosphatases.13

The game of 20 questions is simple but, almost paradoxically, gives the inquisitor the apparent ability to read the subject's mind. The way in which this occurs is well understood; in the absence of any information there are many possibilities, a very large and complex but finite space. The solution relies on a tenet of dialectic philosophy in which each question provides a thesis and an antithesis that is resolved by a simple yes or no answer. In so doing, the number of possibilities is dramatically reduced and, after 20 questions, the inquisitor is usually able to guess the solution. The solution to a problem in drug discovery and development is far more complex than a game of 20 questions and should not be trivialized. Even so, the power of discrimination through categorization of answers and integration of answers from diverse experiments provides an extremely powerful mechanism for optimizing toward a satisfying outcome, a potential drug.

These approaches have been encoded into a family of algorithms known as pattern discovery (PD).14 PD describes a family of novel methods in the category of data mining. One of the distinguishing features of PD is that it discovers relationships between the data rather than relying on human interpretation to generate a model as a starting point; this is a significant advantage.

7. Wilson, D. M.; Termin, A. P.; Mao, L.; Ramirez-Weinhouse, M. M.; Molteni, V.; Grootenhuis, P. D.; Miller, K.; Keim, S.; Wise, G. J. Med. Chem. 2002, 45, 2123-6.
8. Bradley, E. K.; Beroza, P.; Penzotti, J. E.; Grootenhuis, P. D.; Spellmeyer, D. C.; Miller, J. L. J. Med. Chem. 2000, 43, 2770-4.
9. Penzotti, J. E.; Lamb, M. L.; Evensen, E.; Grootenhuis, P. D. J. Med. Chem. 2002, 45, 1737-40.
10. Clark, D. E.; Grootenhuis, P. D. Curr. Opin. Drug Discov. Devel. 2002, 5, 382-90.
11. Srinivasan, J.; Castellino, A.; Bradley, E. K.; Eksterowicz, J. E.; Grootenhuis, P. D.; Putta, S.; Stanton, R. V. J. Med. Chem. 2002, 45, 2494-500.
12. Eksterowicz, J. E.; Evensen, E.; Lemmen, C.; Brady, G. P.; Lanctot, J. K.; Bradley, E. K.; Saiah, E.; Robinson, L. A.; Grootenhuis, P. D.; Blaney, J. M. J. Mol. Graph. Model. 2002, 20, 469-77.
13. Rogers, W. T.; Underwood, D. J.; Argentar, D. R.; Bloch, K. M.; Vaidyanathan, A. G. Proc. Natl. Acad. Sci. U.S.A., submitted.
14. Argentar, D. R.; Bloch, K. M.; Holyst, H. A.; Moser, A. R.; Rogers, W. T.; Underwood, D. J.; Vaidyanathan, A. G.; van Stekelenborg, J. Proc. Natl. Acad. Sci. U.S.A., submitted.
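The value of putting the binding site and the compounds into the same bit space is that discrimination reduces to comparing bit-strings. The sketch below uses a Tanimoto overlap score as a generic stand-in for the actual discrimination method; the fingerprints and bit meanings are invented for illustration. (The 20-questions point can also be made quantitatively: each well-posed yes/no answer can at most halve the space of possibilities, so 20 answers distinguish up to 2^20, roughly a million, alternatives.)

def tanimoto(a, b):
    """Tanimoto coefficient between two equal-length 0/1 bit-vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0

# Bit positions might encode complementary features the site can accept
# (e.g., "cation here", "aromatic ring within 5 A"). Purely illustrative.
site_fingerprint = [1, 0, 1, 1, 0, 0, 1, 0]

compounds = {
    "cmpd_A": [1, 0, 1, 0, 0, 0, 1, 0],
    "cmpd_B": [0, 1, 0, 0, 1, 1, 0, 1],
    "cmpd_C": [1, 0, 1, 1, 0, 0, 1, 1],
}

# Rank compounds by how well their feature bits overlap the site's bits.
ranked = sorted(compounds.items(),
                key=lambda kv: tanimoto(site_fingerprint, kv[1]),
                reverse=True)
for name, fp in ranked:
    print(name, round(tanimoto(site_fingerprint, fp), 2))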

Another important advantage of PD is that it builds models based on ensembles of inputs to explain the data and therefore has an advantage in the analysis of complex systems (such as biology).2,3 We have developed a novel approach to PD that has been applied to bio-sequence, chemistry, and genomic data. Importantly, these methods can be used to integrate different data types, such as those found in chemistry and biology. PD methods are quite general and can be applied to many other areas, such as proteomics and text.

Validation of these methods in bio-sequence space has been completed using well-defined and well-understood systems such as serine proteases13 and kinases. PD in bio-sequence space provides a method for finding ensembles of patterns of residues that form a powerful description of the sequences studied. The similarity between patterns expresses the evolutionary family relationships; the differences between patterns define their divergence. The patterns express key functional and structural motifs that very much define the familial and biochemical character of the proteins. Embedded in the patterns are also key residues that have particular importance with respect to the function or the structure of the protein. Mapping these patterns onto the x-ray structures of serine proteases and kinases indicates that the patterns indeed are structurally and functionally important and, further, that they define the substrate-binding domain of the proteins. This leads to the compelling conclusion that since the patterns describe evolutionary changes (divergence and convergence) and also describe the critical features of substrate binding, the substrate is the driver of evolutionary change.13

A particular application of PD is in the analysis of variations of genetic information (single nucleotide polymorphisms, or SNPs). Analysis of SNPs can lead to the identification of genetic causes of diseases, or of inherited traits that determine differences in the way humans respond to drugs (either adversely or positively). Until now, the main method of SNP analysis has been linkage disequilibrium (LD), which seeks to determine correlations among specific SNPs. A key limitation of LD, however, is that only a restricted set of SNPs can be compared: typically, SNPs within a local region of a chromosome, or SNPs within genes that are thought to act together. PD, on the other hand, through its unique computational approach, is capable of detecting all patterns of SNPs, regardless of the genomic distances between them. Among these will be patterns of SNPs that are responsible for the disease (or trait) of interest, even though the individual SNPs comprising the pattern may have no detectable disease (or trait) correlation. This capability will greatly accelerate the exploitation of the genome for commercial purposes.
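To make the contrast concrete, the toy sketch below compares an LD-style analysis, which typically restricts attention to SNP pairs that are physically close, with a PD-style exhaustive enumeration of multi-SNP genotype patterns scored against case/control status. The genotype matrix, positions, distance cutoff, and scoring are all invented for illustration and are not the actual PD algorithm.

from itertools import combinations

# individuals x SNPs; genotypes coded as 0/1/2 copies of the minor allele.
genotypes = [
    # SNP0 SNP1 SNP2 SNP3
    [0, 2, 1, 0],   # case
    [0, 2, 0, 1],   # case
    [1, 2, 2, 0],   # case
    [2, 0, 1, 0],   # control
    [1, 0, 0, 1],   # control
    [2, 1, 2, 0],   # control
]
phenotype = ["case", "case", "case", "control", "control", "control"]
positions = {0: 1_200, 1: 1_450, 2: 5_000_000, 3: 90_000_000}  # hypothetical bp

def pattern_support(pattern):
    """For a pattern like ((snp, genotype), ...), count matching cases/controls."""
    cases = controls = 0
    for row, pheno in zip(genotypes, phenotype):
        if all(row[snp] == g for snp, g in pattern):
            if pheno == "case":
                cases += 1
            else:
                controls += 1
    return cases, controls

# An LD-style scan usually considers only SNP pairs that are physically close
# (here: within 10 kb); a PD-style search enumerates combinations regardless
# of genomic distance.
close_pairs = [(i, j) for i, j in combinations(positions, 2)
               if abs(positions[i] - positions[j]) < 10_000]
print("pairs an LD-style scan would consider:", close_pairs)

# Exhaustive enumeration of two-SNP genotype patterns, scored by case excess.
best = []
for i, j in combinations(range(4), 2):
    for gi in range(3):
        for gj in range(3):
            cases, controls = pattern_support(((i, gi), (j, gj)))
            if cases + controls:           # keep only patterns that occur
                best.append((cases - controls, (i, gi), (j, gj)))
best.sort(reverse=True)
print("most case-enriched two-SNP patterns:", best[:3])

In this toy data the distance filter leaves only the SNP0/SNP1 pair for the LD-style scan, whereas the exhaustive enumeration surfaces case-enriched patterns that span distant loci, which is the capability the text attributes to PD.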