Computational materials science presents many problems, some of which relate to parallel computation in unique ways. Within the field there is a hierarchy of time and length scales extending from the most accurate and elaborate treatments of electronic correlations through local-density and molecular dynamics calculations to continuum theories of macroscopic materials. Additionally, each of these elements often offers a variety of formulational and algorithmic approaches, such as plane-wave and real-space formulations of local-density calculations.
The potential for effective parallelization of each of these component problems involves three principal factors: (1) the interprocessor communication requirements of algorithmic alternatives; (2) the exploitation of the enormous memories provided by parallel computers; and (3) the exploitation of existing serial code components, often highly optimized and representing large investments of time and effort. The importance of communication requirements stems from the fact that the time required to retrieve data from the memory of another processor is very much longer than from local memory. Some algorithms, such as the fast Fourier transform, require extensive interprocessor communication. Such communication can often be hidden, that is, caused to occur simultaneously with productive computation. Thus, it becomes very important to exploit, wherever possible, mathematical and communications routines optimized for specific machines by their vendor.
The very large memories often available with parallel supercomputers permit the replication of data, thereby permitting local retrieval during execution. For example, if the individual states of an electronic-structure calculation are evolved by different processors, replication of the effective potential permits the states to be evolved without additional communication. This example can also be used to illustrate the value of executing optimized serial code in individual nodes. If individual states are evolved in individual nodes, the implied fast Fourier transforms and any optimizations of the serial evolution code are preserved. Such a conversion provides a relatively simple approach to parallelization, although it will not yield ultimate performance and scalability.
Formulational alternatives can also differ in their amenability to parallel execution. For example, the description of long-range forces in molecular dynamics simulations using logarithmic tree algorithms reduces the computational scaling from N2 to N log (N). The effectiveness of these methods has been extensively demonstrated in the context of gravitational forces in galaxy modeling. Their use for the description of Coulomb interactions is being developed. This scaling improvement is, in principle, independent of parallel execution; however, the system-size crossover beyond which such methods are advantageous often calls for parallel computers.
Scaling and system-size crossover enter similarly into the discussion of “order-N” methods of electronic structure. There has been much progress on this front, including the development of a variety of alternatives within this context. Application of the Lanczos method to the kinetic energy operator on a mesh, density-matrix methods, and methods restricting the optimization of the occupied subspace are all presently under study, and all lend themselves to parallel execution.
In the context of Monte Carlo calculations, the large systems calling for parallel execution focus attention on the selection of cluster moves. When such a system is distributed over the nodes of a parallel computer, identification of effective cluster moves becomes a special challenge. Significant progress has been made in this area.