tems are cost effective in many applications, including some of the most demanding ones.
However, the design of commodity processors is driven by the needs of commercial data processing or personal computing; such processors are not optimized for scientific computing. The Linpack benchmark, which is used to rank systems in the TOP500 list, is representative of supercomputing applications that do not need high memory bandwidth (because caches work well) and do not need high global communication bandwidth. Such applications run well on commodity clusters. Many important applications, however, need better local memory bandwidth and lower apparent latency (i.e., better latency hiding), as well as better global bandwidth and latency. Technologies for better bandwidth and latency exist, but better local memory bandwidth and latency are available only in custom processors, and better global bandwidth and latency are available only in custom interconnects with custom interfaces. High local and global bandwidth and low latency improve the performance of the many codes that achieve only a small fraction of the peak performance of commodity systems because of bottlenecks in access to local and remote memories. They can also simplify programming, because less programmer time needs to be spent tuning memory access and communication patterns, and simpler programming models can be used. Furthermore, since memory access time is not improving at the same rate as processor speed, more commodity cluster users will become handicapped by low effective memory bandwidth. Although increased performance must be weighed against increased cost, some applications cannot achieve the needed turnaround time without custom technology.
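The contrast between cache-friendly codes such as Linpack and codes limited by memory bandwidth can be illustrated with a simple roofline-style bound, in which attainable performance is the lesser of the processor's peak rate and the rate at which memory can supply operands. The sketch below is illustrative only: the node parameters (PEAK_GFLOPS, MEM_BW_GBYTES) and the arithmetic intensity assumed for a blocked dense kernel are hypothetical values chosen for the example, not measurements of any system discussed in this report.

```python
def attainable_gflops(peak_gflops, mem_bw_gbytes, flops_per_byte):
    """Roofline-style upper bound on sustained performance (GFLOP/s).

    A kernel can run no faster than the processor's peak rate, and no
    faster than memory bandwidth times its arithmetic intensity.
    """
    return min(peak_gflops, mem_bw_gbytes * flops_per_byte)


def fraction_of_peak(peak_gflops, mem_bw_gbytes, flops_per_byte):
    """Share of peak performance that the bound above allows."""
    return attainable_gflops(peak_gflops, mem_bw_gbytes, flops_per_byte) / peak_gflops


# Hypothetical commodity node (assumed values, for illustration only).
PEAK_GFLOPS = 6.0      # peak floating-point rate
MEM_BW_GBYTES = 4.0    # sustained main-memory bandwidth, GB/s

# Vector triad a[i] = b[i] + s * c[i] on 8-byte doubles:
# 2 flops per element, 3 memory accesses of 8 bytes each.
# (Extra traffic from write-allocate caches is ignored here.)
TRIAD_INTENSITY = 2 / (3 * 8)

# Blocked dense matrix kernel with good cache reuse. The value 8 flops
# per byte is an assumption; real values depend on block and cache size.
DENSE_INTENSITY = 8.0

if __name__ == "__main__":
    for name, intensity in [("triad", TRIAD_INTENSITY), ("dense", DENSE_INTENSITY)]:
        frac = fraction_of_peak(PEAK_GFLOPS, MEM_BW_GBYTES, intensity)
        print(f"{name}: at most {frac:.1%} of peak")
```

Under these assumed numbers, the dense kernel is limited by the processor, whereas the triad kernel is limited by memory bandwidth to a small fraction of peak. Raising MEM_BW_GBYTES lifts the bound for the triad kernel but leaves the dense kernel unchanged, which is the sense in which better local bandwidth helps some codes and not others.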
Conclusion: Commodity clusters satisfy the needs of many supercomputer users. However, some important applications need the better main memory bandwidth and latency hiding that are available only in custom supercomputers; many need the interconnects with better global bandwidth and latency that are available only in custom or hybrid supercomputers; and most would benefit from the simpler programming model that can be supported well on custom systems. The growing gap between processor speed and communication latencies is likely to increase the fraction of supercomputing applications that achieve acceptable performance only on custom and hybrid supercomputers.
Supercomputing systems consist not only of hardware but also of software. There are unmet needs in supercomputing software at all levels, from the operating system to the algorithms to the application-specific software. These unmet needs stem from both technical difficulties and