What Makes Large-Scale IT Systems So Difficult to Design, Build, and Operate?

  • Large number of components—Large IT systems can contain thousands of processors and hundreds of thousands or even millions of lines of code. Research is needed to understand how to build systems that scale gracefully and add capacity as needed without requiring an overall redesign.

  • Deep interactions among components—Components of large IT systems interact with each other in a variety of ways, some of which may not have been anticipated by the designers. A single misbehaving router can flood the Internet with traffic that will bring down thousands of local hosts and cause traffic to be rerouted worldwide. Research is needed to provide better analytical techniques for modeling system performance and building systems with more comprehensible structures.

  • Unintended and unanticipated consequences of changes or additions to the systems—For instance, upgrading the memory in a personal computer can lead to timing mismatches that cause memory failures that in turn lead to loss of application data, even if the memory chips are themselves perfectly functional. In this case it is the system that fails to work, even though all its components work. Research is needed to uncover techniques or architectures that provide greater flexibility.

  • Emergent behaviors—Systems sometimes exhibit surprising behaviors that arise from unanticipated interactions among components. These behaviors are “emergent” in that they are unspecified by any individual component and are the unanticipated product of the system as a whole. Research is needed to find techniques for better analyzing system behavior.

  • Constantly changing needs of the users—Many large systems are long-lived, meaning they must be modified while preserving some of their existing capabilities and staying within the performance constraints of individual components. Development cycles can be so long that requirements change before systems are even deployed. Research is needed to develop ways of building extensible systems that can accommodate change.

  • Independently designed components—Today's large-scale IT systems are not typically designed from the top down but often are assembled from off-the-shelf components. These components have not been customized to work in the larger system and must rely on standard interfaces and, often, customized software. Modern IT systems are essentially assembled in each home or office. As a result, they are notoriously difficult to maintain and subject to frequent, unexplained breakdowns. Research could help to develop architectural approaches that can accommodate heterogeneity and to extend the principles of modularity to larger scales than have been attempted to date.

  • Large numbers of individuals involved in design and operation—When browsing the Internet, a user may interact with thousands of computers and hundreds of different software components, all designed by independent teams of designers. For that browsing to work, all of these designs must work sufficiently
