C
Content of Network Science Courses
Appendix C offers a detailed account of the results of the survey the committee conducted on the core materials taught in network science courses. To achieve its goal, the committee followed a two-step procedure. First, it searched for currently taught courses on networks, looking in all possible departments, from computer science and physics to the social and biological sciences. A representative list of such courses is provided in Table C-1. The committee’s work was facilitated by the fact that many courses post a detailed syllabus on the Web, as well as links to other courses on the same topic. If a syllabus was not available on the Web site, a copy was requested from the instructor. After inspecting the collected syllabi, it was possible to discern a set of core concepts that are shared by a wide range of courses and application areas that are often common only within a specific field. The set of concepts was then shared with a number of researchers and educators who are involved in research related to network science or who teach related courses. Based on the input provided by these individuals, the committee created the survey of core material in this appendix.
The committee faced a significant challenge as it sought to synthesize the diverse body of material taught in a wide range of disciplines. Indeed, as discussed in Chapter 2, network science is called on to address problems that not only cut across disciplines but also represent a vast body of knowledge, from infrastructure networks, such as the power grid and the Internet, to pervasive applications running over these infrastructure networks, such as the World Wide Web (www); from networks of gene and protein interactions to social and economic networks. The board range of these networks is illustrated in Table C-2.
In what sense can the study of such diverse subjects examples fall under a single unifying domain? The realization that despite their diversity, most real networks are characterized by a small set of organizing principles helped answer this question. For example, the widespread emergence of scale-free networks is rooted in the role of growth and preferential attachment, mechanisms present in many real systems, from cell biology to computer science. This implies that a unified set of tools can be applied to characterize the properties and behavior of a wide range of real networks. For example, tools developed by mathematicians to understand random networks or measures introduced by sociologists to explore social networks can be applied by biologists to design new drugs for disrupting the metabolic network or by computer scientists to explore the properties of the World Wide Web or the Internet. These generic or universal features of real networks and tools are reflected in the courses that are currently taught. Despite their diversity, most courses cover basic concepts that appear to be common across disciplines. The role of this appendix is to survey these common concepts, tools, and methods, identifying the material at the core of network science.
OVERALL ORGANIZATION
Even in established fields of science there is significant room for diversity of focus and interpretation. For example, a survey of basic courses on, say, quantum mechanics or economics would show that while there is broad agreement across instructors, textbooks, and universities on a small set of topics that must be covered, there are significant variations in the examples used, the application areas covered, and special topics. The committee shows that this is eminently the case for network-science-related courses as well. Broadly speaking, one can identify a set of core concepts that emerges in a wide range of courses in network science, largely independent of their discipline. These core concepts are typically embedded into applications with different focus. Applications can be grouped into three main areas. The first and the most dynamically evolving area, biological networks, applies network theory to subcellular (metabolic, regulatory, genetic) networks, neural networks, and ecological networks like food webs and species interactions. The second area, social and economic networks, encompasses a wide range of topics such as social interactions, collabora-
TABLE C-1 Representative Courses on Computer Science
Courses |
Institution |
Name of Course |
Web Address |
Core courses |
Pennsylvania State University |
Graphs and Networks in Systems Biology |
|
University of Michigan, Ann Arbor |
Network Theory |
http://www-personal.umich.edu/~mejn/courses/2004/cscs535/index.html |
|
Columbia University |
Networks and Complexity in Social Systems |
||
Columbia University |
Scaling in Networks |
http://comet.columbia.edu/courses/elen_e9701/2001/outline.html |
|
University of California at Irvine |
Networks and Complexity |
http://eclectic.ss.uci.edu/~drwhite/Anthro179a/SocialDynamics02.html |
|
University of Pennsylvania |
Networked Life |
||
University of Indiana at Bloomington |
Structural Data Mining |
||
University of Patras (Greece) |
Networks |
http://nicomedia.math.upatras.gr/courses/mnets/index_en.html |
|
Applications and related courses |
University of Michigan, Ann Arbor |
Information Retrieval |
|
Cornell University |
Structure of Information Networks |
||
Massachusetts Institute of Technology |
Complex Human Networks Reading Group |
||
Virginia Tech |
Recommender Systems |
http://people.cs.vt.edu/~ramakris/Courses/CS6604-RS/outline.html |
|
Boston College |
Social Network Analysis |
||
University of Toronto |
Social Network Analysis |
tions, social filtering, and economic alliances. The third application area, infrastructure and communication networks, ranges from the Internet and the World Wide Web to power grids, phone networks, and sensor networks.
In the following sections the committee describes in some detail the core concepts, followed by a short discussion of the three application areas. Table C-3 lists some core material that could be expected in network science courses.
NETWORK STRUCTURE
The elementary attributes of all networks are the nodes, which are the basic units of a system, and the links, which are the connections between the nodes. Both the nodes and the links can widely differ in different fields. For example, the nodes might be humans or scientists in social networks; molecules, genes, or neurons in biology; routers or transformers in infrastructural networks; and Web pages or research publications in information networks. Similarly, the links might be friendships, alliances, reactions, synapses, optical and copper cables, URLs, or citations. The totality of the nodes and the links defines a network, often represented in graphic form as a connectivity matrix, telling us which nodes are directly connected to each other. Given that the study of networks was traditionally part of graph theory, a branch of mathematics with long and distinguished history, most of the language that network theory uses today has its roots in graph theory. A network map (or connectivity matrix) is typically the starting point for characterizing the structure or topology of any network.
Once a network has been mapped, the first priority is to characterize its topological and structural features. Degree of connectivity represents the most elementary measure of a node, specifying the number of links a node has to other nodes. Much can be learned about a network by inspecting the degree distribution, which in its simplest manifestation is a histogram of the number of nodes with a given degree. Other important measures include the shortest path between two nodes, which plays a key role in identifying small-world effects; the diameter, which is the distance between the two most distant nodes; the subgraphs and communities that characterize the relationship between small subsets of nodes within a network; the spectral properties, which help us capture a series of local and global network characteristics; and,
TABLE C-2 Real-World Networks Appearing in Courses
Discipline/Course |
Type of Network |
Infrastructure and communications networks |
Power grid Internet Public switched telephone network |
Information and content distribution networks |
World Wide Web Broadcast Sensors Search |
Social networks |
Collaboration Communities Social filtering and recommendations Economic Linguistic |
Computing networks |
Neural nets Petri nets Cellular automata Interacting intelligent agents |
Engineering systems |
Control networks Integrated circuits Queuing networks Process networks Transportation networks Supply chains and manufacturing |
Research networks |
Scientific grid Collaborations Blogs and online journals |
Military networks |
Terrorist networks Intelligence networks Logistics networks |
Biological networks |
Metabolism Gene and protein interactions Biomanufacturing Regulatory and control networks Ecological networks and food webs Viruses and epidemics |
finally, the link strength or weight, which characterizes the nature of the interactions between different nodes.
Based on these measures, real networks may be classified in perhaps two or three major classes. First, there is a class known as regular networks, or graphs, in which the degree of all the nodes assumes the same value or only a few discrete values and the underlying network has a regular, repetitive structure. Such regular graphs approximate the structure of most crystals, as well as a number of other objects, engineered and natural, from the retina of the eye to the roads of some large cities (like New York). Much attention, however, has focused on random networks, systems in which the nodes are randomly connected to each other. In such networks the degrees follows a Poisson distribution. Despite their important role in network theory, we do not know of major real networks that would be fully random. Finally, the availability of large-scale network maps has led to the discovery that many real networks are neither regular nor fully random but, rather, scale-free. They have a heavily tailed degree distribution—that is, there are significant (order of magnitude) differences in the degree of different nodes. Scale-free networks describe the cell, the Web, the Internet, and many collaboration and social and economic networks. While many real networks are intermediate between these three classes, this classification captures some of the basic primitives used in many courses on networks and most networks are characterized in terms of the three classes.
An important question surfacing in many network-science-related courses is the following: What processes and mechanisms give rise to the network characteristics discussed in the preceding section? A closely related question is this: How do we generate networks with structural characteristics that mimic the properties of selected real networks? Network models, introduced to answer these two questions, are an important part of most network science courses (see Table C-4). These models have two main functions. First, some models aim to mimic, in a simplified form, the emergence and evolution of real networks, helping us to understand the mechanism responsible for the formation of real networks. Second, to test the impact of selected network characteristics on the network’s behavior, we need to gener-
TABLE C-3 Content of a Typical Network Science Course
Subject |
Content |
Core concepts |
Real-world networks Characterization and classifying networks and their components Network modeling |
Network interpretation and processes |
Flow and routing Aggregation and growth Communication and coordination |
Behavior: networks as dynamic entities |
Performance and scaling Robustness Routing and congestion Disruptability |
Engineering methods in network science |
Network design Network analysis |
Applications of network science |
Information and communication network Biological networks Social networks Control and mechanical systems Industrial applications Military applications |
TABLE C-4 Network Models Commonly Used to Generate Network Topologies and Analytical Tools Used to Characterize and Study the Properties of Models
Network Models |
Analytical Tools |
Random networks Erdos-Renyi model Percolation based Scale-free models Growth and preferential attachment Static models Optimization Static models Small-world model Optimized topologies |
Exact methods Discrete math Combinatorics Graph theory Dynamical systems Master and rate equations Mean field theory Generating functions Stochastic networks Statistical mechanics Agent-based models Clustering tools |
ate synthetic networks with preset properties. Given these needs, network modeling is one of the most highly studied components of network science.
Historically the most studied model is the random network model explored by Erdos and Renyi, which generates networks by placing the links randomly between the nodes. Random networks represent an important reference frame in network modeling. Despite the fact that we do not know of real networks that are captured by this model, given that most of its characteristics can be calculated exactly, it represents an important theoretical tool against which various real networks and hypotheses can be tested. While scale-free network models represent a recent addition to the network literature, given that many real networks of interest, from biology to computer science, have scale-free characteristics, in the past few years they have become the most investigated class of model. Several models are available to generate scale-free networks. The most studied one was introduced by Barabási and Albert (1999) to capture the formation of a real network. It involves two main ingredients, growth and preferential attachment. Given the diversity of networks, a whole class of models has been introduced that incorporate other mechanisms affecting a network’s evolution, from fitness to node and link removal and aging. In addition, a number of models do not capture the network formation process but result in scale-free topologies through either a fitness-driven or optimization procedure. Despite the important theoretical role these models play, there is no clear evidence that real networks would be shaped by these processes. The small world model, introduced by Watts and Strogatz (1998), interpolates between regular and random networks and has also generated significant theoretical interest.
Understanding the properties of these networks requires a number of analytical tools that have been developed by a number of fields, from discrete mathematics to statistical mechanics. The study of random networks has a long history, using exact methods developed in graph theory, combinatorics, probability theory, and stochastic processes. Scale-free network models, which represent graphs that change in time, are typically studied using methods based on rate and master equations capable of precisely predicting the degree distribution and other characteristics of scale-free networks. In some cases mathematicians have employed exact methods and the tools of dynamical systems and percolation theory to obtain exact results for these networks. Optimization and genetic algorithms have been used to generate optimized networks. Finally, in order to identify communities and groups in networks there has been a cross-disciplinary interest in developing network clustering methods.
NETWORK DYNAMICS
The specification of the dynamics that characterize a network is less straightforward because these dynamics tend to be rather different in the various applications areas. As mentioned in Chapter 3, the examples range from phase transitions in physical systems, to chemical reactions governed by rate equations, to sociological interactions between people. Here we develop just one example from the computer science discipline.
Computers in a network must often be able to broadcast messages to all other computers in the network. The simplistic protocol would require that each computer maintain a view of the addresses of all computers in the entire network. If computers are constantly leaving or entering the network, the significant problem arises of updating and maintaining a consistent view of the network at each computer. This updating of the views prevented early systems from scaling in size. The solution was for each computer to maintain only a small partial view of the network. The broadcast protocol is for a computer to broadcast to each computer in its view and have computers receiving a broadcast retransmit the message to computers in their views. These more sophisticated protocols that are necessary to overcome scaling problems have lead to sophisticated mathematical techniques and algorithms (Demers et al., 1987). This is just one example of many directions in which basic theory in network science is emerging in the computer science area and appearing in advanced courses.
NETWORK FUNCTION
The purpose of most networks is to transport information, people, or material. These functions often determine both the structure and the dynamics of real networks. Therefore, the understanding of network function has been a very active area in a number of research fields, and it is reflected in most
network courses. The high degree of universality and commonality that is present in network structure largely disappears when it comes to network function, thanks to the diversity of the functions that networks assume in different domains. For example, the purpose of the Internet is to transfer information in an efficient manner, in contrast to the purpose of the metabolic network, which is to process the chemicals consumed by the cell and to turn them into the cell’s building blocks. Therefore, on the top of similar topologies one can define a wide range of dynamical rules, from flow to diffusion and contagion, that lead to different functions and network behavior. Despite this diversity, a number of common and highly studied themes have emerged in the past few years.
One of the most studied themes focuses on diffusion on networks. Indeed, networks support a wide range of diffusive processes that can cause serious problems in a number of application areas. Social and informational networks are responsible for the diffusion of ideas. Sexual and contact networks are responsible for the spread of infectious diseases and viruses, ranging from acquired immunodeficiency syndrome (AIDS) to influenza to severe acute respiratory syndrome (SARS). Computer and e-mail networks are responsible for the spread of electronic viruses and worms, generating billions of dollars in damages. Finally, social and professional networks are responsible for the spread of innovations, ideas, and rumors, playing a key economic and social role. It turns out that the numerical and analytical tools used by different fields to approach these problems are highly similar. Yet, in the past few years, with the increasing understanding of the topology of sexual contact networks and computer networks, we have witnessed significant paradigm shifts. For example, the discovery that in scale-free networks diseases do not experience an epidemic threshold has had a significant impact on the strategies used for epidemic modeling and on the design of efficient interventions. While these processes are discussed from the perspective of different fields, they are widely covered in a wide range of courses, from biology to business to computer science.
Network flows, describing mostly the situation when something tangible flows from source to destination, are another class of widely studied multidisciplinary problems. They emerge in the Internet as the flow of bits along the physical infrastructure, in the study of metabolic networks as the flux of matter across reactions, or in transportation networks as, for example, the flow of cars on the highway. In general, the network structure canalizes these flow processes and to a high degree determines the flow rates and the necessary capacities on each link and node.
Another much-studied interdisciplinary problem is a network’s ability to carry on its functions in the face of errors and failures. Centered on the question of robustness and resilience, many of these studies explore the network’s dynamical and topological integrity under node and link loss (Dodds et al., 2003). A number of studies have shown that there is a strong interplay between a network’s robustness and its structure. For example, scale-free networks are very robust under random node removal but highly vulnerable to the systematic removal of their hubs. The situation becomes even more complex if network flows are considered, which could lead to cascading failures, such as the 2003 Northeast electricity blackout, affecting millions of consumers (Watts, 2002). But robustness studies play a key role in designing new drugs or in developing systems and network topologies that are highly error resistant.
Another class of much-studied problems focuses on search in networks, leading to algorithms and methods to efficiently locate information in complex networked structures. These studies play a key role in a wide range of problems, from Web search algorithms to the identification of information and expertise in an organization.
REFERENCES
Barabási, A.L., and R. Albert. 1999. Emergence of scaling in random networks. Science 286(5439): 509–512.
Demers, A., D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. 1987. Epidemic algorithms for replicated database maintenance . In Proceedings of the 6th ACM Symposium on Principles of Distributed Computing. Vancouver, B.C., Canada.
Dodds, P.S., D.J. Watts, and C.F. Sabel. 2003. Information exchange and the robustness of organizational networks. Proceedings of the National Academy of Sciences of the United States of America 100(21): 12516–12521.
Watts, D.J. 2002. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences of the United States of America 99: 12516–12521.
Watts, D.J., and S. Strogatz. 1998. Collective dynamics of “small-world” networks. Nature 393(6684): 440–442.