| ||||||||||||||||||||||||||||||
|
|
|||||||||||||||||||||||||||||
| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.
Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.
OCR for page 60
Network Science
C
Content of Network Science Courses
Appendix C offers a detailed account of the results of the survey the committee conducted on the core materials taught in network science courses. To achieve its goal, the committee followed a two-step procedure. First, it searched for currently taught courses on networks, looking in all possible departments, from computer science and physics to the social and biological sciences. A representative list of such courses is provided in Table C-1. The committee’s work was facilitated by the fact that many courses post a detailed syllabus on the Web, as well as links to other courses on the same topic. If a syllabus was not available on the Web site, a copy was requested from the instructor. After inspecting the collected syllabi, it was possible to discern a set of core concepts that are shared by a wide range of courses and application areas that are often common only within a specific field. The set of concepts was then shared with a number of researchers and educators who are involved in research related to network science or who teach related courses. Based on the input provided by these individuals, the committee created the survey of core material in this appendix.
The committee faced a significant challenge as it sought to synthesize the diverse body of material taught in a wide range of disciplines. Indeed, as discussed in Chapter 2, network science is called on to address problems that not only cut across disciplines but also represent a vast body of knowledge, from infrastructure networks, such as the power grid and the Internet, to pervasive applications running over these infrastructure networks, such as the World Wide Web (www); from networks of gene and protein interactions to social and economic networks. The board range of these networks is illustrated in Table C-2.
In what sense can the study of such diverse subjects examples fall under a single unifying domain? The realization that despite their diversity, most real networks are characterized by a small set of organizing principles helped answer this question. For example, the widespread emergence of scale-free networks is rooted in the role of growth and preferential attachment, mechanisms present in many real systems, from cell biology to computer science. This implies that a unified set of tools can be applied to characterize the properties and behavior of a wide range of real networks. For example, tools developed by mathematicians to understand random networks or measures introduced by sociologists to explore social networks can be applied by biologists to design new drugs for disrupting the metabolic network or by computer scientists to explore the properties of the World Wide Web or the Internet. These generic or universal features of real networks and tools are reflected in the courses that are currently taught. Despite their diversity, most courses cover basic concepts that appear to be common across disciplines. The role of this appendix is to survey these common concepts, tools, and methods, identifying the material at the core of network science.
OVERALL ORGANIZATION
Even in established fields of science there is significant room for diversity of focus and interpretation. For example, a survey of basic courses on, say, quantum mechanics or economics would show that while there is broad agreement across instructors, textbooks, and universities on a small set of topics that must be covered, there are significant variations in the examples used, the application areas covered, and special topics. The committee shows that this is eminently the case for network-science-related courses as well. Broadly speaking, one can identify a set of core concepts that emerges in a wide range of courses in network science, largely independent of their discipline. These core concepts are typically embedded into applications with different focus. Applications can be grouped into three main areas. The first and the most dynamically evolving area, biological networks, applies network theory to subcellular (metabolic, regulatory, genetic) networks, neural networks, and ecological networks like food webs and species interactions. The second area, social and economic networks, encompasses a wide range of topics such as social interactions, collabora-
OCR for page 61
Network Science
TABLE C-1 Representative Courses on Computer Science
Courses
Institution
Name of Course
Web Address
Core courses
Pennsylvania State University
Graphs and Networks in Systems Biology
http://www.phys.psu.edu/~ralbert/phys597/
University of Michigan, Ann Arbor
Network Theory
http://www-personal.umich.edu/~mejn/courses/2004/cscs535/index.html
Columbia University
Networks and Complexity in Social Systems
http://www.columbia.edu/itc/sociology/watts/w3233/
Columbia University
Scaling in Networks
http://comet.columbia.edu/courses/elen_e9701/2001/outline.html
University of California at Irvine
Networks and Complexity
http://eclectic.ss.uci.edu/~drwhite/Anthro179a/SocialDynamics02.html
University of Pennsylvania
Networked Life
http://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/
University of Indiana at Bloomington
Structural Data Mining
http://ella.slis.indiana.edu/~katy/L597/
University of Patras (Greece)
Networks
http://nicomedia.math.upatras.gr/courses/mnets/index_en.html
Applications and related courses
University of Michigan, Ann Arbor
Information Retrieval
http://tangra.si.umich.edu/~radev/650/
Cornell University
Structure of Information Networks
http://www.cs.cornell.edu/Courses/cs685/2002fa/
Massachusetts Institute of Technology
Complex Human Networks Reading Group
http://web.media.mit.edu/~tanzeem/cohn/CoHN.htm
Virginia Tech
Recommender Systems
http://people.cs.vt.edu/~ramakris/Courses/CS6604-RS/outline.html
Boston College
Social Network Analysis
http://www.analytictech.com/essex/schedule.htm
University of Toronto
Social Network Analysis
http://www.chass.utoronto.ca/~wellman/courses/gradnet01.htm
tions, social filtering, and economic alliances. The third application area, infrastructure and communication networks, ranges from the Internet and the World Wide Web to power grids, phone networks, and sensor networks.
In the following sections the committee describes in some detail the core concepts, followed by a short discussion of the three application areas. Table C-3 lists some core material that could be expected in network science courses.
NETWORK STRUCTURE
The elementary attributes of all networks are the nodes, which are the basic units of a system, and the links, which are the connections between the nodes. Both the nodes and the links can widely differ in different fields. For example, the nodes might be humans or scientists in social networks; molecules, genes, or neurons in biology; routers or transformers in infrastructural networks; and Web pages or research publications in information networks. Similarly, the links might be friendships, alliances, reactions, synapses, optical and copper cables, URLs, or citations. The totality of the nodes and the links defines a network, often represented in graphic form as a connectivity matrix, telling us which nodes are directly connected to each other. Given that the study of networks was traditionally part of graph theory, a branch of mathematics with long and distinguished history, most of the language that network theory uses today has its roots in graph theory. A network map (or connectivity matrix) is typically the starting point for characterizing the structure or topology of any network.
Once a network has been mapped, the first priority is to characterize its topological and structural features. Degree of connectivity represents the most elementary measure of a node, specifying the number of links a node has to other nodes. Much can be learned about a network by inspecting the degree distribution, which in its simplest manifestation is a histogram of the number of nodes with a given degree. Other important measures include the shortest path between two nodes, which plays a key role in identifying small-world effects; the diameter, which is the distance between the two most distant nodes; the subgraphs and communities that characterize the relationship between small subsets of nodes within a network; the spectral properties, which help us capture a series of local and global network characteristics; and,
OCR for page 62
Network Science
TABLE C-2 Real-World Networks Appearing in Courses
Discipline/Course
Type of Network
Infrastructure and communications networks
Power grid
Internet
Public switched telephone network
Information and content distribution networks
World Wide Web
Broadcast
Sensors
Search
Social networks
Collaboration
Communities
Social filtering and recommendations
Economic
Linguistic
Computing networks
Neural nets
Petri nets
Cellular automata
Interacting intelligent agents
Engineering systems
Control networks
Integrated circuits
Queuing networks
Process networks
Transportation networks
Supply chains and manufacturing
Research networks
Scientific grid
Collaborations
Blogs and online journals
Military networks
Terrorist networks
Intelligence networks
Logistics networks
Biological networks
Metabolism
Gene and protein interactions
Biomanufacturing
Regulatory and control networks
Ecological networks and food webs
Viruses and epidemics
finally, the link strength or weight, which characterizes the nature of the interactions between different nodes.
Based on these measures, real networks may be classified in perhaps two or three major classes. First, there is a class known as regular networks, or graphs, in which the degree of all the nodes assumes the same value or only a few discrete values and the underlying network has a regular, repetitive structure. Such regular graphs approximate the structure of most crystals, as well as a number of other objects, engineered and natural, from the retina of the eye to the roads of some large cities (like New York). Much attention, however, has focused on random networks, systems in which the nodes are randomly connected to each other. In such networks the degrees follows a Poisson distribution. Despite their important role in network theory, we do not know of major real networks that would be fully random. Finally, the availability of large-scale network maps has led to the discovery that many real networks are neither regular nor fully random but, rather, scale-free. They have a heavily tailed degree distribution—that is, there are significant (order of magnitude) differences in the degree of different nodes. Scale-free networks describe the cell, the Web, the Internet, and many collaboration and social and economic networks. While many real networks are intermediate between these three classes, this classification captures some of the basic primitives used in many courses on networks and most networks are characterized in terms of the three classes.
An important question surfacing in many network-science-related courses is the following: What processes and mechanisms give rise to the network characteristics discussed in the preceding section? A closely related question is this: How do we generate networks with structural characteristics that mimic the properties of selected real networks? Network models, introduced to answer these two questions, are an important part of most network science courses (see Table C-4). These models have two main functions. First, some models aim to mimic, in a simplified form, the emergence and evolution of real networks, helping us to understand the mechanism responsible for the formation of real networks. Second, to test the impact of selected network characteristics on the network’s behavior, we need to gener-
TABLE C-3 Content of a Typical Network Science Course
Subject
Content
Core concepts
Real-world networks
Characterization and classifying networks and their components
Network modeling
Network interpretation and processes
Flow and routing
Aggregation and growth
Communication and coordination
Behavior: networks as dynamic entities
Performance and scaling
Robustness
Routing and congestion
Disruptability
Engineering methods in network science
Network design
Network analysis
Applications of network science
Information and communication network
Biological networks
Social networks
Control and mechanical systems
Industrial applications
Military applications
OCR for page 63
Network Science
TABLE C-4 Network Models Commonly Used to Generate Network Topologies and Analytical Tools Used to Characterize and Study the Properties of Models
Network Models
Analytical Tools
Random networks
Erdos-Renyi model
Percolation based
Scale-free models
Growth and preferential attachment
Static models
Optimization
Static models
Small-world model
Optimized topologies
Exact methods
Discrete math
Combinatorics
Graph theory
Dynamical systems
Master and rate equations
Mean field theory
Generating functions
Stochastic networks
Statistical mechanics
Agent-based models
Clustering tools
ate synthetic networks with preset properties. Given these needs, network modeling is one of the most highly studied components of network science.
Historically the most studied model is the random network model explored by Erdos and Renyi, which generates networks by placing the links randomly between the nodes. Random networks represent an important reference frame in network modeling. Despite the fact that we do not know of real networks that are captured by this model, given that most of its characteristics can be calculated exactly, it represents an important theoretical tool against which various real networks and hypotheses can be tested. While scale-free network models represent a recent addition to the network literature, given that many real networks of interest, from biology to computer science, have scale-free characteristics, in the past few years they have become the most investigated class of model. Several models are available to generate scale-free networks. The most studied one was introduced by Barabási and Albert (1999) to capture the formation of a real network. It involves two main ingredients, growth and preferential attachment. Given the diversity of networks, a whole class of models has been introduced that incorporate other mechanisms affecting a network’s evolution, from fitness to node and link removal and aging. In addition, a number of models do not capture the network formation process but result in scale-free topologies through either a fitness-driven or optimization procedure. Despite the important theoretical role these models play, there is no clear evidence that real networks would be shaped by these processes. The small world model, introduced by Watts and Strogatz (1998), interpolates between regular and random networks and has also generated significant theoretical interest.
Understanding the properties of these networks requires a number of analytical tools that have been developed by a number of fields, from discrete mathematics to statistical mechanics. The study of random networks has a long history, using exact methods developed in graph theory, combinatorics, probability theory, and stochastic processes. Scale-free network models, which represent graphs that change in time, are typically studied using methods based on rate and master equations capable of precisely predicting the degree distribution and other characteristics of scale-free networks. In some cases mathematicians have employed exact methods and the tools of dynamical systems and percolation theory to obtain exact results for these networks. Optimization and genetic algorithms have been used to generate optimized networks. Finally, in order to identify communities and groups in networks there has been a cross-disciplinary interest in developing network clustering methods.
NETWORK DYNAMICS
The specification of the dynamics that characterize a network is less straightforward because these dynamics tend to be rather different in the various applications areas. As mentioned in Chapter 3, the examples range from phase transitions in physical systems, to chemical reactions governed by rate equations, to sociological interactions between people. Here we develop just one example from the computer science discipline.
Computers in a network must often be able to broadcast messages to all other computers in the network. The simplistic protocol would require that each computer maintain a view of the addresses of all computers in the entire network. If computers are constantly leaving or entering the network, the significant problem arises of updating and maintaining a consistent view of the network at each computer. This updating of the views prevented early systems from scaling in size. The solution was for each computer to maintain only a small partial view of the network. The broadcast protocol is for a computer to broadcast to each computer in its view and have computers receiving a broadcast retransmit the message to computers in their views. These more sophisticated protocols that are necessary to overcome scaling problems have lead to sophisticated mathematical techniques and algorithms (Demers et al., 1987). This is just one example of many directions in which basic theory in network science is emerging in the computer science area and appearing in advanced courses.
NETWORK FUNCTION
The purpose of most networks is to transport information, people, or material. These functions often determine both the structure and the dynamics of real networks. Therefore, the understanding of network function has been a very active area in a number of research fields, and it is reflected in most
OCR for page 64
Network Science
network courses. The high degree of universality and commonality that is present in network structure largely disappears when it comes to network function, thanks to the diversity of the functions that networks assume in different domains. For example, the purpose of the Internet is to transfer information in an efficient manner, in contrast to the purpose of the metabolic network, which is to process the chemicals consumed by the cell and to turn them into the cell’s building blocks. Therefore, on the top of similar topologies one can define a wide range of dynamical rules, from flow to diffusion and contagion, that lead to different functions and network behavior. Despite this diversity, a number of common and highly studied themes have emerged in the past few years.
One of the most studied themes focuses on diffusion on networks. Indeed, networks support a wide range of diffusive processes that can cause serious problems in a number of application areas. Social and informational networks are responsible for the diffusion of ideas. Sexual and contact networks are responsible for the spread of infectious diseases and viruses, ranging from acquired immunodeficiency syndrome (AIDS) to influenza to severe acute respiratory syndrome (SARS). Computer and e-mail networks are responsible for the spread of electronic viruses and worms, generating billions of dollars in damages. Finally, social and professional networks are responsible for the spread of innovations, ideas, and rumors, playing a key economic and social role. It turns out that the numerical and analytical tools used by different fields to approach these problems are highly similar. Yet, in the past few years, with the increasing understanding of the topology of sexual contact networks and computer networks, we have witnessed significant paradigm shifts. For example, the discovery that in scale-free networks diseases do not experience an epidemic threshold has had a significant impact on the strategies used for epidemic modeling and on the design of efficient interventions. While these processes are discussed from the perspective of different fields, they are widely covered in a wide range of courses, from biology to business to computer science.
Network flows, describing mostly the situation when something tangible flows from source to destination, are another class of widely studied multidisciplinary problems. They emerge in the Internet as the flow of bits along the physical infrastructure, in the study of metabolic networks as the flux of matter across reactions, or in transportation networks as, for example, the flow of cars on the highway. In general, the network structure canalizes these flow processes and to a high degree determines the flow rates and the necessary capacities on each link and node.
Another much-studied interdisciplinary problem is a network’s ability to carry on its functions in the face of errors and failures. Centered on the question of robustness and resilience, many of these studies explore the network’s dynamical and topological integrity under node and link loss (Dodds et al., 2003). A number of studies have shown that there is a strong interplay between a network’s robustness and its structure. For example, scale-free networks are very robust under random node removal but highly vulnerable to the systematic removal of their hubs. The situation becomes even more complex if network flows are considered, which could lead to cascading failures, such as the 2003 Northeast electricity blackout, affecting millions of consumers (Watts, 2002). But robustness studies play a key role in designing new drugs or in developing systems and network topologies that are highly error resistant.
Another class of much-studied problems focuses on search in networks, leading to algorithms and methods to efficiently locate information in complex networked structures. These studies play a key role in a wide range of problems, from Web search algorithms to the identification of information and expertise in an organization.
REFERENCES
Barabási, A.L., and R. Albert. 1999. Emergence of scaling in random networks. Science 286(5439): 509–512.
Demers, A., D. Greene, C. Hauser, W. Irish, J. Larson, S. Shenker, H. Sturgis, D. Swinehart, and D. Terry. 1987. Epidemic algorithms for replicated database maintenance . In Proceedings of the 6th ACM Symposium on Principles of Distributed Computing. Vancouver, B.C., Canada.
Dodds, P.S., D.J. Watts, and C.F. Sabel. 2003. Information exchange and the robustness of organizational networks. Proceedings of the National Academy of Sciences of the United States of America 100(21): 12516–12521.
Watts, D.J. 2002. A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences of the United States of America 99: 12516–12521.
Watts, D.J., and S. Strogatz. 1998. Collective dynamics of “small-world” networks. Nature 393(6684): 440–442.
Representative terms from entire chapter:
network science