The National Academies Press

Currently Skimming:

Data Mining on Large Graphs
Pages 265-286

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.
Select key terms on the right to highlight them within pages of the chapter.

From page 265... ... , ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-dowr~ tasks and to repeatedly answer questions about changing data sources. Read the entire page →
From page 266... ... There ale many more examples of graphs which contain interesting information for data mining purposes. For example, the telephone calling records Tom a long distance carrier can be viewed ~ a graph, and by mining the graph we may help identify Fraudulent behaviour or marketing opportunities. Read the entire page →
From page 267... ... Instead, we produce a data mining tool that is useful given any amount of RAM. These two goals give rise to the following list of properties that our tool must satisfy when analyzing a graph with n nodes and m edges: Error guarantees: estimates must be accurate at all distances (not just in the limit) Read the entire page →
From page 268... ... .:/ . ,0 ~ ~ ,~ Figure 1: ANF algorithms scale but not the others Sequential scans of the edge file: avoid random accesses to the graph. Read the entire page →
From page 269... ... : u'v ~ V) Diameter of the graph Starting set for the neighbourhood Concluding set for the neighbourhood Number of extra approximation bits Number of parallel approximations Def. Read the entire page →
From page 270... ... Given that we are interested only in very large graphs and graphs with directed edges, this does not solve the problems related to large edge files. Instead, we will need to find a new computation strategy which avoids random access to disk. Read the entire page →
From page 271... ... 3.1 Basic ANF Algorithm (ANF-O) A graph traversal effectively accesses the edge file in random order. Read the entire page →
From page 272... ... Example. Figure 3 shows the bitmasks and approximations for a simple example of our most basic ANF algorithm. Read the entire page →
From page 274... ... In both cases, we can further improve ANF by reducing the number of bits of data that must be stored. First, observe that, as ANF runs, most of the 274 DYNAMIC SOCIAL NETWORK MODELING AND ANALYSIS Read the entire page →
From page 275... ... /.77351 est += IN+~, h, C`) where b is the average position of the least zero bits in the k bitmasks output N+ (h, S Read the entire page →
From page 276... ... approximation scheme �~ sat� �_ The R] approbation algorithm is based on the approximate counting scheme proposed in [4] Read the entire page →
From page 277... ... We use three real data sets: Router: Undirected Internet routers data from ISI [19] , including scans done by Lucent Bell Laboratories t124. Read the entire page →
From page 278... ... To measure the sensitivity we averaged the RMS error over 10 trials for different values of r and the different data sets. These results appear 278 DYNAMIC SOCIALNT:TWORKMOD~LINGANDx4NALYSIS Read the entire page →
From page 279... ... We then increase s and t proportionately to generate larger DYNAMIC SOCIAL NETWORK MODELING ED ^^YSIS 279 Read the entire page →
From page 280... ... . � � ~ Cornell cycle grid urutorm 80-20 core router Data Sot 20 _ 15 . Read the entire page →
From page 281... ... ANF Speed-up Uniform 92 0.34 270x Cora 6 1.4 4x 8~20 680 0.9 756x Router 1,200 1.7 70ax 4.2.4 Speed and Scalability Table 3 reports wall-clock running times on an otherwise unloaded Pentium II-450 machine for both the exact computation (Breadth-First search) and ANF with k = 64 parallel approximations. Read the entire page →
From page 282... ... Delta importance Figure 9: ANF finds the best starting move for ~ 5 Data mining with our ANF Too! With our highly-accurate and efficient approximation tool, ANF, it is now possible to answer some of the prototypical graph mining questions that we posed in the introduction. Read the entire page →
From page 283... ... We then cluster these graphs by computing the hop exponents and forming clusters that have similar hop exponents (less than 0.1 difference) Read the entire page →
From page 284... ... This type of study was infeasible before our ANF tool as an exact computation would have required over a year of computation time. 6 Conclusions In this paper we presented 10 interesting data mining questions on graph data, proposed an efficient and accurate approximation algorithm that gives us the tool, ANF, we needed to answer these questions, and presented results for three of these questions on real-world data. Read the entire page →
From page 285... ... points. Can be parallelized: Our ANF algorithm may be parallelized with very few synchronization Employs sequential scans: Unlike prior approximations of the neighbourhood function, our algorithm avoids random access of the edge file and performs one sequential scan of the edge file per hop. Read the entire page →
From page 286... ... An apriori-based algorithm for mining frequent substructures from graph data. In PDKK, pages 13-23, 2000. Read the entire page →

From page 265...

... , ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-dowr~ tasks and to repeatedly answer questions about changing data sources.

Data Mining on Large Graphs Pages 265-286

Data Mining on Large Graphs
Pages 265-286