General Information
Instructor: Charalampos E. Tsourakakis
Classroom: EPC 204 (due to COVID-19, we will be using Zoom after the Spring break)
Time: Mondays, 2:30 pm-5:15 pm
Office hours: Fridays 2-3pm (upon appointment)
Email: tsourolampis at gmail dot com.
Course description
How can we develop a better understanding of real-world networks? This course aims to introduce students to important topics in large-scale graph mining ranging from graph models to scalable algorithm design, and learning applications. Topics include random graph theory, algorithm design for key graph algorithmic problems, graph-based learning algorithms and applications.
Textbooks
The class does not a textbook. However, the following books that are also available in a draft form online will be useful during this course.
- Introduction to random graphs by Frieze and Karonski [FK]
- Networks, Crowds, and Markets by Easley and Kleinberg [EK]
- Spectral and algebraic graph theory by Spielman [S]
- Graph theory by Diestel [D]
Piazza
Grading
- Class Project 50%
- Two homeworks 25% (12.5%+12.5%)
-
Homework policy: You may collaborate on each homework with at most one other person. The name of your collaborator must be written down. You must write your solutions on your own. You may change collaborator from homework to homework. You may consult outside materials, but you must cite your sources in your solutions. You may not search the Web for solutions.
Violation of the policy may result in receiving a zero grade for the question or the assignment or even the whole course. The penalty will be assessed at the instructor’s discretion.
- Advice: Start early!
-
- Midterm Exam 15%
- Class scribe 10%
Problem sets
Schedule
Module 1: Random graphs
Lecture: Introduction. Real-world networks and their properties. Random graph models. [Slides]
Lecture: Probabilistic tools. First and second moment method. Fundamental properties of [Scribe]
Lecture: Chernoff bounds. Sudakov-Krivelevich proof for the emergence of the giant component. Power law degree distributions. [Scribe]
Lecture: Small-world phenomenon. [Slides][Handwritten notes]
Readings
- Chapter 1 from [FK]
- Chapter 1 and Chapter 2 from [EK]
- What is a power law distribution, Wikipedia article
- Clustering coefficient, Wikipedia article
- Chapter 18 from [EK]
- The phase transition in random graphs – a simple proof, by Krivelevich, Sudakov
- The small-world phenomenon: an algorithmic perspective, by Kleinberg
Optional
- A brief history of generative models for power law and lognormal distributions by Mitzenmacher
- The degree sequence of a scale‐free random graph process, by Bollobás, Riordan, Spencer, Tusnády
Module 2: Space-efficient graph mining
Lecture: Flajolet-Martin sketches. ANF algorithm for diameter estimation on massive graphs. [Slides]
Lecture: Concentration results for sub-gaussian and sub-exponential random variables. Application: Johnson-Lindenstrauss lemma. AMS sketch. Counting triangles in graph streams using moment estimation [Scribe]
Readings
- Probabilistic counting algorithms for data base applications, by Flajolet, Martin
- ANF: A fast and scalable tool for data mining in massive graphs, by Palmer, Gibbons, Faloutsos
- An elementary proof of a theorem of Johnson and Lindenstrauss, by Dasgupta, Gupta
- Optimality of the Johnson-Lindenstrauss Lemma, by Green Larsen, Nelson
- The Space Complexity of Approximating the Frequency Moments, by Alon, Matias, Szegedy
- Reductions in streaming algorithms, with an application to counting triangles in graphs by Bar-Yossef, Kumar, Sivakumar
- Analyzing Graph Structure via Linear Measurements, by Ahn, Guha, McGregor
Optional Readings
- Hyperloglog, Wikipedia article
- Original Hyperloglog paper, by Flajolet, Fusy, Gandouet, Meunier
- Facebook article on Hyperloglog
- Four degrees of separation by Backstrom, Boldi, Rosa, Ugander, Vigna
- HADI paper, by Kang, Tsourakakis, Appel, Faloutsos, Leskovec
- A Hybrid Sampling Scheme for Triangle Counting by Kallaugher, Price
Module 3: Graph sparsification
Lecture 7:
– Triangle sparsifiers [Handwritten notes]
– Densest subgraph sparsifiers. [Slides]
– Graph spanners [Handwritten notes, based on Stefano Leucci’s notes]
– Cut sparsifiers [Handwritten notes and Nick Harvey’s notes]
– Spectral sparsifiers [Handwritten notes]
Readings
- Arboricity and subgraph listing, by Chiba, Nishizeki
- Triangle sparsifiers, by Tsourakakis, Kolountzakis, Miller
- Colorful Triangle Counting and a MapReduce Implementation by Pagh, Tsourakakis
- Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling by Mitzenmacher, Pachocki, Peng, Tsourakakis, Xu
- Densest Subgraph in Dynamic Graph Streams by McGregor, Tench, Vorotnikova, Vu
- Applications of Uniform Sampling: Densest Subgraph and Beyond by Esfandiari, Hajiaghayi, Woodruff
- Notes on Spanners by Stefano Leucci
- Nick Harvey’s notes on cut sparsifiers
- Graph Sparsification by Edge-Connectivity and Random Spanning Trees by Wai Shing Fung, Nicholas J. A. Harvey
- Graph sparsification by effective resistances by Spielman, Srivastava
Optional Readings
- Randomized Approximation Schemes for Cuts and Flows in Capacitated Graphs by Benczur-Karger (uses “edge strengths” for sampling, instead of edge connectivity for sampling as we saw in class)
- Metric Sublinear Algorithms via Linear Sampling by Esfandiari, Mitzenmacher
- Spectral sparsification of graphs by Spielman, Teng
- Constant-arboricity sparsifiers by Chu, Cohen, Pachocki, Peng
- Interlacing Families I: Bipartite Ramanujan Graphs of All Degrees by Marcus, Spielman, Srivastava
- Interlacing Families II: Mixed Characteristic Polynomials and the Kadison-Singer Problem by Marcus, Spielman, Srivastava
Module 4: Graphs and Dense subgraph discovery
Lecture: Measures of density. Densest subgraph problem (DSP) and the k-clique DSP, dense subgraph discovery formulations. Efficient algorithms for static graphs. Efficient algorithms for dynamic graphs. [Slides]
Videos (KDD 2015 tutorial, by Gionis-Tsourakakis)
Readings
- Finding a maximum density subgraph by Goldberg
- Greedy Approximation Algorithms for Finding Dense Components in a Graph by Charikar
- Flowless: Extracting Densest Subgraphs Without Flow Computations by Boob, Gao, Peng, Sawlani, Tsourakakis, Wang, Wang
- The k-clique densest subgraph problem by Tsourakakis
- On Finding Dense Subgraphs by Khuller, Saha
Optional Readings
- Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees by Tsourakakis, Bonchi, Gionis, Gullo, Tsiarli
- Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphsby Sariyüce, Seshadhri, Pinar, Çatalyürek
- Efficient Algorithms for Densest Subgraph Discovery by Fang, Yu, Cheng, Lakshmanan, Li
Module 5: Mining Uncertain Graphs with Risk Aversion
Lecture: Model of uncertain graphs. Risk-averse dense subgraph discovery. Risk-averse graph matchings. [Slides]
Readings
- Risk-Averse Matchings over Uncertain Graph Databases by Tsourakakis, Sekar, Lam, Yang
- Novel Dense Subgraph Discovery Primitives: Risk Aversion and Exclusion Queriesby Tsourakakis, Chen, Kakimura, Pachocki
Optional Readings
- Clustering large probabilistic graphs by Kollios, Potamias, Terzi
- Uncertain graph sparsification by Parchas, Papailiou, Papadias, Bonchi
Module 6: Graphs and Centralities
(guest lecture by Prof. Riondato)
Lecture: Introduction to centrality measures; The axioms of centrality; Degree centrality and prestige. Definitions, exact and approximation algorithms for: (i) Closeness centrality, and (ii) betweenness centrality. [Slides]
Readings
- Axioms for Centrality by Boldi,Vigna
- A faster algorithm for betweenness centrality by Brandes
- Fast approximation of betweenness centrality through sampling by Riondato, Kornaropoulos
- ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages by Riondato and Upfal
-
Computing classic closeness centrality, at scale by Cohen, Delling, Pajor, Werneck
Optional Readings
- Approximating betweenness centrality by Bader, Madder, Kintali, Mihail
- Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches by Yoshida
- Scalable Betweenness Centrality Maximization via Sampling by Mahmoody, Tsourakakis, Upfal
- Spanning edge centrality: large-scale applications and applications by Mavroforakis, Garcia-Lebron, Koutis, Terzi
Module 7: Graphs and Communities
Lecture: Eigenvalues of graphs. Cheeger’s inequality and spectral clustering. Motif-based graph clustering.
Readings
- Eigenvalues of the Laplacian and their relationship to the connectedness of the graph, nice collection of introductory notes by Marsden
2. Chapters 1-5, from Luca Trevisan’s notes
3. Class’ Jupyter notebook
4. Normalized Cuts and Image Segmentation by Shi and Malik
5. Dan Spielman’s tutorial on Spectral graph theory
Resources
Data Sources
Stanford Large Network Dataset Collection (SNAP)
The University of Florida Sparse Matrix Collection (UFS)
Konect
Icon Colorado
Software (recommended libraries)
NetworkX (Python)
JUNG (Java)
Stanford Large Network Analysis (SNAP system) (C++)
MATLAB BGL (Matlab)
Mathematica, Graph and Network Modeling (Mathematica)
Spark Graphx
Apache Giraph
Neo4j