General Information
Instructor: Charalampos E. Tsourakakis
Classroom: EPC 204 (due to COVID19, we will be using Zoom after the Spring break)
Time: Mondays, 2:30 pm5:15 pm
Office hours: Fridays 23pm (upon appointment)
Email: tsourolampis at gmail dot com.
Course description
How can we develop a better understanding of realworld networks? This course aims to introduce students to important topics in largescale graph mining ranging from graph models to scalable algorithm design, and learning applications. Topics include random graph theory, algorithm design for key graph algorithmic problems, graphbased learning algorithms and applications.
Textbooks
The class does not a textbook. However, the following books that are also available in a draft form online will be useful during this course.
 Introduction to random graphs by Frieze and Karonski [FK]
 Networks, Crowds, and Markets by Easley and Kleinberg [EK]
 Spectral and algebraic graph theory by Spielman [S]
 Graph theory by Diestel [D]
Piazza
Grading
 Class Project 50%
 Two homeworks 25% (12.5%+12.5%)

Homework policy: You may collaborate on each homework with at most one other person. The name of your collaborator must be written down. You must write your solutions on your own. You may change collaborator from homework to homework. You may consult outside materials, but you must cite your sources in your solutions. You may not search the Web for solutions.
Violation of the policy may result in receiving a zero grade for the question or the assignment or even the whole course. The penalty will be assessed at the instructor’s discretion.
 Advice: Start early!

 Midterm Exam 15%
 Class scribe 10%
Problem sets
Schedule
Module 1: Random graphs
Lecture: Introduction. Realworld networks and their properties. Random graph models. [Slides]
Lecture: Probabilistic tools. First and second moment method. Fundamental properties of [Scribe]
Lecture: Chernoff bounds. SudakovKrivelevich proof for the emergence of the giant component. Power law degree distributions. [Scribe]
Lecture: Smallworld phenomenon. [Slides][Handwritten notes]
Readings
 Chapter 1 from [FK]
 Chapter 1 and Chapter 2 from [EK]
 What is a power law distribution, Wikipedia article
 Clustering coefficient, Wikipedia article
 Chapter 18 from [EK]
 The phase transition in random graphs – a simple proof, by Krivelevich, Sudakov
 The smallworld phenomenon: an algorithmic perspective, by Kleinberg
Optional
 A brief history of generative models for power law and lognormal distributions by Mitzenmacher
 The degree sequence of a scale‐free random graph process, by Bollobás, Riordan, Spencer, Tusnády
Module 2: Spaceefficient graph mining
Lecture: FlajoletMartin sketches. ANF algorithm for diameter estimation on massive graphs. [Slides]
Lecture: Concentration results for subgaussian and subexponential random variables. Application: JohnsonLindenstrauss lemma. AMS sketch. Counting triangles in graph streams using moment estimation [Scribe]
Readings
 Probabilistic counting algorithms for data base applications, by Flajolet, Martin
 ANF: A fast and scalable tool for data mining in massive graphs, by Palmer, Gibbons, Faloutsos
 An elementary proof of a theorem of Johnson and Lindenstrauss, by Dasgupta, Gupta
 Optimality of the JohnsonLindenstrauss Lemma, by Green Larsen, Nelson
 The Space Complexity of Approximating the Frequency Moments, by Alon, Matias, Szegedy
 Reductions in streaming algorithms, with an application to counting triangles in graphs by BarYossef, Kumar, Sivakumar
 Analyzing Graph Structure via Linear Measurements, by Ahn, Guha, McGregor
Optional Readings
 Hyperloglog, Wikipedia article
 Original Hyperloglog paper, by Flajolet, Fusy, Gandouet, Meunier
 Facebook article on Hyperloglog
 Four degrees of separation by Backstrom, Boldi, Rosa, Ugander, Vigna
 HADI paper, by Kang, Tsourakakis, Appel, Faloutsos, Leskovec
 A Hybrid Sampling Scheme for Triangle Counting by Kallaugher, Price
Module 3: Graph sparsification
Lecture 7:
– Triangle sparsifiers [Handwritten notes]
– Densest subgraph sparsifiers. [Slides]
– Graph spanners [Handwritten notes, based on Stefano Leucci’s notes]
– Cut sparsifiers [Handwritten notes and Nick Harvey’s notes]
– Spectral sparsifiers [Handwritten notes]
Readings
 Arboricity and subgraph listing, by Chiba, Nishizeki
 Triangle sparsifiers, by Tsourakakis, Kolountzakis, Miller
 Colorful Triangle Counting and a MapReduce Implementation by Pagh, Tsourakakis
 Scalable Large NearClique Detection in LargeScale Networks via Sampling by Mitzenmacher, Pachocki, Peng, Tsourakakis, Xu
 Densest Subgraph in Dynamic Graph Streams by McGregor, Tench, Vorotnikova, Vu
 Applications of Uniform Sampling: Densest Subgraph and Beyond by Esfandiari, Hajiaghayi, Woodruff
 Notes on Spanners by Stefano Leucci
 Nick Harvey’s notes on cut sparsifiers
 Graph Sparsification by EdgeConnectivity and Random Spanning Trees by Wai Shing Fung, Nicholas J. A. Harvey
 Graph sparsification by effective resistances by Spielman, Srivastava
Optional Readings
 Randomized Approximation Schemes for Cuts and Flows in Capacitated Graphs by BenczurKarger (uses “edge strengths” for sampling, instead of edge connectivity for sampling as we saw in class)
 Metric Sublinear Algorithms via Linear Sampling by Esfandiari, Mitzenmacher
 Spectral sparsification of graphs by Spielman, Teng
 Constantarboricity sparsifiers by Chu, Cohen, Pachocki, Peng
 Interlacing Families I: Bipartite Ramanujan Graphs of All Degrees by Marcus, Spielman, Srivastava
 Interlacing Families II: Mixed Characteristic Polynomials and the KadisonSinger Problem by Marcus, Spielman, Srivastava
Module 4: Graphs and Dense subgraph discovery
Lecture: Measures of density. Densest subgraph problem (DSP) and the kclique DSP, dense subgraph discovery formulations. Efficient algorithms for static graphs. Efficient algorithms for dynamic graphs. [Slides]
Videos (KDD 2015 tutorial, by GionisTsourakakis)
Readings
 Finding a maximum density subgraph by Goldberg
 Greedy Approximation Algorithms for Finding Dense Components in a Graph by Charikar
 Flowless: Extracting Densest Subgraphs Without Flow Computations by Boob, Gao, Peng, Sawlani, Tsourakakis, Wang, Wang
 The kclique densest subgraph problem by Tsourakakis
 On Finding Dense Subgraphs by Khuller, Saha
Optional Readings
 Denser than the densest subgraph: extracting optimal quasicliques with quality guarantees by Tsourakakis, Bonchi, Gionis, Gullo, Tsiarli
 Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphsby Sariyüce, Seshadhri, Pinar, Çatalyürek
 Efficient Algorithms for Densest Subgraph Discovery by Fang, Yu, Cheng, Lakshmanan, Li
Module 5: Mining Uncertain Graphs with Risk Aversion
Lecture: Model of uncertain graphs. Riskaverse dense subgraph discovery. Riskaverse graph matchings. [Slides]
Readings
 RiskAverse Matchings over Uncertain Graph Databases by Tsourakakis, Sekar, Lam, Yang
 Novel Dense Subgraph Discovery Primitives: Risk Aversion and Exclusion Queriesby Tsourakakis, Chen, Kakimura, Pachocki
Optional Readings
 Clustering large probabilistic graphs by Kollios, Potamias, Terzi
 Uncertain graph sparsification by Parchas, Papailiou, Papadias, Bonchi
Module 6: Graphs and Centralities
(guest lecture by Prof. Riondato)
Lecture: Introduction to centrality measures; The axioms of centrality; Degree centrality and prestige. Definitions, exact and approximation algorithms for: (i) Closeness centrality, and (ii) betweenness centrality. [Slides]
Readings
 Axioms for Centrality by Boldi,Vigna
 A faster algorithm for betweenness centrality by Brandes
 Fast approximation of betweenness centrality through sampling by Riondato, Kornaropoulos
 ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages by Riondato and Upfal

Computing classic closeness centrality, at scale by Cohen, Delling, Pajor, Werneck
Optional Readings
 Approximating betweenness centrality by Bader, Madder, Kintali, Mihail
 Almost lineartime algorithms for adaptive betweenness centrality using hypergraph sketches by Yoshida
 Scalable Betweenness Centrality Maximization via Sampling by Mahmoody, Tsourakakis, Upfal
 Spanning edge centrality: largescale applications and applications by Mavroforakis, GarciaLebron, Koutis, Terzi
Module 7: Graphs and Communities
Lecture: Eigenvalues of graphs. Cheeger’s inequality and spectral clustering. Motifbased graph clustering.
Readings
 Eigenvalues of the Laplacian and their relationship to the connectedness of the graph, nice collection of introductory notes by Marsden
2. Chapters 15, from Luca Trevisan’s notes
3. Class’ Jupyter notebook
4. Normalized Cuts and Image Segmentation by Shi and Malik
5. Dan Spielman’s tutorial on Spectral graph theory
Resources
Data Sources
Stanford Large Network Dataset Collection (SNAP)
The University of Florida Sparse Matrix Collection (UFS)
Konect
Icon Colorado
Software (recommended libraries)
NetworkX (Python)
JUNG (Java)
Stanford Large Network Analysis (SNAP system) (C++)
MATLAB BGL (Matlab)
Mathematica, Graph and Network Modeling (Mathematica)
Spark Graphx
Apache Giraph
Neo4j