CS591 Large-Scale Graph Mining – Spring 2020

General Information

Instructor: Charalampos E. Tsourakakis
Classroom: EPC 204 (due to COVID-19, we will be using Zoom after the Spring break)
Time: Mondays, 2:30 pm-5:15 pm
Office hours: Fridays 2-3pm (upon appointment)
Email: tsourolampis at gmail dot com.


Course description

How can we develop a better understanding of real-world networks? This course aims to introduce students to important topics in large-scale graph mining ranging from graph models to scalable algorithm design, and learning applications.  Topics include random graph theory, algorithm design for key graph algorithmic problems, graph-based learning algorithms and applications.


The class does not a textbook. However, the following books that are also available in a draft form online will be useful during this course.



  • Class Project 50%
  • Two homeworks 25% (12.5%+12.5%)
    • Homework policy: You may collaborate on each homework with at most one other person. The name of your collaborator must be written down. You must write your solutions on your own. You may change collaborator from homework to homework.  You may consult outside materials, but you must cite your sources in your solutions. You may not search the Web for solutions.

      Violation of the policy may result in receiving a zero grade for the question or the assignment  or even the whole course. The penalty will be assessed at the instructor’s discretion.

    • Advice: Start early!
  • Midterm Exam 15%
  • Class scribe 10%

Problem sets

  1. HW1.pdf, due by March 16th
  2. HW2.pdf, due by April 13th


Module 1: Random graphs

Lecture: Introduction. Real-world networks and their properties. Random graph models. [Slides]

Lecture: Probabilistic tools. First and second moment method. Fundamental properties of G(n,p) [Scribe]

Lecture: Chernoff bounds. Sudakov-Krivelevich proof for the emergence of the giant component. Power law degree distributions. [Scribe]

Lecture: Small-world phenomenon. [Slides][Handwritten notes]



  1. A brief history of generative models for power law and lognormal distributions  by Mitzenmacher
  2. The degree sequence of a scale‐free random graph process, by Bollobás,  Riordan, Spencer, Tusnády


Module 2: Space-efficient graph mining

LectureFlajolet-Martin sketches. ANF algorithm for diameter estimation on massive graphs. [Slides]

Lecture: Concentration results for sub-gaussian and sub-exponential random variables. Application: Johnson-Lindenstrauss lemma. AMS sketch. Counting triangles in graph streams using moment estimation [Scribe]


Optional Readings

Module 3: Graph sparsification

Lecture 7:
– Triangle sparsifiers [Handwritten notes]
– Densest subgraph sparsifiers. [Slides]
– Graph spanners [Handwritten notes, based on Stefano Leucci’s notes]
– Cut sparsifiers [Handwritten notes and Nick Harvey’s notes]
– Spectral sparsifiers [Handwritten notes]


  1. Arboricity and subgraph listing, by Chiba, Nishizeki
  2. Triangle sparsifiers, by Tsourakakis, Kolountzakis, Miller
  3. Colorful Triangle Counting and a MapReduce Implementation by Pagh, Tsourakakis
  4. Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling by Mitzenmacher, Pachocki, Peng, Tsourakakis, Xu
  5. Densest Subgraph in Dynamic Graph Streams by McGregor, Tench, Vorotnikova, Vu
  6. Applications of Uniform Sampling: Densest Subgraph and Beyond by Esfandiari, Hajiaghayi,  Woodruff
  7. Notes on Spanners by Stefano Leucci
  8. Nick Harvey’s notes on cut sparsifiers
  9. Graph Sparsification by Edge-Connectivity and Random Spanning Trees by Wai Shing Fung, Nicholas J. A. Harvey
  10. Graph sparsification by effective resistances by Spielman, Srivastava

Optional Readings

  1. Randomized Approximation Schemes for Cuts and Flows in Capacitated Graphs by Benczur-Karger (uses “edge strengths” for sampling, instead of edge connectivity for sampling as we saw in class)
  2. Metric Sublinear Algorithms via Linear Sampling by Esfandiari, Mitzenmacher
  3. Spectral sparsification of graphs by Spielman, Teng
  4. Constant-arboricity sparsifiers by Chu, Cohen, Pachocki, Peng
  5. Interlacing Families I: Bipartite Ramanujan Graphs of All Degrees by Marcus, Spielman, Srivastava
  6. Interlacing Families II: Mixed Characteristic Polynomials and the Kadison-Singer Problem by Marcus, Spielman, Srivastava

Module 4: Graphs and Dense subgraph discovery

Lecture: Measures of density.  Densest subgraph problem (DSP) and the k-clique DSP, dense subgraph discovery formulations. Efficient algorithms for static graphs. Efficient algorithms for dynamic graphs. [Slides]


Videos (KDD 2015 tutorial, by Gionis-Tsourakakis)


  1. Finding a maximum density subgraph by Goldberg
  2. Greedy Approximation Algorithms for Finding Dense Components in a Graph by Charikar
  3. Flowless: Extracting Densest Subgraphs Without Flow Computations by Boob, Gao, Peng, Sawlani, Tsourakakis, Wang, Wang
  4. The k-clique densest subgraph problem by Tsourakakis
  5. On Finding Dense Subgraphs by Khuller, Saha

Optional Readings

  1. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees by Tsourakakis, Bonchi, Gionis, Gullo, Tsiarli
  2. Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphsby Sariyüce, Seshadhri, Pinar, Çatalyürek
  3. Efficient Algorithms for Densest Subgraph Discovery by Fang, Yu, Cheng, Lakshmanan, Li


Module 5: Mining Uncertain Graphs with Risk Aversion 

Lecture: Model of uncertain graphs.  Risk-averse dense subgraph discovery. Risk-averse graph matchings. [Slides]


  1. Risk-Averse Matchings over Uncertain Graph Databases by Tsourakakis, Sekar, Lam, Yang
  2. Novel Dense Subgraph Discovery Primitives: Risk Aversion and Exclusion Queriesby Tsourakakis, Chen, Kakimura, Pachocki

Optional Readings

  1. Clustering large probabilistic graphs by Kollios, Potamias, Terzi
  2. Uncertain graph sparsification by Parchas, Papailiou, Papadias, Bonchi

Module 6: Graphs and Centralities
(guest lecture by Prof. Riondato)

Lecture: Introduction to centrality measures; The axioms of centrality; Degree centrality and prestige. Definitions, exact and approximation algorithms for: (i) Closeness centrality, and (ii) betweenness centrality. [Slides]


  1. Axioms for Centrality by Boldi,Vigna
  2. A faster algorithm for betweenness centrality by Brandes
  3. Fast approximation of betweenness centrality through sampling by  Riondato, Kornaropoulos
  4. ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages by Riondato and Upfal
  5. Computing classic closeness centrality, at scale  by Cohen, Delling, Pajor, Werneck

Optional Readings

  1. Approximating betweenness centrality by Bader, Madder, Kintali, Mihail
  2. Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches by Yoshida
  3. Scalable Betweenness Centrality Maximization via Sampling by Mahmoody, Tsourakakis, Upfal
  4. Spanning edge centrality: large-scale applications and applications by Mavroforakis, Garcia-Lebron, Koutis, Terzi

Module 7: Graphs and Communities

Lecture: Eigenvalues of graphs. Cheeger’s inequality and spectral clustering. Motif-based graph clustering.


  1. Eigenvalues of the Laplacian and their relationship to the connectedness of the graph, nice collection of introductory notes by Marsden

Click to access Marsden.pdf


2.  Chapters 1-5, from Luca Trevisan’s notes
3. Class’ Jupyter notebook
4. Normalized Cuts and Image Segmentation by Shi and Malik
5. Dan Spielman’s tutorial on Spectral graph theory

Click to access SpectTut.pdf


Data Sources

Stanford Large Network Dataset Collection (SNAP)
The University of Florida Sparse Matrix Collection (UFS)
Icon Colorado

Software (recommended libraries)

NetworkX (Python)
JUNG (Java)
Stanford Large Network Analysis (SNAP system) (C++)
Mathematica, Graph and Network Modeling (Mathematica)
Spark Graphx 
Apache Giraph

Visualization software


%d bloggers like this: