General Information

Instructor: Charalampos E. Tsourakakis
Classroom: ~~EPC 204~~ (due to COVID-19, we will be using Zoom after the Spring break)
Time: Mondays, 2:30 pm-5:15 pm
Office hours: Fridays 2-3pm (upon appointment)
Email: tsourolampis at gmail dot com.

Course description

How can we develop a better understanding of real-world networks? This course aims to introduce students to important topics in large-scale graph mining ranging from graph models to scalable algorithm design, and learning applications. Topics include random graph theory, algorithm design for key graph algorithmic problems, graph-based learning algorithms and applications.

Textbooks

The class does not a textbook. However, the following books that are also available in a draft form online will be useful during this course.

Introduction to random graphs by Frieze and Karonski [FK]
Networks, Crowds, and Markets by Easley and Kleinberg [EK]
Spectral and algebraic graph theory by Spielman [S]
Graph theory by Diestel [D]

Piazza

Grading

Class Project 50%
Two homeworks 25% (12.5%+12.5%)
- Homework policy: You may collaborate on each homework with at most one other person. The name of your collaborator must be written down. You must write your solutions on your own. You may change collaborator from homework to homework. You may consult outside materials, but you must cite your sources in your solutions. You may not search the Web for solutions.
  
  Violation of the policy may result in receiving a zero grade for the question or the assignment or even the whole course. The penalty will be assessed at the instructor’s discretion.
- Advice: Start early!
Midterm Exam 15%
Class scribe 10%

Problem sets

HW1.pdf, due by March 16th
HW2.pdf, due by April 13th

Schedule

Module 1: Random graphs

Lecture: Introduction. Real-world networks and their properties. Random graph models. [Slides]

Lecture: Probabilistic tools. First and second moment method. Fundamental properties of $G(n,p)$ [Scribe]

Lecture: Chernoff bounds. Sudakov-Krivelevich proof for the emergence of the giant component. Power law degree distributions. [Scribe]

Lecture: Small-world phenomenon. [Slides][Handwritten notes]

Readings

Chapter 1 from [FK]
Chapter 1 and Chapter 2 from [EK]
What is a power law distribution, Wikipedia article
Clustering coefficient, Wikipedia article
Chapter 18 from [EK]
The phase transition in random graphs – a simple proof, by Krivelevich, Sudakov
The small-world phenomenon: an algorithmic perspective, by Kleinberg

Optional

A brief history of generative models for power law and lognormal distributions by Mitzenmacher
The degree sequence of a scale‐free random graph process, by Bollobás, Riordan, Spencer, Tusnády

Module 2: Space-efficient graph mining

Lecture: Flajolet-Martin sketches. ANF algorithm for diameter estimation on massive graphs. [Slides]

Lecture: Concentration results for sub-gaussian and sub-exponential random variables. Application: Johnson-Lindenstrauss lemma. AMS sketch. Counting triangles in graph streams using moment estimation [Scribe]

Readings

Probabilistic counting algorithms for data base applications, by Flajolet, Martin
ANF: A fast and scalable tool for data mining in massive graphs, by Palmer, Gibbons, Faloutsos
An elementary proof of a theorem of Johnson and Lindenstrauss, by Dasgupta, Gupta
Optimality of the Johnson-Lindenstrauss Lemma, by Green Larsen, Nelson
The Space Complexity of Approximating the Frequency Moments, by Alon, Matias, Szegedy
Reductions in streaming algorithms, with an application to counting triangles in graphs by Bar-Yossef, Kumar, Sivakumar
Analyzing Graph Structure via Linear Measurements, by Ahn, Guha, McGregor

Optional Readings

Hyperloglog, Wikipedia article
Original Hyperloglog paper, by Flajolet, Fusy, Gandouet, Meunier
Facebook article on Hyperloglog
Four degrees of separation by Backstrom, Boldi, Rosa, Ugander, Vigna
HADI paper, by Kang, Tsourakakis, Appel, Faloutsos, Leskovec
A Hybrid Sampling Scheme for Triangle Counting by Kallaugher, Price

Module 3: Graph sparsification

Lecture 7:
– Triangle sparsifiers [Handwritten notes]
– Densest subgraph sparsifiers. [Slides]
– Graph spanners [Handwritten notes, based on Stefano Leucci’s notes]
– Cut sparsifiers [Handwritten notes and Nick Harvey’s notes]
– Spectral sparsifiers [Handwritten notes]

Readings

Arboricity and subgraph listing, by Chiba, Nishizeki
Triangle sparsifiers, by Tsourakakis, Kolountzakis, Miller
Colorful Triangle Counting and a MapReduce Implementation by Pagh, Tsourakakis
Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling by Mitzenmacher, Pachocki, Peng, Tsourakakis, Xu
Densest Subgraph in Dynamic Graph Streams by McGregor, Tench, Vorotnikova, Vu
Applications of Uniform Sampling: Densest Subgraph and Beyond by Esfandiari, Hajiaghayi, Woodruff
Notes on Spanners by Stefano Leucci
Nick Harvey’s notes on cut sparsifiers
Graph Sparsification by Edge-Connectivity and Random Spanning Trees by Wai Shing Fung, Nicholas J. A. Harvey
Graph sparsification by effective resistances by Spielman, Srivastava

Optional Readings

Randomized Approximation Schemes for Cuts and Flows in Capacitated Graphs by Benczur-Karger (uses “edge strengths” for sampling, instead of edge connectivity for sampling as we saw in class)
Metric Sublinear Algorithms via Linear Sampling by Esfandiari, Mitzenmacher
Spectral sparsification of graphs by Spielman, Teng
Constant-arboricity sparsifiers by Chu, Cohen, Pachocki, Peng
Interlacing Families I: Bipartite Ramanujan Graphs of All Degrees by Marcus, Spielman, Srivastava
Interlacing Families II: Mixed Characteristic Polynomials and the Kadison-Singer Problem by Marcus, Spielman, Srivastava

Module 4: Graphs and Dense subgraph discovery

Lecture: Measures of density. Densest subgraph problem (DSP) and the k-clique DSP, dense subgraph discovery formulations. Efficient algorithms for static graphs. Efficient algorithms for dynamic graphs. [Slides]

Videos (KDD 2015 tutorial, by Gionis-Tsourakakis)

Readings

Finding a maximum density subgraph by Goldberg
Greedy Approximation Algorithms for Finding Dense Components in a Graph by Charikar
Flowless: Extracting Densest Subgraphs Without Flow Computations by Boob, Gao, Peng, Sawlani, Tsourakakis, Wang, Wang
The k-clique densest subgraph problem by Tsourakakis
On Finding Dense Subgraphs by Khuller, Saha

Optional Readings

Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees by Tsourakakis, Bonchi, Gionis, Gullo, Tsiarli
Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphsby Sariyüce, Seshadhri, Pinar, Çatalyürek
Efficient Algorithms for Densest Subgraph Discovery by Fang, Yu, Cheng, Lakshmanan, Li

Module 5: Mining Uncertain Graphs with Risk Aversion

Lecture: Model of uncertain graphs. Risk-averse dense subgraph discovery. Risk-averse graph matchings. [Slides]

Readings

Risk-Averse Matchings over Uncertain Graph Databases by Tsourakakis, Sekar, Lam, Yang
Novel Dense Subgraph Discovery Primitives: Risk Aversion and Exclusion Queriesby Tsourakakis, Chen, Kakimura, Pachocki

Optional Readings

Clustering large probabilistic graphs by Kollios, Potamias, Terzi
Uncertain graph sparsification by Parchas, Papailiou, Papadias, Bonchi

Module 6: Graphs and Centralities
(guest lecture by Prof. Riondato)

Lecture: Introduction to centrality measures; The axioms of centrality; Degree centrality and prestige. Definitions, exact and approximation algorithms for: (i) Closeness centrality, and (ii) betweenness centrality. [Slides]

Readings

Axioms for Centrality by Boldi,Vigna
A faster algorithm for betweenness centrality by Brandes
Fast approximation of betweenness centrality through sampling by Riondato, Kornaropoulos
ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages by Riondato and Upfal
Computing classic closeness centrality, at scale by Cohen, Delling, Pajor, Werneck

Optional Readings

Approximating betweenness centrality by Bader, Madder, Kintali, Mihail
Almost linear-time algorithms for adaptive betweenness centrality using hypergraph sketches by Yoshida
Scalable Betweenness Centrality Maximization via Sampling by Mahmoody, Tsourakakis, Upfal
Spanning edge centrality: large-scale applications and applications by Mavroforakis, Garcia-Lebron, Koutis, Terzi

Module 7: Graphs and Communities

Lecture: Eigenvalues of graphs. Cheeger’s inequality and spectral clustering. Motif-based graph clustering.

Readings

Eigenvalues of the Laplacian and their relationship to the connectedness of the graph, nice collection of introductory notes by Marsden

Click to access Marsden.pdf

2. Chapters 1-5, from Luca Trevisan’s notes
3. Class’ Jupyter notebook
4. Normalized Cuts and Image Segmentation by Shi and Malik
5. Dan Spielman’s tutorial on Spectral graph theory

Click to access SpectTut.pdf

Resources

Data Sources

Stanford Large Network Dataset Collection (SNAP)
The University of Florida Sparse Matrix Collection (UFS)
Konect
Icon Colorado

Software (recommended libraries)

NetworkX (Python)
JUNG (Java)
Stanford Large Network Analysis (SNAP system) (C++)
MATLAB BGL (Matlab)
Mathematica, Graph and Network Modeling (Mathematica)
Spark Graphx
Apache Giraph
Neo4j

Visualization software

GraphViz
Gephi

CS591 Large-Scale Graph Mining – Spring 2020

General Information

Course description

Textbooks

Piazza

Grading

Problem sets

Schedule

Module 1: Random graphs

Readings

Optional

Module 2: Space-efficient graph mining

Readings

Optional Readings

Module 3: Graph sparsification

Readings

Optional Readings

Module 4: Graphs and Dense subgraph discovery

Videos (KDD 2015 tutorial, by Gionis-Tsourakakis)

Readings

Optional Readings

Module 5: Mining Uncertain Graphs with Risk Aversion

Readings

Optional Readings

Module 6: Graphs and Centralities
(guest lecture by Prof. Riondato)

Readings

Optional Readings

Module 7: Graphs and Communities

Readings

Resources

Data Sources

Software (recommended libraries)

Visualization software

Follow webpage via email

CS591 Large-Scale Graph Mining – Spring 2020

General Information

Course description

Textbooks

Piazza

Grading

Problem sets

Schedule

Module 1: Random graphs

Readings

Optional

Module 2: Space-efficient graph mining

Readings

Optional Readings

Module 3: Graph sparsification

Readings

Optional Readings

Module 4: Graphs and Dense subgraph discovery

Videos (KDD 2015 tutorial, by Gionis-Tsourakakis)

Readings

Optional Readings

Module 5: Mining Uncertain Graphs with Risk Aversion

Readings

Optional Readings

Module 6: Graphs and Centralities (guest lecture by Prof. Riondato)

Readings

Optional Readings

Module 7: Graphs and Communities

Readings

Resources

Data Sources

Software (recommended libraries)

Visualization software

Share this:

Follow webpage via email

Module 6: Graphs and Centralities
(guest lecture by Prof. Riondato)