CS591 Large-Scale Graph Mining – Spring 2020

General Information

Instructor: Charalampos E. Tsourakakis
Classroom: EPC 204
Time: Mondays, 2:30 pm-5:15 pm
Office hours: Fridays 2-3pm (upon appointment)
Email: tsourolampis at gmail dot com.


Course description

How can we develop a better understanding of real-world networks? This course aims to introduce students to important topics in large-scale graph mining ranging from graph models to scalable algorithm design, and learning applications.  Topics include random graph theory, algorithm design for key graph algorithmic problems, graph-based learning algorithms and applications.


The class does not a textbook. However, the following books that are also available in a draft form online will be useful during this course.



  • Class Project 50%
  • Two homeworks 25% (12.5%+12.5%)
    • Homework policy: You may collaborate on each homework with at most one other person. The name of your collaborator must be written down. You must write your solutions on your own. You may change collaborator from homework to homework.  You may consult outside materials, but you must cite your sources in your solutions. You may not search the Web for solutions.

      Violation of the policy may result in receiving a zero grade for the question or the assignment  or even the whole course. The penalty will be assessed at the instructor’s discretion.

    • Advice: Start early!
  • Midterm Exam 15%
  • Class scribe 10%

Problem sets

  1. HW1.pdf , due by March 16th
  2. HW2.pdf, due by April 13th


Module 1: Random graphs

Lecture 1 (Jan 27th, 2020): Introduction. Real-world networks and their properties. Random graph models.

Lecture 1, Slides

Lecture 2 (Feb 2nd, 2020): Probabilistic tools. First and second moment method. Fundamental properties of G(n,p).

Lecture 2, Scribe

Lecture 3 (Feb. 18, 2020): Chernoff bounds. Sudakov-Krivelevich proof for the emergence of the giant component. Power law degree distributions.

Lecture 3, Scribe

Lecture 4 (Feb. 24, 2020): Small-world phenomenon. 

Lecture 4, Scribe
Lecture 4, Slides

Module 2: Space-efficient graph mining

Lecture 5 (Feb. 24, 2020)Flajolet-Martin sketches. ANF algorithm for diameter estimation on massive graphs. 

Lecture 5, Slides


  1. The phase transition in random graphs – a simple proof, by Krivelevich, Sudakov
  2. The degree sequence of a scale‐free random graph process, by Bollobás,  Riordan, Spencer, Tusnády
  3. The small-world phenomenon: an algorithmic perspective, by Kleinberg
  4. An elementary proof of a theorem of Johnson and Lindenstrauss,  by Dasgupta, Gupta
  5. Optimality of the Johnson-Lindenstrauss Lemma, by Green Larsen, Nelson
  6. Analyzing Graph Structure via Linear Measurements, by Ahn, Guha, McGregor
  7. ANF: A fast and scalable tool for data mining in massive graphs,  by Palmer, Gibbons, Faloutsos
%d bloggers like this: