Dense subgraph discovery: Theory and Applications (Tutorial SDM 2021)

Basic information

Finding dense subgraphs is a fundamental graph-theoretic problem, that lies in the heart of numerous graph-mining applications, ranging from finding communities in social networks to detecting regulatory motifs in DNA and anomaly detection. The problem of finding dense subgraphs has been studied extensively in theoretical computer science, and recently, due to the relevance of the problem in real-world applications, it has attracted considerable attention in the data-mining community.

In this tutorial we aim to provide a comprehensive overview of (i) major algorithmic techniques for finding dense subgraphs in large graphs and (ii) graph mining applications that rely on dense subgraph extraction. We will present fundamental concepts and algorithms that date back to 80’s, as well as the latest advances in the area, from theoretical and from practical point-of-view. We will motivate the problem of finding dense subgraphs by discussing how it can be used in real-world applications. We will discuss different density definitions and the complexity of the corresponding optimization problems. We will also present efficient algorithms for different density measures and under different computational models. Specifically, we will focus on scalable streaming, distributed and MapReduce algorithms. Finally we will discuss problem variants, extensions, and will provide pointers for future research directions.

Target audience

The tutorial will be designed to give a broad overview of the main techniques and problem variants, allowing graph-mining researchers to enrich their algorithmic toolbox. The tutorial will include cutting edge research on the topic of dense subgraph discovery, with anomaly detection applications. The intended duration of this tutorial is two hours.

The target audience are researchers and students interested in state-of-the-art graph mining algorithms and applications related to dense subgraph extraction. A computer-science background (B.Sc.\ or equivalent), and familiarity with undergraduate algorithm design and graph theory are prerequisites. The tutorial will focus on intuition and examples, carefully introducing only the minimal necessary theoretical tools, and focusing on applications.

Outline

  1. Introduction
  2. Motivating applications
  3. Measures of density
  4. Densest subgraph problem
    a. Static graphs
    b. Dynamic graphs
  5. Problem variants
  6. Conclusion

Slides

Python code

Run pip install dsd==0.0.1 to install the dense subgraph discovery. Then, one can run some basic algorithmic utilities for the densest subgraph problem as follows. A notebook is available here

 

import networkx as nx
from dsd import *
from datetime import datetime
import matplotlib.pyplot as plt 

karate_graph = nx.karate_club_graph()
karate_edges = [[e[0],e[1]] for e in nx.karate_club_graph().edges()]
nx.draw(karate_graph)

print('exact max flow method')
start = datetime.now()
exact_R = exact_densest(karate_graph)
print('subgraph induced by', exact_R[0])
print('density =', exact_R[1])
print('run time', datetime.now()-start, '\n')

print('flowless method')
start = datetime.now()
flowless_R = flowless(karate_graph, 5)
print('subgraph induced by', flowless_R[0])
print('density =', flowless_R[1])
print('run time', datetime.now()-start, '\n')

print('greedy method')
start = datetime.now()
greedy_R = greedy_charikar(karate_graph)
print('subgraph induced by', greedy_R[0])
print('density =', greedy_R[1])
print('run time', datetime.now()-start, '\n')

Tutors

Prof. Charalampos Tsourakakis
(Boston University,
ISI Foundation)

Charalampos E. Tsourakakis is an assistant professor in the Computer Science department at Boston University, and a senior research scientist at the ISI Foundation. He is also a long-term research associate at Harvard. He obtained his PhD in Algorithms, Combinatorics and Optimization at Carnegie Mellon under the supervision of Alan Frieze, was a postdoctoral fellow at Brown University and Harvard under the supervision of Eli Upfal and Michael Mitzenmacher respectively. He won a best paper award in IEEE Data Mining, has delivered three tutorials in the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, and has designed two graph mining libraries for large-scale graph mining, one of which has been officially included in Windows Azure. His research focuses on graph mining, data mining, and machine learning.

Tianyi Chen (BU)

Tianyi Chen is PhD student at Boston University advised by Prof. Tsourakakis. He got his Master degree in computer science from Boston University in Jan 2019. Prior to that, he earned his bachelor in Software Engineering from Xi’an Jiaotong University, China. His researches mainly focus on dense subgraph discovery, uncertain graphs, and anomaly detection.

References

The key references are included in the slides. For convenience some references follow:

%d bloggers like this: