Dense Subgraph Discovery |

Numerous high-impact applications rely on dense subgraph discovery, including anomaly detection and security, community detection in social networks, and the Web graph, detection of protein complexes in protein interaction networks, and extraction of highly correlated entities from a relevance graph. The latter type of graphs is used to encode pairwise similarities among entities, e.g., time series, and is ubiquitous in data science applications. Two major formulations for dense subgraph discovery are the maximum clique problem and the densest subgraph problem (DSP). The former is a notoriously hard problem, whereas the latter is poly-time solvable and lies at the core of large-scale data mining.

Denser than the Densest Subgraph: A heuristic approach for large-near clique extraction

In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs.

Code

Publications

Denser than the Densest Subgraph: Extracting Optimal Quasi-Cliques with Quality Guarantees

The $k$ -clique densest subgraph problem: a principled approach for large-near clique extraction

$k$ -clique densest subgraph on a collaboration network

My work introduced the $k$ -clique densest subgraph problem rigorous theoretical guarantees for scalable extraction of large near-cliques from networks. The original intuition behind designing this family of objectives, which contain DSP as the special case $k = 2$ , is that triangles are a better signature for community participation compared to edges. The figure on the right, shows what we observe on an astrophysics collaboration network, and is representative of what we observe on real-world networks. As $k$ grows, the size of the optimal sets $S^*_k$ drops and the edge density $f_e(S^*_k)= e(S^*_k)/ {|S^*_k| \choose 2}$ grows. Notice the sudden change from $k=2$ to $k=3$ and that for $k=5$ we are able to find a large near-clique on 62 nodes. The dots are scaled according to $f_e$ . The same behavior is observed across social networks, autonomous systems, blog and Web graphs. Subsequent work focuses on scaling to large-scale datasets, while maintaining

Code

Publications

Densest Subgraph Problem in the Streaming Model

Bahmani et al. first provided a $2+\epsilon$ approximation algorithm for the DSP ( $\epsilon>0$ ) that performs $O(\log n/\epsilon)$ passes over the input in the streaming model. It was an open question whether that number of passes was the best possible. In joint work with Sayan Bhattacharya, Monika Henzinger, and Danupon Nanongkai we showed that a single pass suffices to get a $2+\epsilon$ approximation. In follow-up paper, my collaborators and I improved the approximation guarantee to $1+\epsilon$ . Specifically, we showed that sampling $O(n\log n /epsilon^2)$ edges uniformly at random suffices to get a $1+\epsilon$ approximation guarantee for the DSP.

Publications

Densest Subgraph Problem for Dynamic Graphs

In our STOC 2015 paper, we provide state-of-the-art results for the DSP on time-evolving graphs. For more details, see here.