Numerous high-impact applications rely on dense subgraph discovery, including anomaly detection and security, community detection in social networks, and the Web graph, detection of protein complexes in protein interaction networks, and extraction of highly correlated entities from a relevance graph. The latter type of graphs is used to encode pairwise similarities among entities, e.g., time series, and is ubiquitous in data science applications. Two major formulations for dense subgraph discovery are the maximum clique problem and the densest subgraph problem (DSP). The former is a notoriously hard problem, whereas the latter is poly-time solvable and lies at the core of large-scale data mining.
Denser than the Densest Subgraph: A heuristic approach for large-near clique extraction
In this paper, we define a novel density function, which gives subgraphs of much higher quality than densest subgraphs: the graphs found by our method are compact, dense, and with smaller diameter. We show that the proposed function can be derived from a general framework, which includes other important density functions as subcases and for which we show interesting general theoretical properties. To optimize the proposed function we provide an additive approximation algorithm and a local-search heuristic. Both algorithms are very efficient and scale well to large graphs.
Code
Publications
The
-clique densest subgraph problem: a principled approach for large-near clique extraction

-clique densest subgraph on a collaboration network
My work introduced the -clique densest subgraph problem rigorous theoretical guarantees for scalable extraction of large near-cliques from networks. The original intuition behind designing this family of objectives, which contain DSP as the special case
, is that triangles are a better signature for community participation compared to edges. The figure on the right, shows what we observe on an astrophysics collaboration network, and is representative of what we observe on real-world networks. As
grows, the size of the optimal sets
drops and the edge density
grows. Notice the sudden change from
to
and that for
we are able to find a large near-clique on 62 nodes. The dots are scaled according to
. The same behavior is observed across social networks, autonomous systems, blog and Web graphs. Subsequent work focuses on scaling to large-scale datasets, while maintaining
Code
Publications
- Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling
- The
-clique Densest Subgraph Problem
Densest Subgraph Problem in the Streaming Model
Bahmani et al. first provided a approximation algorithm for the DSP (
) that performs
passes over the input in the streaming model. It was an open question whether that number of passes was the best possible. In joint work with Sayan Bhattacharya, Monika Henzinger, and Danupon Nanongkai we showed that a single pass suffices to get a
approximation. In follow-up paper, my collaborators and I improved the approximation guarantee to
. Specifically, we showed that sampling
edges uniformly at random suffices to get a
approximation guarantee for the DSP.
Publications
- Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling
- Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams
Densest Subgraph Problem for Dynamic Graphs
In our STOC 2015 paper, we provide state-of-the-art results for the DSP on time-evolving graphs. For more details, see here.
Dense Subgraph Discovery, KDD Tutorial
In 2015, Aris Gionis and I gave a tutorial at KDD on dense subgraph discovery. You can watch the Youtube videos from here.