Our work is motivated by the following question:
How can we effectively leverage higher-level graph structures, or motifs, for better clustering and community detection in graphs?
In our work that appears in WWW 2017, we provide the theoretical framework of motif-based graph clustering, a spectral algorithm (see also the work of Benson et al that appeared independently from our work in Science), and a well performing heuristic that competes with some popular baselines, but is much faster, better understood, and amenable to a straightforward distributed implementation in Spark, Hadoop and other big-data systems.
Code
You can download some of our software (version 1) from the github repository.
Usage
Our code takes as input a graph, and the ground-truth communities, check files com-amazon.ungraph.txt and com-amazon.top5000.cmty.txt to see the input format.
- Run from a terminal python relabel-graph.py com-amazon.ungraph.txt com-amazon.top5000.cmty.txt amazon.mace amazon.communities.
- Download and compile MACE.
- Run ./mace C -l 3 -u 3 amazon.mace amazon.triangles
- Run python mace-to-list.py amazon.mace amazon.edges
- For some stats, run python community-stats.py amazon.edges amazon.communities amazon.edges.stats and python community-stats.py amazon.triangles amazon.communities amazon.triangles.stats
- Compile the file triangle_clusters.cpp g++ -std=c++11 -o triangle_clusters triangle-clusters.cpp
- Run ./triangle_clusters amazon.triangles numbers_of_nodes (334858 for amazon) threshold_value > amazon_clusters.txt
- Quick heuristic evaluation python grade-clusters.py amazon.communities amazon_clusters.txt output.txt