Our work is motivated by the following question:

How can we effectively leverage higher-level graph structures, or motifs, for better clustering and community detection in graphs?

In our work that appears in WWW 2017, we provide the theoretical framework of motif-based graph clustering, a spectral algorithm (see also the work of Benson et al that appeared independently from our work in Science), and a well performing heuristic that competes with some popular baselines, but is much faster, better understood, and amenable to a straightforward distributed implementation in Spark, Hadoop and other big-data systems.

Code

You can download some of our software (version 1) from the github repository.

Usage

Our code takes as input a graph, and the ground-truth communities, check files com-amazon.ungraph.txt and com-amazon.top5000.cmty.txt to see the input format.

  1. Run from a terminal python relabel-graph.py com-amazon.ungraph.txt com-amazon.top5000.cmty.txt amazon.mace amazon.communities.
  2. Download and compile MACE.
  3. Run ./mace C -l 3 -u 3 amazon.mace amazon.triangles
  4. Run python mace-to-list.py amazon.mace amazon.edges
  5. For some stats, run python community-stats.py amazon.edges amazon.communities amazon.edges.stats and python community-stats.py amazon.triangles amazon.communities amazon.triangles.stats
  6. Compile the file triangle_clusters.cpp g++ -std=c++11 -o triangle_clusters triangle-clusters.cpp
  7. Run ./triangle_clusters amazon.triangles numbers_of_nodes (334858 for amazon) threshold_value > amazon_clusters.txt
  8. Quick heuristic evaluation python grade-clusters.py amazon.communities amazon_clusters.txt output.txt