Latent Feature Learning


Can we automatically infer social circles from an ego-network?

Learning Latent Features in Social Networks and Overlapping Correlation Clustering

Suppose there exist five agents, each interested or not in each out of three news categories {business, entertainment, sports}. This information is encoded as binary vertex features. Based on their interests, they decide to connect, say with probability p=1, if they share at least one common interest. Given this information it is straight-forward to generate (sample) the (a) graph, as shown in the next figure.


The key question we study is the following.

Given an undirected graph G, a latent labeling \omega:V \rightarrow \{0,1\}^k, where k is the number of binary vertex features and is part of the input, and a probabilistic model of how nodes connect depending on the number of common interests, can we infer the labeling \omega?


  • Modeling. We depart from the deterministic model we described above by introducing a simple probabilistic generative model with the following property: the probability of an edge between two vertices is a non-decreasing function of their common features.

  • Hardness and guarantees. We show that maximizing the log-likelihood function is NP-hard by proving that it subsumes as a special instance the overlapping correlation clustering problem. We provide combinatorial insights by connecting the machine learning problem with specific graph sub-structures. Then, we provide a rapidly mixing Markov chain for learning latent features.

  • Scalability. Our method is more than 2,400 faster than a state-of-the-art machine learning method.

  • Performance. Our method is able to learn high quality latent labelings. On Twitter, it outperforms or performs comparably well to BigClam, and outperforms various other methods.



%d bloggers like this: