Learning Latent Features in Social Networks and Overlapping Correlation Clustering
Suppose there exist five agents, each interested or not in each out of three news categories {business, entertainment, sports}. This information is encoded as binary vertex features. Based on their interests, they decide to connect, say with probability , if they share at least one common interest. Given this information it is straightforward to generate (sample) the (a) graph, as shown in the next figure.
The key question we study is the following.
Given an undirected graph , a latent labeling , where is the number of binary vertex features and is part of the input, and a probabilistic model of how nodes connect depending on the number of common interests, can we infer the labeling ?
Contributions

Modeling. We depart from the deterministic model we described above by introducing a simple probabilistic generative model with the following property: the probability of an edge between two vertices is a nondecreasing function of their common features.

Hardness and guarantees. We show that maximizing the loglikelihood function is NPhard by proving that it subsumes as a special instance the overlapping correlation clustering problem. We provide combinatorial insights by connecting the machine learning problem with specific graph substructures. Then, we provide a rapidly mixing Markov chain for learning latent features.

Scalability. Our method is more than 2,400 faster than a stateoftheart machine learning method.

Performance. Our method is able to learn high quality latent labelings. On Twitter, it outperforms or performs comparably well to BigClam, and outperforms various other methods.