## Latent Feature Learning

Can we automatically infer social circles from an ego-network?

## Learning Latent Features in Social Networks and Overlapping Correlation Clustering

Suppose there exist five agents, each interested or not in each out of three news categories {business, entertainment, sports}. This information is encoded as binary vertex features. Based on their interests, they decide to connect, say with probability $p=1$, if they share at least one common interest. Given this information it is straight-forward to generate (sample) the (a) graph, as shown in the next figure.

The key question we study is the following.

Given an undirected graph $G$, a latent labeling $\omega:V \rightarrow \{0,1\}^k$, where $k$ is the number of binary vertex features and is part of the input, and a probabilistic model of how nodes connect depending on the number of common interests, can we infer the labeling $\omega$?

## Contributions

• Modeling. We depart from the deterministic model we described above by introducing a simple probabilistic generative model with the following property: the probability of an edge between two vertices is a non-decreasing function of their common features.

• Hardness and guarantees. We show that maximizing the log-likelihood function is NP-hard by proving that it subsumes as a special instance the overlapping correlation clustering problem. We provide combinatorial insights by connecting the machine learning problem with specific graph sub-structures. Then, we provide a rapidly mixing Markov chain for learning latent features.

• Scalability. Our method is more than 2,400 faster than a state-of-the-art machine learning method.

• Performance. Our method is able to learn high quality latent labelings. On Twitter, it outperforms or performs comparably well to BigClam, and outperforms various other methods.