In this work we study the following important question:
How much information about a network can be learned given access to a subset of potentially noisy estimates of pairwise node similarities?
This question is especially interesting from a privacy perspective. A common privacy breach is social link disclosure, in which an attacker attempts to learn potentially sensitive links between nodes in a network, or node similarities between other pairs of nodes. The question is also well motivated from a data mining, as well as a compression/computational perspective. For instance, computing all pairwise node similarities can be infeasible for large networks since the number of similarities grows quadratically in the number of nodes. What is the minimum number of pairwise node similarities that allows us to recover the rest?
In this work we focus on the popular choice of random walk node similarity measures. Specifically,
Reconstruct an unknown graph , given a set of noisy random-walk node similarity measures measurements, for each , where is a set of node pairs and is a potentially random noise term. How accurately can an attacker recover the unknown node similarities ?
- Mathematical formulation. We provide an optimization-based formulation of the problem of learning a graph from effective resistances. Specifically, given a set of effective resistance measurements, we consider the problem of finding a graph whose effective resistances match the given resistances as closely as possible. Our formulations carry out on other random walk based similarity measures.
- Algorithms. We provide efficient algorithms for our optimization problems.
- Experimental Results. We show interesting experimental findings on both controlled experimental settings, as well as on Facebook ego-networks. Our results strongly indicate that releasing such information about a social network can be harmful from a privacy perspective, but also shows that pairwise node similarities can be significantly compressed (lossy compression).