Social Media

Logistic regression coefficients.  Edge disjoint paths of short length are informative features. For details, see our paper.

Predicting signed interactions

Ιnteractions in social media involve both positive and negative relationships.  For example Slashdot, a popular web site with news for nerds, introduced in 2002 the Slashdot Zoo, where online users can tag others as friends or foes. This motivates the following interesting and important question, that was first asked in this paper:

Can we predict  whether an interaction will be positive or negative?

Another closely related question is the following. Instead of being given the social network, we can  query pairs of nodes, and learn whether their interaction is positive or negative. However, the answer provided to us is correct with probability $\frac{1}{2}+$small bias. Let the small bias be $\frac{\delta}{2}, \delta>0$ This motivates the following question that initiates the study of the edge sign prediction problem from an active learning perspective under a fully random noise model.

Under the assumption that the friend of my friend is my friend, and the enemy of my friend my enemy, can we recover all signs correctly with high probability? If yes, what is the smallest number of queries required?

Contributions

We have proved using random graph theory and Fourier analysis that (roughly) $O(n\log{n})$ queries uniformly at random suffice to predict   all possible ${n \choose 2}$ signs with high probability, as long as the bias is constant. Our algorithm uses BFS as its main subroutine, and exploits paths in a careful manner to make its successful predictions. Based on this theoretical insight, we explore the use of short-length, edge disjoint paths for the edge sign prediction problem on social networks, and show experimentally that they are informative features. Interestingly, paths may be even more informative than triads that are known to be very informative.

Code

Python code and a demo are available on Github. Our method is able to achieve high levels of accuracy, improving prior work.

Accuracy using 10-fold cross validation and different subset of features, broken down by embeddedness lower bounds (i.e., embeddedness of a pair of nodes is the number of common neighbors). For details, see our paper.