Mining Uncertain Graphs |

Risk-Averse Matchings over Uncertain Graph Databases

Probabilistic graph, each edge $e$ is annotated with $(p(e),w(e))$ , its probability and its reward/weight. The matching $(A,B), (C,D)$ has higher expected weight than $(A,C), (B,D)$ . However, the reward of the former matching is 0 with probability $\frac{1}{4}$ , but the reward of the latter matching is 80 with probability 1.

A large number of applications such as querying sensor networks, and analyzing protein-protein interaction (PPI) networks, rely on mining uncertain graph and hypergraph databases. In this paper we study the following problem:

Given an uncertain, weighted (hyper)graph, how can we efficiently find a (hyper)matching with high expected reward, and low risk?

For instance, the figure on the left shows an uncertain graph with two perfect matchings. While the matching $(A,B), (C,D)$ has expected reward $100 \times 2 \times \frac{1}{2}$ , and the matching $(A,C), (B,D)$ expected reward 80, the former entails greater risk: with probability $(1-\frac{1}{2})\times (1-\frac{1}{2})$ the actual reward will be 0.

Contributions

Model. We introduce a general model of uncertain weighted hypergraphs, that subsumes prior work and is suitable for a wider variety of applications, including privacy related applications that inject continuous noise to the edge weights. We provide a novel formulation for risk-averse (hyper)graph matchings.
Approximation algorithms. For the case of uncertain weighted graphs, we provide a $\frac{1}{3}$ -approximation algorithm, and a $\frac{1}{5}$ -approximation algorithm with near optimal run time. For the case of uncertain weighted hypergraphs, we provide a $\Omega(\frac{1}{k})$ -approximation algorithm, where $k$ is the rank of the hypergraph (i.e., any hyperedge includes at most $k$ nodes), that runs in almost (modulo log factors) linear time.
Experiments. We complement our theoretical results by testing our approximation algorithms on a wide variety of synthetic experiments, where we observe in a controlled setting interesting findings on the trade-off between reward, and risk. We also provide an application of our formulation for providing recommendations of teams that are likely to collaborate, and have high impact.

Code

Data

We created an uncertain weighted hypergraph for the purpose of testing our methods from DBLP. The data is available here as pickled file. For a complete description, check our paper.