Privacy and 1d Histograms

Suppose there is an underlying 1 dimensional histogram stored in the cloud. As a concrete example, consider the distribution of bank deposits (x-axis is the amount of dollars, and the y-axis is the count of accounts). For simplicity let’s assume that all the amounts of money deposited are integers within the range \{1,\ldots,T\}. The histogram is queried by the bank in the following way: the bank sends a range (t_i,t_j), t_i\leq t_j and the cloud returns the count of accounts with money between t_i,t_j. Let’s further assume that the range is selected uar, namely the bank selects the pair of indices i,j \in [n] uniformly at random. how many queries are needed  to reconstruct the histogram? This question is interesting because you can imagine that an eavesdropper can see both the query and the answer returned by the cloud and wishes to infer the histogram.

By Chernoff O(n^2\log{n}) queries suffice with high probability. However,  O(n\log{n}) queries suffice to reconstruct the high probability too. To see why consider a graph where we have T nodes, where the i-th node corresponds to the i-th prefix sum. Recall that the i-th prefix sum \sum_{j \leq i} y_j is the number of bank accounts with less or equal to i dollars.  If we throw, say 2n \log{n} edges uar, by Erdős–Rényi the graph will be connected whp. This is enough to reconstruct the histogram.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: