Privacy and 1d Histograms |

Suppose there is an underlying 1 dimensional histogram stored in the cloud. As a concrete example, consider the distribution of bank deposits (x-axis is the amount of dollars, and the y-axis is the count of accounts). For simplicity let’s assume that all the amounts of money deposited are integers within the range $\{1,\ldots,T\}$ . The histogram is queried by the bank in the following way: the bank sends a range $(t_i,t_j), t_i\leq t_j$ and the cloud returns the count of accounts with money between $t_i,t_j$ . Let’s further assume that the range is selected uar, namely the bank selects the pair of indices $i,j \in [n]$ uniformly at random. how many queries are needed to reconstruct the histogram? This question is interesting because you can imagine that an eavesdropper can see both the query and the answer returned by the cloud and wishes to infer the histogram.

By Chernoff $O(n^2\log{n})$ queries suffice with high probability. However, $O(n\log{n})$ queries suffice to reconstruct the high probability too. To see why consider a graph where we have $T$ nodes, where the $i$ -th node corresponds to the $i$ -th prefix sum. Recall that the $i$ -th prefix sum $\sum_{j \leq i} y_j$ is the number of bank accounts with less or equal to $i$ dollars. If we throw, say $2n \log{n}$ edges uar, by Erdős–Rényi the graph will be connected whp. This is enough to reconstruct the histogram.