Secure Distributive Storage of Decentralized Source Data: Can Interaction Help?
Salim El Rouayheb, Vinod Prabhakaran and Kannan Ramchandran
Salim presented this work on creating a distributed storage system that also has secrecy. As far as I understood, the model is that there are distributed sources and they want to create a good erasure code in a network: a data collector who accesses any
out of
storage nodes should be able to recover the data. In addition, one storage node is an eavesdropper who gets access to everything communicated to her. The question is to create a distributed code that also has secrecy (i.e. the eavesdropper learns nothing other than what is contained in the nodes she controls) but minimize the communication in the network.
In the model each storage node has a private key and these keys can be used to provide the desired secrecy. The nodes could use interactive protocols of exchanging keys etc, but the main result of the paper is that this does not reduce the required communication compared to a one-way dissemination scheme.
Security in Distributed Storage Systems by Communicating a Logarithmic Number of Bits
by Theodoros K. Dikaliotis, myself and Tracey Ho,
Ted presented our work on security (error correction) for distributed storage systems. We are looking into the problem of finding if a storage node in an system is storing incorrect equations. These could be originating from bit-flipping errors in the storage medium or (more likely) if a malicious node was injecting errors during a previous repair in the storage system.
The model is that we assume a trusted verifier that communicates to the nodes and uses the built in redundancy of the code to find where the errors are located. The first observation is in most distributed storage systems the same code is used multiple times to reduce the overhead from storing the coefficients of the code. Instead of sending all the packets, all the storage nodes can project onto the same vector and only send the projections. The error detection works since the projected data follows the same
code which can tolerate up to half the minimum distance adversarial errors (under optimal decoding). The problem is that the vector
is very long, around the same order as the stored data. One way around this is to create a pseudorandom
using a small seed
, and project on that. All the storage nodes need to share the small seed
and if we use pseudorandom generators that fool linear functions we can bound the probability of failure of a computationally unbounded adversary who places the errors. This relies on the pseudorandom generators of Naor & Naor (but for large finite fields).
This trick applies more generally: Assume you can prove some bad event that involves a linear function of an length i.i.d. random vector
happens with very small probability
.
Then instead of using the random vector, you can use a pseudo-random vector that is generated from only
random bits and still have a bound on the bad event being less than
where
is the total variation bound obtained from the pseudo-random generator.
On Secure Distributed Data Storage Under Repair Dynamics
by Sameer Pawar, Salim El Rouayheb, Kannan Ramchandran
Sameer presented his work the repair problem under secrecy constraints. The problem is to construct a distributed storage system that stays secure when repairs are taking place. The model is that an eavesdropper can observe the data arriving at nodes in an
distributed storage system. The static version of this problem (no repairs) is equivalent to the erasure-erasure wiretap-II channel studied in this paper by Arunkumar and Mclaughlin while this paper generalizes this problem under the natural repair dynamics. As far as I understood, there were two results. One was a general bound on the secrecy capacity for any distributed storage system. There is also a surprising separation result:
For distributed storage operating at the minimum bandwidth point (MBR) secrecy and repair can be separated without loss of optimality. The authors show that the combination of using an MDS code to achieve secrecy (by ‘mixing’ the data with random noise keys) and exact Minimum bandwidth regenerating codes that were recently invented in this paper by Rashmi et al. for repair, achieves the secrecy capacity.
Interestingly, the same separation is no longer optimal for other points of the storage-repair tradeoff and as far as I understood the secrecy capacity remains open.