Codes for Distributed Storage: The repair problem

The first two talks in this session

Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage by K. V. Rashmi , Nihar B. Shah , P. Vijay Kumar , Kannan Ramchandran


A Flexible Class of Regenerating Codes for Distributed Storage by Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar

where given by Vijay Kumar.  To summarize the results in these two papers (and also

Exact-Repair MDS Codes for Distributed Storage Using Interference Alignment

by Changho Suh and Kannan Ramchandran

that appeared in another session), it is useful to briefly explain the current landscape of the repair problem:

(for more information see the Distributed storage wiki and this survey paper which however do not contain these new results yet.)

Assume a file to be stored, of size B (sometimes also found as \mathcal{M} in the literature) one can use an (n,k) MDS code for storage with maximum reliability: just cut the file into k packets of size B/k, encode into n pieces (of the same size \alpha=B/k ) and store each packet into a storage node. The reliability achieved is that any k storage nodes contain enough information to recover the file, which is the best possible. So we can tolerate any n-k failures without losing data. But most of the time we only have one failure. We then want to repair this failure by creating a new encoded packet by communicating as little information as possible from d  surviving storage nodes. There are two cases:

1. If we ask that the new packet still forms a good code (i.e. any k out of the new n nodes suffice to recover), the problem is called functional repair and is completely solved and equivalent to multicasting.  The minimum repair bandwidth can be determined by a cut-set bound and we will refer to the achievable region as the cut-set region.

2. If we ask that the new packet formed is exactly identical to the lost one: This problem is called exact repair and is very difficult. It is equivalent to a multicasting network coding problem with intermediate nodes having requests for subsets and therefore prior tools from network coding do not suffice. The repair problem seems close to locally decodable codes but this connection has not been investigated yet.

The amount of information communicated from d surviving nodes is called the repair bandwidth \gamma=d\beta. Another issue is that one can reduce this repair bandwidth if the storage per node \alpha is allowed to increase and there is a tradeoff involved.

The two most important points of this tradeoff are called Minimum Storage point (MSR)  (where \alpha=B/k corresponds to MDS codes) and Minimum Bandwidth point (MBR) (where \alpha is larger but the repair bandwidth is minimized).

Clearly exact repair is a strictly harder than functional repair and the cut-set region is still a valid outer bound. A substantial amount of recent work has been focused on understanding the communication required for exact repair. All the results here establish that the cut-set region is achievable for exact repair for the parameters discussed.

Vijay presented a storyline that concluded to the settings which were known before.

This paper had completely solved the problem for the Minimum Bandwidth (MBR) point, for any (n,k), but fixing d=n-1.

The first paper (Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage) in this session established that the MBR cut-set bound point can be achieved with exact repair for any d, corresponding to an arbitrary number d\leq n-k of failed nodes. The second result in this paper is that all the other interior points of the tradeoff curve (except the points of minimum repair and minimum storage) cannot be achieved with linear codes for exact repair. ( Of course this does not preclude the possibility that they can be approached infinitely close by making the code packets smaller and smaller).

In the second paper A Flexible Class of Regenerating Codes for Distributed Storage, the authors generalize the requirements of the distributed storage system: the initial requirement was that any data collector who connects to any k storage nodes (and receives all their stored data ) obtains enough information to recover the file. Here the authors want to enable data collectors who have limited communication with each storage node to reconstruct the file. The only requirement is that the total amount of information communicated is sufficient. This is, of course, a harder requirement on the storage code and the authors characterize the repair bandwidth for functional repair under this new requirement of flexible recovery. The technical part involves graph theoretically bounding the flow for these flexible information flow graphs. The question of exact repair of these flexible regenerating codes naturally arises and it seems to me that previous exact regenerating code constructions do not immediately generalize.

Finally, in another session was the closely related paper

Exact-Repair MDS Codes for Distributed Storage Using Interference Alignment

by Changho Suh and Kannan Ramchandran

Changho presented two new results: both are for the minimum storage point (MSR) and hence correspond to results for MDS codes.

One is that for the low-rate regime k/n \leq 1/2 if the number of nodes helping for the repair is sufficiently large d \geq 2k-1,  the MSR cut-set point can be achived for exact repair. For this regime the paper presents explicit code constructions that use a small finite fields and build on this paper which established similar results but only when repairing systematic nodes (and not nodes storing parities).

The second result of this paper closes the open problem of MSR exact repair for the high-rate k/n\geq 1/2 regime. The paper uses the symbol extension technique introduced in the Cadambe-Jafar paper Interference Alignment and the Degrees of Freedom for the K User Interference Channel to MDS codes that asymptotically approach the cut-set bound.

This paper appeared simultaneously with

Distributed Data Storage with Minimum Storage Regenerating Codes – Exact and Functional Repair are Asymptotically Equally Efficient, by Viveck R. Cadambe, Syed A. Jafar, Hamed Maleki,

Which independently obtain very similar results using the same technique. Note that the symbol extension approach requires vector linear network coding (sub-packetization of each stored packet) and to approach the cut-set bound an enormous amount of subpacketization (and field size) seem to be required. I think it is correct to interpret these results as existence proofs of high-rate regenerating codes approaching the cut-set bound for exact repair, rather than explicit constructions.

I think that the applicability of interference alignment to problems that seem to have nothing to do with wireless channels is quite remarkable. Another related case is the paper

Network Coding for Multiple Unicasts: An Interference Alignment Approach

by A. Das, S. Vishwanath, S. Jafar, A. Markopoulou.

I will blog my notes on this paper (hopefully) soon. Are the slides available anywhere? I took pictures of slides from other talks but my camera run out of battery.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.