Johnson-Lindenstrauss, Restricted Isometry (RIP), for-each vs for-all, and all that

Posted on November 18, 2012 by Alex Dimakis

The Johnson-Lindenstrauss lemma shows how to project points in much lower dimension without significant distortion of their pairwise Euclidean distances.

Say you have n arbitrary vectors $p_1, p_2 \ldots, p_n$ in $\mathbb{R}^n$ .
Use a fat random (normalized) Gaussian matrix of dimensions $m \times n$ where

$m= \frac{c}{\epsilon^2} \log n$ ,

to project the points in m dimensions.

Then, with probability at least 1/2, the projected points $A p_1, A p_2, \ldots A p_n$ have pairwise distances that satisfy:

$(1-\epsilon) ||p_i -p_j ||_2 \leq ||A p_i - A p_j ||_2 \leq (1+\epsilon) ||p_i -p_j||_2$ .

This is quite remarkable since we reduced the dimension from $n$ into $\log n$ .

This is a very fat matrix.

A number of other matrices beyond Gaussian have been shown to also work with similar guarantees.

The Johnson-Lindenstrauss lemma sounds similar to the Restricted Isometry Property (RIP).
A fat $m \times n$ matrix A has the RIP property if

$(1-\epsilon) ||x||_2 \leq ||A x ||_2 \leq ||(1+\epsilon) ||x||_2$ ,

for all vectors $x$ that are k-sparse, i.e. $||x||_0 \leq k$ .

This means that the matrix A does not distort the $\ell_2$ norm of sparse vectors.
As before, well-known results have shown that random Gaussian matrices satisfy RIP with high probability if we set
$m = \Theta( k \log(n/k))$ .

To understand the similarities and differences between RIP and JL, we need to see how these results are proven. The central question is how fat random matrices distort the length of vectors and with what probability.

All known proofs of JL (for each matrix family) start with a lemma that deals with one fixed vector:
Lemma 1:

For each given vector $x$ , the matrix A does not distort the length too much, with good probability:

$Pr[ (1-\epsilon) ||x||_2 \leq ||Ax||_2 \leq (1+\epsilon) ||x||_2] \geq 1- P_{fail}(m,n) \quad (1)$

The key issue is between for all versus for each vector quantifiers.
If we first fix the vector and throw the matrix randomly, this is much easier than promising something for a random matrix that once realized, works w.h.p. for all vectors.

Clearly, if first realize the matrix, there are certainly vectors where lemma 1 cannot hold: since the matrix is fat, it has a nullspace so there are nonzero vectors that have $||Ax||=0$ . Lemma 1 simply says that this is unlikely and gives a `for each vector, a random matrix is good whp’ guarantee.

How small we can make $P_{fail}$ depends crucially on how small we make m. Since we want to achieve dimensionality reduction we want to make the matrix as fat as possible, i.e. m as small as possible. But $P_{fail}$ will become larger.

To prove JL for some family of matrices we need to make $P_{fail}\leq 1/n^2$ .

If we have lemma 1 with this guarantee, we just do a union bound over all ${n \choose 2}= \Theta(n^2)$ pairs of vectors and obtain the pairwise distance distortion guarantee for all vectors $p_1, p_2, \ldots p_n$ . Hence JL is Lemma 1 + union bound over all pairs.

It turns out that for Gaussian iid matrices, if you make $m=\frac{1}{\epsilon^2} \log n$ , the probability of error is smaller than $P_{fail} \leq 1/n^2$ . This follows by the wonderful symmetries of the Gaussian: $||Ax ||_2$ has the same distribution as $||A e_1 ||_2$ i.e. the $\ell_2$ norm of the first column of A. Therefore $||Ax ||^2_2$ has a Chi-square distribution with m degrees of freedom and standard concentration results suffice to bound $P_{fail}$ .

To prove RIP, exactly the same recipe can be followed: first prove Lemma 1 for one vector and then do a union bound over all ${n \choose k}$ k-sparse vectors. The problem is now we have a much larger union bound and therefore we need $P_{fail} \leq \frac{1}{{n \choose k}}$ . This enforces a much larger m compared to JL.

The general recipe emerges. Say you want to prove a good property for all hairy vectors.

First show that for each fixed vector a random matrix has the good property with probability at least $1- P_{fail}$ .

Then ensure $P_{fail} \leq \frac{1}{\text{No of hairy vectors}}$ . This will limit how much dimensionality reduction you can achieve.

Finally, a union bound establishes the property for all hairy vectors.

Irving S. Reed

Posted on September 11, 2012 by Alex Dimakis

USC announced today: “Professor Irving S. Reed passed away this morning, September 11, 2012, at the age of 88. Professor Reed was a member of the Ming Hsieh Department of Electrical Engineering faculty from 1963 until his retirement in 1993 as Powell Chair of Engineering Emeritus.”

Prof. Reed had made significant contributions in radar design theory, programming languages and, of course, coding theory. Reed-Muller and Reed-Solomon codes have been incredibly influential in electrical engineering, theoretical computer science and mathematics.

Reed-Solomon codes are used to protect data in countless applications including CDs, DVDs, DSL and RAID. They were also used to encode the digital images sent back by space probes like the Voyager. RS codes are my favorite example of something practically useful and at the same time theoretically deep.

The announcement mentions: “Millions of people today enjoy the benefits of Reed’s many inventions and contributions of technology, without being aware of their remarkable benefactor.”

Meanwhile in Greece: Tax inspectors besieged in Hydra.

Posted on August 18, 2012 by Alex Dimakis

This Friday, tax inspectors arrived in the island of Hydra located in the Aegean sea near Athens. Hydra (also spelled Ydra) is a high-end tourist destination with a strong yachting and maritime tradition.

As in.gr reported, one restaurant owner was caught for tax evasion and fainted. Tax inspectors arrested her son and led him to the local police station.

Subsequently, protesting locals besieged the police station where he was kept. The tax inspectors were forced to remain in the station until the morning, when special police units arrived from Athens to set them free.

A Ranking of World University Rankings

Posted on August 16, 2012 by Alex Dimakis

Shanghai Jiao Tong University recently published its Academic Ranking of World Universities (ARWU) for 2012. As usual, it contains both general university rankings and many specialized field and location based lists.
Ranking universities is an impressive dimensionality reduction challenge:
map from academic institutions to one dimension, hoping that your projection relates to loosely defined concepts like academic reputation. Think of trying to land a fighter jet on an aircraft carrier while reading Tractatus Logico-Philosophicus to understand the concept of landing.

Rankings of specific departments are much less meaningless compared to university rankings and serve different purposes. As a student, I must admit that rankings seemed very useful. They allegedly help in another aircraft carrier landing task of choosing a school.

Just like everybody else, I still look rankings but now I have several concerns. Graduate school rankings in particular create a self-fulfilling prophecy: the best students and young faculty join higher ranked schools when given the option, hence creating a positive feedback loop.

US News almost has a de-facto national ranking monopoly, especially in the eyes of students. This sounds like a dangerous thing. Universities and departments have strong incentives to optimize a specific formula involving indicators like PhDs awarded per year and papers per academic staff. Aligning with a specific ranking methodology might not necessarily align with society’s or student interests.

Instead of thinking about these important questions, I compiled my personal ranking of world university rankings. The methodology is highly scientific: I read through each list and counted how many times I screamed: ‘Seriously, you ranked this school higher than this other school?‘.

The ranking with the lowest number of screams wins. Interestingly, ARWU wins my least absurd ranking award. The rest are:

1. ARWU

2. Times Higher Education World University Rankings

3. QS World University Rankings (aka US News world)
(Imperial at #6 and Stanford at #11 /facepalm)

(RatER, Forbes remain unranked)

Advice on giving talks

Posted on July 3, 2012 by Alex Dimakis

We are at ISIT and I realize I am going over the same points multiple times with my students, so I thought of summarizing everything here.

How to give a better ISIT Talk.

1. Take your talks very seriously.

Do practice runs. Many of them. Your only hope for academia is by giving great talks. Give a practice talk to your friends. In the middle of your talk pause and quiz them: ok, did you get why alpha and beta are not independent? (hint: they did not).
If they did not, it is your problem not their problem.

2. They do not remember what alpha is.

In most talks, your audience does not understand what the notation is, what the problem is, or why they should care. Think of yourself: how often do you sleep or suffer through talks without even knowing what the problem is?
Do not treat your audience like that.

It is a typical scene when the presenter is focusing on a minor technical issue for ten minutes when 90% of the audience does not even know what exactly the problem is, or care.

One important exception is when your audience works on the same problem. Typically only a small part of your talk should be focused on these experts (see also 13).

3. Do a multi-resolution talk.

A useful guideline is: for an 18 minute talk, 7-9 minutes should go on explaining the formulation of your problem and why should anybody care. 5-6 minutes on explaining *what* the solution is and 4 minutes or so, on the actual painful technical stuff. The first part should be aimed at a first year grad student level. The second at a senior grad student in the general ISIT area and the last part to the expert working on related problems. If fewer than 90% of your audience are checking email in the last part of your talk, consider that a success.

4. Try to make things simple, not difficult.

It is a common mistake for starting grad students to think that their work is too simple. For that reason they will not mention known things (like explaining that ML decoding for the erasure channel consists of solving linear equations, because they fear this is too simple and well known).
Always mention the basic foundations while you try to explain something non-trivial. Your goal is not to sound smart but rather to have your audience walk out knowing something more.

Even when your audience hears things they already know, they get a warm fuzzy feeling, they do not think you are dumb.

5. Add redundancy, repeat a lot in words.

Do not say ‘We try to minimize d(k)’.
Say `we try to minimize the degree d which as I mentioned, is a function of the number of symbols k’. Repeat things all the time: Summarize what you will talk about, and in conclusions say the main points again.

6. Go back to basic concepts in words, repeat definitions.

Try to mention the basic mathematical components not the jargon you have introduced. Do not not say ‘Therefore, the code is MSR-optimal‘ but ‘Therefore, the code minimizes the repair communication (what we call MSR optimal)‘. Try to reduce your statements back to fundamental things like probabilities, graphs, rank of matrices, etc whenever possible. Do not just define some alpha jargon in the first slide and talk about that damn alpha throughout your talk.

7. Never go over time.

I have often seen even experienced speakers getting a warning that they have 3 minutes and still trying to go through their ten last slides. When you are running out of time, the goal is not to talk faster.

Say something like ‘Unfortunately or fortunately for you, I do not have time to go into the proof so I will have to skip it. The main ingredient involves analyzing random matchings which is done through Hall’s theorem and union bounds. Please talk to me offline if you are interested…”
Then, go through your conclusions slowly, repeating your main points.
This is another example of multi-resolution: you explain the techniques at a high level first. Even if you had time, you would still first have to give a one sentence high level description and then get into the the details.

8. Draw attention to important slides.

People are probably checking the Euro final when you are at slide 4, explaining what your problem is all about. Wake them up and give a notification that this is the one slide they do not want to miss. Do this right before the critical points.

9. Every slide should have one simple message.

After you make your slides ask yourself: what is the goal of this slide, I just want to explain this part. Iteratively try to simplify your slides into smaller and smaller messages. It is easier for your audience to grasp one packet of information at a time. Do not have derivations on slides (especially for an 18 minute talk), unless there is one very simple trick you really want to show. Showing math does not make you look smarter.

10. Be minimalist.

Every word on your slides, every symbol or equation you put up there dilutes the attention of your audience. Look at each bullet/slide and ask, do I really need this part or can I remove it?

11. Be excited.

Vary the tone of your voice, it may wake up somebody. You need to entertain and perform. Think: if you are not excited with your results why should anybody else be?

12. Cite people.

When somebody has related prior work, cite them on your slide. That has the benefit of waking them up when they see or hear their name.

As Rota says: `Everyone in the audience has come to listen to your lecture with the secret hope of hearing their work mentioned.‘

13. Connect to what your audience cares about.

This is non-trivial and requires experience. If you are giving a talk in a fountain codes session, you do not have to spend ten minutes defining things your audience knows already. Still define it quickly to make sure everybody is on the same page on notation. Knowing how to be at the right resolution for your audience becomes easier in time.

14. Prepare your logistics.

Know the room (go there before), know who your session chair is, have your macbook projector dongle, pre-load your slides on a USB. Bring your charger, disconnect from the internet (fun Skype messages pop-up during talks). If you are using a different machine, test your Powerpoint slides (hint: they look completely different).

15. Talk to people afterwards.

Talk to people about their work and your work. Remember that this is a professional networking event. Do not hang out with your friends, you have plenty of time for that after you go back home. Networking with other students and faculty is very important, in my case I learn more by talking to people offline than in talks.

16. Engineering theory is essentially story-telling.

Our papers and talks are essentially story-telling: Here is a model for a wireless channel, here is a proof about this model. A good story has an intellectual message that will hopefully help people think about a real engineering problem in a cleaner way.
The other aspect of our job is creating algorithms that are hopefully useful in real systems. Think: what is your story and how will you present it in your talk.

17. Read the brilliant Ten Lessons I Wish I Had Been Taught by Gian-Carlo Rota.

A proof that P is not equal to NP?

Posted on August 8, 2010 by Alex Dimakis

Today we had the last day of the information theory summer school and Rudiger Urbanke gave a lecture on LDPC codes, random graphs, polar codes and other ways to approach Shannon Capacity.

After it was over, I received a few emails about a potential proof that P is not equal to NP from Vinay Deolalikar.

A quick look at the proof seems (to my ignorant eyes) like a serious, well-written attempt, and some people who know more seem to agree.

Surprisingly, Replica symmetry breaking, random graphs (and how they are locally tree-like) and inference on graphical models (things we discussed in the school) seem to be central in the arguments.

Let’s hope the proof is correct!

Explicit constructions of Expanders and LDPC codes

Posted on June 23, 2010 by Alex Dimakis

After seeing Nisim’s ISIT talk on LP Decoding of Regular LDPC Codes in Memoryless Channels, I remembered one issue that has been puzzling me for a while: if there are any explicit constructions of LDPC codes that have a provable nonzero threshold (for block-error probability, hence excluding density evolution arguments) i.e. correct a constant fraction of bit-flipping errors (other channels would be harder)– under any efficient decoder.

The question seems trivial: As is well-known, a random $(d_v,d_c)$ regular graph will be a very high quality expander (for very small sets) and therefore a (very small) constant fraction of bit-flipping errors can be corrected by Sipser & Spielman‘s bit flipping algorithm.

Also the work by Feldman et al. LP decoding corrects a constant fraction of errors establishes correction of an adversarial set of bit-flipping errors for expander graphs under LP decoding.

But here is the catch: sure, a random graph is a good expander with high probability, but how do we check if a given graph is a good expander?

The key of course is my definition of explicit construction: I mean an algorithm that terminates in polynomial time and gives me a code that certifiably works (and I think this is relevant to practice, since at the end of the day I have to use one fixed code). The only efficient way that I know to check expansion involves spectral methods ( This is from Modern Coding theory by Richardson & Urbanke) : If we have a $(d_v,d_c)$ regular bipartite graph with $n$ variable nodes and $\frac{d_v}{ d_c} n$ check nodes, if we take $H$ to be the incidence matrix of the bipartite graph (the parity check matrix of the code), we can form $H^T H$ and look at its eigenvalues

$\lambda_1 \geq \lambda_2,\geq \lambda_3 \cdots$ and Tanner’s spectral bound on expansion is that expansion $\gamma$ is greater than

$\gamma(\alpha) \geq \frac{ d_v}{(d_v d_c - \lambda_2) \alpha + \lambda_2}$ for any size of expanding sets $\alpha$. Unfortunately this bound cannot certify expansion for any $\gamma >1/2$ , which is exactly the point where it starts being useful for coding theory. Perhaps there are stronger spectral bounds that could establish more expansion, the book Spectral Graph Theory by Fan Chung Graham contains a lot of material on that point but I have not seen any such applications to coding.

So ok, lets say we do not know how to construct (or certify) that LDPCs have high expansion, how about other graph properties that will guarantee a correctable fraction of errors in polynomial time? This started when I was working with Costas on LP decoding for LDPC codes and we were always (incorrectly) assuming that random regular bipartite graphs will have $\log n$ girth with high probability. When we actually tried to find a proof for this, for example looking at the Modern Coding theory book we find that usual proofs establish a significantly weaker statement: in a random $(d_v,d_c)$ regular graph, if you start from a variable and start expanding the tree, you will not loop around after a constant number of steps with $1/poly(n)$ probability. This is what is refered to as ‘locally-tree like’. I do not know of any stronger statements but I think it can be easily shown that for any fixed cycle length, the expected number of cycles of that length is constant for regular random graphs.

The breakthrough paper by Arora, Daskalakis and Steurer, ‘Message-Passing Algorithms and Improved LP Decoding‘ establishes that regular LDPCs with $\log n$ girth will correct a constant fraction of random bit-flipping errors whp under LP decoding. But how do we find regular LDPCs with $\log n$ girth ?

After searching a little I found the recent work of Bayati et al.

Generating Random Graphs with Large Girth and Generating random Tanner-graphs with large girth that talk about the related problem of generating a graph with high girth uniformly from all such graphs (and with a given degree distribution) but as far as I understood these constructions cannot guarantee a diameter scaling like $\log n$ (but only any constant diameter). This is of course the relevant practical question but the scaling issue remains.

The only construction that I know is the one found in the appendix of Gallager’s thesis that contains a deterministic algorithm that constructs $(d_v,d_c)$ regular Tanner graphs with $\log n$ girth.

The same question is relevant for compressed sensing when we ask for sparsity in the measurement matrix: All the constructions that I know for sparse measurement matrices (that require the optimal number of measurements under Basis pursuit, e.g. see the survey of Gilbert and Indyk: Sparse Recovery using Sparse Matrices) are constructed from high $\gamma>2/3$ bipartite expander graphs. But given a fixed measurement matrix how do we certify that it is good and should be implemented in a system?

Of course one can use Message passing algorithms for compressed sensing (by Donoho, Maleki and Montanari) and obtain sparse measurement matrices with very good thresholds but under a different decoding algorithm.

The close connection of compressed sensing (under LP – Basis pursuit decoding) and channel coding (under LP decoding) gives one path for certifiable measurement matrices that are sparse):

1. The Arora, Daskalakis & Steurer paper guarantees that (3,6) regular graphs with $\log n$ girth correct a (quite high– much higher than expansion arguments) constant fraction of errors under LP decoding.

2. Gallager’s thesis appendix contains deterministic constructions of such (3,6) regular sparse matrices.

3. Our connection result establishes that if a matrix corrects a constant fraction of bit flipping errors, the same matrix used as a compressed sensing measurement matrix will recover all sparse signals (of the same support as the bit-flipping errors).

Combining (1)+(2)+(3) we obtain that: for sensing a length $n$ signal with $m=n/2$ measurements one recovers almost all sparse signals of size up to $0.05 n$ (whp over supports) and the $(0,1)$ measurement matrix has $O(n)$ non-zeros. (see our recent paper )

Conclusion: The Appendix C in Gallager’s thesis contains the best sparse compressed sensing measurement matrices known (under basis pursuit).

Not bad for 1963!

Secrecy and Security in Distributed Storage

Posted on June 21, 2010 by Alex Dimakis

Secure Distributive Storage of Decentralized Source Data: Can Interaction Help?

Salim El Rouayheb, Vinod Prabhakaran and Kannan Ramchandran

Salim presented this work on creating a distributed storage system that also has secrecy. As far as I understood, the model is that there are $k$ distributed sources and they want to create a good erasure code in a network: a data collector who accesses any $k$ out of $n$ storage nodes should be able to recover the data. In addition, one storage node is an eavesdropper who gets access to everything communicated to her. The question is to create a distributed code that also has secrecy (i.e. the eavesdropper learns nothing other than what is contained in the nodes she controls) but minimize the communication in the network.

In the model each storage node has a private key and these keys can be used to provide the desired secrecy. The nodes could use interactive protocols of exchanging keys etc, but the main result of the paper is that this does not reduce the required communication compared to a one-way dissemination scheme.

Security in Distributed Storage Systems by Communicating a Logarithmic Number of Bits

by Theodoros K. Dikaliotis, myself and Tracey Ho,

Ted presented our work on security (error correction) for distributed storage systems. We are looking into the problem of finding if a storage node in an $(n,k)$ system is storing incorrect equations. These could be originating from bit-flipping errors in the storage medium or (more likely) if a malicious node was injecting errors during a previous repair in the storage system.

The model is that we assume a trusted verifier that communicates to the nodes and uses the built in redundancy of the code to find where the errors are located. The first observation is in most distributed storage systems the same code is used multiple times to reduce the overhead from storing the coefficients of the code. Instead of sending all the packets, all the storage nodes can project onto the same vector $r$ and only send the projections. The error detection works since the projected data follows the same $(n,k)$ code which can tolerate up to half the minimum distance adversarial errors (under optimal decoding). The problem is that the vector $r$ is very long, around the same order as the stored data. One way around this is to create a pseudorandom $r'$ using a small seed $s$ , and project on that. All the storage nodes need to share the small seed $s$ and if we use pseudorandom generators that fool linear functions we can bound the probability of failure of a computationally unbounded adversary who places the errors. This relies on the pseudorandom generators of Naor & Naor (but for large finite fields).

This trick applies more generally: Assume you can prove some bad event that involves a linear function of an $n$ length i.i.d. random vector $r$ happens with very small probability $p_{bad}$ .

Then instead of using the random vector, you can use a pseudo-random vector $r'$ that is generated from only $\log n$ random bits and still have a bound on the bad event being less than $p_{bad}+ \epsilon$ where $\epsilon$ is the total variation bound obtained from the pseudo-random generator.

On Secure Distributed Data Storage Under Repair Dynamics

by Sameer Pawar, Salim El Rouayheb, Kannan Ramchandran

Sameer presented his work the repair problem under secrecy constraints. The problem is to construct a distributed storage system that stays secure when repairs are taking place. The model is that an eavesdropper can observe the data arriving at $\ell$ nodes in an $(n,k)$ distributed storage system. The static version of this problem (no repairs) is equivalent to the erasure-erasure wiretap-II channel studied in this paper by Arunkumar and Mclaughlin while this paper generalizes this problem under the natural repair dynamics. As far as I understood, there were two results. One was a general bound on the secrecy capacity for any distributed storage system. There is also a surprising separation result:

For distributed storage operating at the minimum bandwidth point (MBR) secrecy and repair can be separated without loss of optimality. The authors show that the combination of using an MDS code to achieve secrecy (by ‘mixing’ the data with random noise keys) and exact Minimum bandwidth regenerating codes that were recently invented in this paper by Rashmi et al. for repair, achieves the secrecy capacity.

Interestingly, the same separation is no longer optimal for other points of the storage-repair tradeoff and as far as I understood the secrecy capacity remains open.

Codes for Distributed Storage: The repair problem

Posted on June 21, 2010 by Alex Dimakis

The first two talks in this session

Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage by K. V. Rashmi , Nihar B. Shah , P. Vijay Kumar , Kannan Ramchandran

and

A Flexible Class of Regenerating Codes for Distributed Storage by Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar

where given by Vijay Kumar. To summarize the results in these two papers (and also

Exact-Repair MDS Codes for Distributed Storage Using Interference Alignment

by Changho Suh and Kannan Ramchandran

that appeared in another session), it is useful to briefly explain the current landscape of the repair problem:

(for more information see the Distributed storage wiki and this survey paper which however do not contain these new results yet.)

Assume a file to be stored, of size $B$ (sometimes also found as $\mathcal{M}$ in the literature) one can use an $(n,k)$ MDS code for storage with maximum reliability: just cut the file into $k$ packets of size $B/k$ , encode into $n$ pieces (of the same size $\alpha=B/k$ ) and store each packet into a storage node. The reliability achieved is that any $k$ storage nodes contain enough information to recover the file, which is the best possible. So we can tolerate any $n-k$ failures without losing data. But most of the time we only have one failure. We then want to repair this failure by creating a new encoded packet by communicating as little information as possible from $d$ surviving storage nodes. There are two cases:

1. If we ask that the new packet still forms a good code (i.e. any $k$ out of the new $n$ nodes suffice to recover), the problem is called functional repair and is completely solved and equivalent to multicasting. The minimum repair bandwidth can be determined by a cut-set bound and we will refer to the achievable region as the cut-set region.

2. If we ask that the new packet formed is exactly identical to the lost one: This problem is called exact repair and is very difficult. It is equivalent to a multicasting network coding problem with intermediate nodes having requests for subsets and therefore prior tools from network coding do not suffice. The repair problem seems close to locally decodable codes but this connection has not been investigated yet.

The amount of information communicated from $d$ surviving nodes is called the repair bandwidth $\gamma=d\beta$ . Another issue is that one can reduce this repair bandwidth if the storage per node $\alpha$ is allowed to increase and there is a tradeoff involved.

The two most important points of this tradeoff are called Minimum Storage point (MSR) (where $\alpha=B/k$ corresponds to MDS codes) and Minimum Bandwidth point (MBR) (where $\alpha$ is larger but the repair bandwidth is minimized).

Clearly exact repair is a strictly harder than functional repair and the cut-set region is still a valid outer bound. A substantial amount of recent work has been focused on understanding the communication required for exact repair. All the results here establish that the cut-set region is achievable for exact repair for the parameters discussed.

Vijay presented a storyline that concluded to the settings which were known before.

This paper had completely solved the problem for the Minimum Bandwidth (MBR) point, for any $(n,k)$ , but fixing $d=n-1$ .

The first paper (Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage) in this session established that the MBR cut-set bound point can be achieved with exact repair for any $d$ , corresponding to an arbitrary number $d\leq n-k$ of failed nodes. The second result in this paper is that all the other interior points of the tradeoff curve (except the points of minimum repair and minimum storage) cannot be achieved with linear codes for exact repair. ( Of course this does not preclude the possibility that they can be approached infinitely close by making the code packets smaller and smaller).

In the second paper A Flexible Class of Regenerating Codes for Distributed Storage, the authors generalize the requirements of the distributed storage system: the initial requirement was that any data collector who connects to any $k$ storage nodes (and receives all their stored data ) obtains enough information to recover the file. Here the authors want to enable data collectors who have limited communication with each storage node to reconstruct the file. The only requirement is that the total amount of information communicated is sufficient. This is, of course, a harder requirement on the storage code and the authors characterize the repair bandwidth for functional repair under this new requirement of flexible recovery. The technical part involves graph theoretically bounding the flow for these flexible information flow graphs. The question of exact repair of these flexible regenerating codes naturally arises and it seems to me that previous exact regenerating code constructions do not immediately generalize.

Finally, in another session was the closely related paper

Exact-Repair MDS Codes for Distributed Storage Using Interference Alignment

by Changho Suh and Kannan Ramchandran

Changho presented two new results: both are for the minimum storage point (MSR) and hence correspond to results for MDS codes.

One is that for the low-rate regime $k/n \leq 1/2$ if the number of nodes helping for the repair is sufficiently large $d \geq 2k-1$ , the MSR cut-set point can be achived for exact repair. For this regime the paper presents explicit code constructions that use a small finite fields and build on this paper which established similar results but only when repairing systematic nodes (and not nodes storing parities).

The second result of this paper closes the open problem of MSR exact repair for the high-rate $k/n\geq 1/2$ regime. The paper uses the symbol extension technique introduced in the Cadambe-Jafar paper Interference Alignment and the Degrees of Freedom for the K User Interference Channel to MDS codes that asymptotically approach the cut-set bound.

This paper appeared simultaneously with

Distributed Data Storage with Minimum Storage Regenerating Codes – Exact and Functional Repair are Asymptotically Equally Efficient, by Viveck R. Cadambe, Syed A. Jafar, Hamed Maleki,

Which independently obtain very similar results using the same technique. Note that the symbol extension approach requires vector linear network coding (sub-packetization of each stored packet) and to approach the cut-set bound an enormous amount of subpacketization (and field size) seem to be required. I think it is correct to interpret these results as existence proofs of high-rate regenerating codes approaching the cut-set bound for exact repair, rather than explicit constructions.

I think that the applicability of interference alignment to problems that seem to have nothing to do with wireless channels is quite remarkable. Another related case is the paper

Network Coding for Multiple Unicasts: An Interference Alignment Approach

by A. Das, S. Vishwanath, S. Jafar, A. Markopoulou.

I will blog my notes on this paper (hopefully) soon. Are the slides available anywhere? I took pictures of slides from other talks but my camera run out of battery.

ISIT 2010: Bethe Entropy and the Edge Zeta Function

Posted on June 19, 2010 by Alex Dimakis

Connecting the Bethe Entropy and the Edge Zeta Function of a Cycle Code

by Pascal Vontobel

Pascal talked about a connection of the Bethe entropy to the edge zeta function of a cycle code, building on previous work by Koetter, Li Vontobel and Walker. Starting with the first quantity, as is well known, Yedidia, Freeman and Weiss established that sum-product (if it converges) can only converge to fixed points of the Bethe free energy. The Bethe entropy is the non-linear term added in the ‘average energy’ (optimized by the LP decoder) and is therefore a fundamental quantity to understand sum-product and its connections to LP relaxations. A cycle code is a linear code where all the variables have degree two (and can be represented as edges).

Now what is the second quantity in this connection, the edge zeta function of a cycle code? As far as I understood (and this is probably inaccurate), for each edge (=variable of the code) define an indeterminate quantity $U_{e}$ .

For a primitive cycle $c_1$ (without allowing backtracking) of this graph, consisting of edges $e_1, e_2, \cdots$ , define a monomial $\gamma(c_1)$ that is the product of terms $\gamma(c_1)= U_{e1}U_{e2}\cdots$ for all the edges of the cycle.

The edge zeta function is a product over all the cycles of these monomials inverted: $\prod_{\forall c_i} \frac{1}{1-\gamma(c_i)}$ .

The edge function can therefore be evaluated at every point of the fundamental polytope by setting the $U_{e}$ appropriately. Previous work by Koetter, Li Vontobel and Walker established that each monomial in the Taylor expansion of this function corresponds to an (unscaled) pseudo-codeword of the cycle code. In particular the powers appearing for each edge (=variable of the code) correspond to re-scaled pseudo-codeword coordinates. As far as I understood, this paper shows that the coefficients of the monomials associated with a given pseudo-codeword have a meaning: their asymptotic growth rate equals to the directional derivative of the (induced) Bethe entropy in the direction of that pseudo-codeword.

I was wondering on the connections to the self-avoiding walks of Dror Weitz [Counting independent sets up to the tree threshold] (also Local approximate inference algorithms) and the connections of the random walks to the weights of the Arora,Daskalakis, Steurer Message Passing algorithms and improved LP decoding paper.

	Zonghong Liu on A story about Canvas
	anonymousskimmer on “The needs of the many,…
	Chanterelle Recipes… on Broiled shrimp with chanterell…
	kvarsh on ICML 2019 encouraged code subm…
	Pulkit Grover on gender inclusivity in communic…

An Ergodic Walk

a process whose average over time converges to the true average

Author Archives: Alex Dimakis