*On Optimal Information Storage in Synapses*

(Lav R. Varshney and Dmitri B. Chklovskii)

Lav presented a mathematical model for storing information in synapses that could be modeled as a communication channel with input costs, and then calculated the capacity per unit cost to derive some principles for a good memory design. These principles are consistent with biological evidence about the brain, which shows that reality is consistent with optimizing efficiency in this model (but not, naturally, that nature is optimizing this model). The model is that we are coding over a set of*n*synapses in space. The input alphabet*X*is the synaptic strength, so a codeword is a sequence of synaptic strengths for the*n*synapses. The cost function*b(x)*for a particular synaptic strength measures the volume of a synapse, so that stronger synapses are larger. By using an SNR model for the channel, they can calculate this capacity per unit cost. The model can explain some properties such as the optimal exponent of the cost function, the profile of synaptic strengths and the benefits of noisy synapses, and the reasons for sparse connectivity.*Learning from Compressed Observations*

(Maxim Raginsky)

Maxim presented a set-up in which he coupled the “learning from examples” paradigm of machine learning to a rate-distortion problem. In learning from a set of examples*{X*, we are interested in finding a map_{i}, Y_{i}}*f : X -> Y*that can assign new*Y*values to fresh samples*X*. Some targets for making a good classifier are minimizing the empirical risk or the generalization error, and it is the latter criterion on which this work focused. The problem is this : if the*X*samples are available to both the encoder and decoder but the*Y*‘s must be compressed to rate*R*, what is the set of rate-generalization error tradeoff curves. By using some heavier function-analytic machinery he can get an expression for an achievable rate as the distortion-rate function over a set of distortions depending on*Y*.*Intermediate Performance of Rateless Codes*

(Sujay Sanghavi)

Designing a rateless code for erasures like a Fountain or Raptor code corresponds to designing a degree distribution on the graph that defines the codes. In particular, the soliton distribution can be shown to achieve capacity. The question here is the decoder performance when the number of unerased symbols is less than the number needed to decode — can we partially recover? That is, if we need*K*symbols, but only get*r K*symbols, for what*t*can we get*t K*symbols correctly? The soliton distribution performs poorly in this respect — the error is catastrphic. Sujay gets a pointwise outer bound on the performance of any code (via linear programming), and each point corresponds to a different input distribution. The proof uses some embeddings into a fluid model due to Darling and Norris.*On Secure Distributed Source Coding*

(Vinod Prabhakaran and Kannan Ramchandran)

Suppose Alice and Bob each have correlated sources and Alice wants to communicate her source to Bob. Eve has a source of her own correlated with Alice and Bob’s, and can eavesdrop on their communication. They formulate their problem in terms of the leakage rate — the rate at which information leaks to Eve. For one-way communication from Alice to Bob they characterize the leakage rate completely, and show it is achievable using a “secure Slepian-Wolf” coding scheme. For two-way communication they get bounds in terms of the secret key rate, and they also get the leakage capacity for a certain multiple-access setup.*A Hot Channel*

(Tobias Koch, Amos Lapidoth, and Paul P. Sotiriadis)

This paper used a (rather simplified) model for communication on a chip. In this model, the variance of the noise is a linear combination of additional noise and the average power expended by the transmitter in earlier channel steps. This “heating up” property makes the channel a bit tricky to handle. One interesting way to attack it is by looking at the capacity per unit cost function, where the cost is the SNR — this was done in earlier work at ISIT. The main result is a characterization of the capacity itself in terms of the coefficient sequence that dictates how the noise variance is calculated. In particular, if the memory in this noise model decays geometrically, then the capacity is bounded.*A Deterministic Model for Wireless Channels and its Applications*

(David Tse)

David presented a new way of thinking about wireless channels in which the transmitters and receivers only transmit bits. The goal is to do away with the noise and instead focus on how different user’s signals interact/interfere with each other. We think of each bit as occupying a signal level, so that there is an ordering from MSB to LSB, where the MSB is occupying a “high power level.” This is motivated by the approximation*log(1 + SNR) = log(SNR)*at high SNR. The SNR determines how many bits that were transmitted can be received at the decoder. So in a broadcast setting, the transmitter may send 5 bits, one receiver will get all 5, and the other will only get the 3 most significant bits (MSBs). Interference is modeled as XORing the bits. He stepped through several examples of building deterministic models, and then addressed how close the rates derived from these models are to the optimal rate, along the lines of the “within one bit” results on the interference channel. There will be a number of talks at Allerton on how to use this model for different channels, including general relay networks, MIMO channels, and the interference channel.

# Monthly Archives: September 2007

# not really a spoiler

This month’s *Harper’s* cryptic is certainly expanding my vocabulary.

# correction to “Channels with nosy ‘noise’”

I have posted a corrected version of my ISIT paper on randomized coding for channels whose state can depend non-causally on the transmitted codeword. It turns out the result as presented there was incorrect. Perhaps if I get a little more time I’ll write something expository on my little nook of information theory.

# office safety

When running a fast optimization tool such as CVX, it is important to wear a helmet to protect you in case of crashes or attacks by monkeys:

# ITW 2007

**Plenary**:Pseudo-codewords and iterative decoding: A Guided tour.

by Pascal Vontobel.

Pascal gave a very nice tutorial presentation of the recent results on Linear programming decoding, its relations to iterative decoding through graph covers (the eagle vs the mountain lion) and the role of the vertices of the fundamental polytope called pseudo-codewords. I saw that the slides are available online on the pseudo-codewords homepage and contain many interesting figures like the decision regions and numerous graph cover examples.

On the Minimum Number of Transmissions in Single-Hop Wireless Coding Networks

by Salim Y. El Rouayheb, Mohammad Asad R. Chaudhry, and Alex Sprintson

This paper studies the ‘Single-hop’ wireless Network coding problem where a base station is trying to communicate a number of packets to a set of receivers. Each of the receivers has (say, from previous transmissions) a subset of the packets, and a list of desired packets it wants to receive. The question is finding the minimal number of transmissions for the base station (each transmission is overheard by all the receivers) so that all the clients can decode the packets they are interested in. For example, if receiver R1 has packets A and wants B while R2 has B and wants A, a single transmission of A+B will suffice. Salim showed that finding the minimal number of transmissions for a general set of receivers is NP-hard over GF(2). Further, the minimal number of transmissions is non-monotone in the field size and can depend on the characteristic of the field (which was quite surprising to me- I was guessing that there would exist a field size large enough above which there would be no difference). They also show how a special case of this problem (assuming no memory at the decoders) boils down to clique partition and propose a simple algorithm that works well for random instances.

Fault Tolerant Memories Based on Expander Graphs

by Shashi Kiran Chilappagari and Bane Vasic

This paper looked at the problem of maintaining an erasure encoded representation when both the memory elements and the logical gates can be faulty. As far as I understood, there is a correcting circuit that periodically checks for faults in the memory and repairs them using the bit-flipping algorithm of Sipser and Spielman. The problem is that this circuit is made of unreliable logic gates and hence the repairs are not always correct. If the Tanner graph of the code used has sufficient expansion, then the correcting circuit manages (despite its own errors) to keep the fraction of errors below a threshold.

This threshold is low enough for a powerful decoder (that makes no errors) to be able to recover all the bits at any given time- whereas a memory with no correction circuit would always fail after a constant amount of time.

Reducing the Error Floor

Michael Chertkov

Misha presented improved decoding algorithms (based on LP decoding) to reduce the probability of error at high SNR (error floor). The talk focused on the famous [155,64,20] Tanner code where the authors have developed an algorithm to find the most dangerous (low pseudo-weight) pseudo-codewords. The proposed Loop Guided Guessing (LGG) is an informed facet guessing algorithm that selects which bits to guess by finding critical loops on the graph. It was great to learn that the investigated algorithms managed to correct all 200 low weight pseudo-codewords-essentially achieving ML performance for high SNR.

On Optimizing XOR-Based Codes for Fault-Tolerant Storage Applications

Cheng Huang, Jin Li, and Minghua Chen

This paper shows a novel technique to improve the performance of encoding and decoding for linear codes that have been reduced to binary representations. The encoding and decoding of linear block codes is represented in matrix form, where the matrices are fixed while the vectors are data dependent. The idea is finding patterns in the matrices that can be reused, essentially optimizing the number of XOR operations required to multiply any vector with a given matrix. For example is the parities P1= x1+x3+x5, P2=x1+x3+x5+x6, P6=x2+x3+x4+x5 are computed naively, this requires 8 XOR operations. If one pre-computes x3+x5, only 6 XOR operations suffice. The authors call this common operations first (COF) principle and propose an optimization problem to minimize the number of XORs required to multiply a given matrix with any vector. They propose two greedy algorithms that work well in practice and show that generic Reed-Solomon codes that have been optimized with this scheme can be equally or more efficient than codes specifically designed to minimize the number of XOR operations.

Approximate message-passing inference algorithm

Kyomin Jung and Devavrat Shah

Devavrat presented a general message passing framework for computing maximum a-posteriori assignments (MAP) for binary markov random fields. Building on the recent result by Dror Weitz on the equivalence of message passing on a graph and a self-avoiding walk tree of G for marginalization, the equivalence still holds for MAP assignments. For graphs that admit some kind of good partitioning, Devavrat presented message passing algorithms that can approximate the MAP assignment within epsilon factor in polynomial time for any fixed epsilon. The intuition is that the graph can be separated in small components, compute estimates locally and then combine them to obtain a good global estimate. The algorithm can be applied to a wide family of graphs that can be easily separated (examples included planar graphs and graphs that have some types of geometry).

Equivalence of LP Relaxation and Max-Product for Weighted Matching in General Graphs, Sujay Sanghavi

Sujay showed that the max-product algorithm for finding weighted matchings in general graphs converges to the right answer if and only if the LP relaxation of the problem is tight. The proof relies on the fact that max-product is solving the problem on a computation tree that can sometimes contain extra matchings (pseudo-matchings!) of larger weight, that do not exist on the real graph. Using LP duality, Sujay showed that if the LP relaxation is tight, there are no such evil matchings on the computation tree. The converse (as far as i understood) is using a combinatorial characterization of when the LP relaxation fails- the existence of evil structures in the graph (‘Bad stemmed blossom’ and ‘bad blossom pair’) that also make max-product fail.

On the Hardness of Approximating Stopping and Trapping Sets in LDPC Codes

Andrew McGregor and Olgica Milenkovic

Olgica presented hardness results for minimum distance, minimum stopping and trapping sets for general linear codes and even LDPC codes. In particular she showed that minimum stopping and trapping sets are hard to approximate within an arbitrary constant factor (in other words, there exists a factor c below which c-factor approximations become NP-hard). The reductions carry through even if the codes have constant degrees (LDPC) with minimum degree 3. An interesting related open problem I was thinking about after this talk is hardness results on finding and approximating minimum (BSC) pseudo-weight. Any ideas on this?

# SF Fringe : The Sewers

On Tuesday R and I saw The Sewers, a production from the New York theater company Banana Bag and Bodice. The play was part of the SF Fringe Festival, and is running through this weekend. I really enjoyed this play — the visual elements, disjointed narrative, and intense performances were exactly the kind of theater I’ve been itching to see:

This show is a conjuring act; an entire, albeit, tiny village by the name of The Sewers mysteriously appears one night in the theatre. All the children are dead. An acid plant in a barn. A triangular shaped love tryst. This show is a tour de force by manipulation.

If you are a fan of Richard Foreman, you will like it for sure, I think — it was done at the Ontological Theater in NY. The best thing about it is that it affects you in a way that is hard to articulate, and the process of trying to express to yourself what you think about it is fodder for hours of thought. Go see it!

# ITW 2007 (Tahoe) : Day Two

**Plenary Talk:***Source and Channel Coding: Theory, Practice and System Aspects*(Giuseppe Caire)This focus of this talk was on the benefits of using joint source channel coding for more “practical” systems. The main motivation was the catastrophic behavior of excess channel errors on source reconstruction. If you optimally compress an image and send it over a noisy channel, the reconstruction becomes terrible if the channel is a bit noisier than you expected. Since wireless channel variations are a real problem, this can have a real impact. Can we have a graceful degradation in the reconstruction quality via JSSC? He also gave two more examples, one from beamforming on the MIMO broadcast channel, and one about multicasting a Gaussian source over a Gaussian MAC using a Costa-like layering construction.

*The Case for Structured Random Codes in Network Communication Theorems*(Bobak Nazer and Michael Gastpar)This paper focused on giving some motivations for exploiting a structured approach to random codes, rather than the standard information theoretic constructions we all know and love. For problems such as computing over a multiple-access channel, linear network coding, or interference networks, separating source and channel coding can result in performance losses. A simple scheme based on lattices can get better performance in these cases.

*On Separation in the Presence of Feedback*(Haim Permuter and Tsachy Weissman)It is well-known that separation is optimal for point-to-point channels and that the directed mutual information gives the capacity of channels with feedback. For ergodic channels, they show that for construction with distortion

*D*the condition*C*is necessary and sufficient, which shows that separation is optimal in this setting as well. For certain MACs which can be decomposed into a “multiplexer plus point-to-point” channel, they can show a similar result. In the talk they also looked at a binary MAC with stationary ergodic noise and feedback, and showed that separation is optimal there. Their results extend to mod-additive MACs._{FB}> R(D)*Equivalence of LP Relaxation and Max-Product for Weighted Matching in General Graphs*(Sujay Sanghavi)The problem of maximum weight matching in a graph with edge weights is to find the largest weight set of disjoint edges. One algorithm for doing this is max-product, which is an iterative message passing algorithm. The matching problem can also be posed as an integer program, which has a linear programming relaxation. Sujay shows that the performance of max-product is tied to the tightness of the LP — if the LP is tight then MP converges to the correct answer, and if the LP is loose then MP will not converge. This provides a nice guarantee of when MP will converge that ties together the iterative and linear-programming approaches to optimization on graphs. Clearly this has relations to decoding, but I’ll let Alex comment on those.

*On the Duality and Difference Between Slepian-Wolf Coding and Channel Coding*(Jun Chen, Da-ke He, Ashish Jagmohan, and Luis A. Lastras-Montano)This paper explored the similarities between channel coding and Slepian-Wolf coding. The comment made by Jun Chen at the talk was that the “difference” mentioned in the title is actually another kind of duality. By duality, they are referring to a kind of mirror pairing of the error exponents for source and channel coding. Using relations between linear coding strategies shows that nonlinear Slepian Wolf codes can

*strictly*outperform linear codes.*On Network Interference Management*(Aleksandar Jovicic, Hua Wang, and Pramod Viswanath)This paper looked at two basic interference patterns — one long range link causing problems for many short range links, and one long range link experiencing interference from many short links. The question is this — is orthogonalization/degree-of-freedom sharing/reuse optimal? They characterize the high SNR degree-of-freedom range for both scenarios using a multi-layer superposition coding scheme and use genie-aided converse arguments to get a “within one bit” characterization in the spirit of the Etkin/Tse/Wang approach to interference channels. In the latter interference model they also show that power control can get similar performance.

*Capacity for Classes of Broadcast Channels with Receiver Side Information*(Gerhard Kramer and Shlomo Shamai (Shitz))The channel model here is a broadcast channel where each user has as side information the other user’s message. The coding schemes used are nested lattice codes, and they show that they can get all capacity points — furthermore they can generalize their scheme to cost-constrained channels and coding for degraded message sets. At the end of the talk there was some very recent work by a student working with Gerhard (Wei Kang, I believe?) who showed a similar result for the case of general correlated sources with a certain kind of degraded side information at one receiver.

*Successive Refinement of Vector Sources under Individual Distortion Criteria*(Jayanth Nayak, Ertem Tuncel, Deniz Gunduz, and Elza Erkip)The problem here is to make a successive refinement code (if you get more of the codeword you can decode to lower distortion) for vector sources where we have a distortion constraint on each of the components of the code. If we look at a two sources (vectors of length 2) we can mark achievable distortion pairs

*(D*on the plane and ask “what kind of trajectory can we make with successive refinement codes? It turns out the answer is rather complicated — for Gaussian sources we get three regions and there are rules about what paths we can take._{1}, D_{2})*On Cognitive Interference Networks*(Amos Lapidoth, Shlomo Shamai (Shitz), and Michele A. Wigger)This is a follow-up to the ISIT paper, which focused on characterizing the pre-log of the sum-capacity of a certain interference network that looked like a chain. The “cognitive” aspect of the problem is that the transmitters know the message of their interfering neighbors. They can use some linear encoding to zero-force their own interference and then a number of results pop out — conditions on when the pre-log for

*K*users can be*K*or*K-1*, necessary and sufficient conditions on the network for achieving*K*, and the fact that pre-log can be noninteger, unlike the case where no side information is available.*Reliable Communication in Networks with Multi-access Interference*(Danail Traskov and Gerhard Kramer)Danail presented an interesting scheme for generating correlated messages in networks (with an eye to network coding, it seemed). As a simple example, a source transmitter sends rate-constrained messages to two relays who then communicate to the destination. This is similar to the Schein-Gallager network but without the broadcast component at the beginning. The source can correlate the messages at the relays, and in some cases they can get a capacity result. More importantly, the scheme scales to more general networks of MACs and has a nice “flow” interpretation.

# ITW 2007 (Tahoe) : Day One

A lot of my comments will be short since in retrospect my notes were quite poor for most of the talks I attended. I attribute it to fatigue… I also skipped out on a number of talks to think over some new problems that had come to mind. I’m going to let Alex blog about some of the coding talks.

**Plenary Talk:***On Modeling Distributions over Large Alphabets*(Alon Orlitsky)Orlitsky started with the non-paradoxical nature of the “birthday paradox.” After querying people in the audience one by one, he finally ended with two people who had the same birthday (9/3), namely Shlomo Shamai and Rüdiger Urbanke. This was all a set-up of course, but I honestly thought Urbanke was just sick of the process and tried to end it. One way of turning the birthday paradox on its head is to ask “if we saw that it took K queries to find two people with the same birthday, how should we estimate the length of the year?” This is precisely the problem of large alphabet size estimation with small sample size. He then went on to tour the techniques of patterns, profiles, and the relation to capture-recapture statistics. I had seen a lot of this work before, but it was nice to see it again and a bit more slowly.

*Power Allocation for Discrete-Input Non-Ergodic Block-Fading Channels*(Khoa D. Nguyen, Albert Guillén i Fàbregas, and Lars K. Rasmussen)This work was about finding the SNR exponent of the outage probability for power allocation strategies over a particular class of fading channels (Nakagami). They differentiated between long-term and short-term power allocation and found a relationship between the short-term optimal scheme and a Singleton bound. The more interesting part is that suboptimal, lower complexity schemes are nearly optimal in this outage-exponent sense.

*Probabilistic Capacity and Optimal Coding for Asynchronous Channel*(Ning Cai, Siu-Wai Ho, and Raymond W. Yeung)This paper looked at time jitter in a continuous time model, and showed that run-length limited (RLL) codes are suboptimal for this kind of asynchrony. When the jitter is bounded in some sense they found 0-error codes, and showed that “interval codes” are better than RLL codes. They used an approximation for the rate loss involving characteristic functions in the case where we demand a bound on the time that the encoder and decoder can be out of synch. They come up with a channel model and capacity formulation and design new capacity achieving codes.

*The Effect of Finite Memory on Throughput of Wireline Packet Networks*(Badri N. Vellambi, Nazanin Rahnavard, and Faramarz Fekri)In a packet erasure network on a line, with finite buffer sizes at the nodes, what are the benefits of network coding? The authors came up with a large Markov chain model and have to find the steady state distribution to get the capacity of the network. Unfortunately, the chain is a bit unwieldy so they come up with approximations for the distribution. Some simulations are also provided.

# Amazing image processing

From SIGGRAPH 2007:

# you say routing, I say routing…

Let’s call the whole thing off. One might say the enemy army was “routed,” but do we ever use the word “routing” in that sense? It sounds wrong to me — does it only appear as a participle?

In the networking context though, people say “routing” either to rhyme with “pouting” or with “tooting.” I’d use the latter for “route 66” but I usually use the former for networking.

In case these linguistic musings bore people, fear not — I will write about other things soon.