ISIT 2007 : the plenaries

I went to all the plenaries this year except Vince Poor’s (because I was still sick and trying to get over it). I was a bit surprised this year that the first three speakers were all from within the IT community — I had been used to the plenaries giving an outsider’s perspective on problems related to information theory. The one speaker from outside, Emery Brown, was asked by Andrea Goldsmith “what can information theorists do to help your field?” I learned a lot from the talks this year though, and the familiarity of the material made it a more gentle introduction to the day for my sleepy brain.

Michelle Effros talked about Network Source Coding, asking three questions : what is a network source code, why does the network matter, and what if separation fails? She emphasized that in the last case, bits and data rate are still useful abstractions for doing negotiations with higher layers of the network stack. For me, this brought up an interesting question: are there other currencies that may also make sense for wireless applications, for example? Her major proposal for moving forward was to “question the questions.” Instead of asking “what is the capacity region” we can ask “is the given rate tuple supportable?” She also emphasized the importance of creating a hierarchy of problem reductions (as in complexity theory). To me, that is already happening, so it didn’t seem like news, but maybe the idea was to formalize it a bit more (cf. The Complexity Zoo). The other proposal, also related to CS theory, was to come up with epsilon-approximation arguments. The reason this might be useful is that it is hard in general to implement “optimal” code constructions, and she gave an example from her own work of finding a (1 + epsilon)-approximation for vector quantization.

Shlomo Shamai talked about the Gaussian broadcast channel, discussing first the basics, degradation, and superposition, and then how fading makes the whole problem much more difficult. He apologized in advance for supposedly providing an idiosyncratic look at the problem, but I thought it was an excellent survey. Although he pointed out a number of important open problems in broadcasting (how can we show Gaussian inputs are optimal?), I was hoping he would make more of a call to arms or something. Of course, on Tuesday I was heavily feverish and miserable, so I most likely missed a major point in his talk.

As I said, I missed Vince Poor’s talk, but I think I saw a version of the same material at the Başarfest, and it used game theory methods to study resource allocation for networks (like power allocation). I needed the extra rest to get healthy, though.

Sergio Verdú gave the Shannon Lecture, and titled his talk “teaching it.” He made a few new proposals for how information theory should be taught and presented. One of the strengths he identified was the number of good textbooks which have come out about Information Theory, many of which were written by previous Shannon lecturers. If I had to sum up the main changes he proposed, they were to de-emphasize the asymptotic equipartition property (AEP), separate fixed-blocklength analysis from asymptotics, avoid presenting only memoryless sources and channels, and provide more operational characterizations of information quantities. He drew on a number of examples of simplified proofs, ways of getting bit-error converses without Fano’s Inequality, the importance of the KL-divergence, the theory of output statistics, and the usefulness of the normal approximation in information theory. I agreed with a lot of his statements from a “beauty of the subject” point of view, but I don’t know that it would make information theory more accessible to others, necessarily.

Emery Brown gave the only non-information theory talk: he addressed how to design signal processing algorithms to decipher brain function. Starting from point processes and their discretization to get spike functions and rate functions, he gave specific examples of trying to figure out how to track “place memory” neurons in the rat hippocampus. The hippocampus contains neurons that remember places (and also orientation). After walking through some examples of mazes in which a neuron would fire only going one direction down a particular segment of the maze, he showed an experiment of a rat running around a circular area and tried to use the measurements on a set of neurons to infer the position of the rat. Standard “revcor” analysis just does a least squares regression of spikes onto position to get a model, but he used a state-space model to address the dynamics and some Bayesian inference methods to get significantly better performance. He then turned to how we should model the receptive field of a neuron evolving over time as a rat learns a task. When asked about what information theorists can bring to the table, he said that getting into a wet lab and working directly with experimentalists and bringing a more principled use of information measures in neuroscience data analysis would be great contributions. It was a really interesting talk, and I wish the crowd had been larger for it.

ISIT 2007 : multi-user IT and cognitive radio

A Linear Interference Network with Local Side-Information (M. Wigger, S. Shamai (Shitz), and A. Lapidoth) : This model was a chain of interference channels with K total transmit-receive pairs in a kind of ladder, and looked at the behavior of the prelog in the sum-capacity expression when each transmitter knows the J earlier messages in the ladder. For the achievable scheme, they silence some of the sensors and use dirty paper coding, and for the converse they use some interference-cancellation arguments. The bounds match and they get floor(K/(J+2)) for the prelog.

A Broadcast Approach to Multiple Access with Random States (P. Minero and D. Tse) : This paper looks at a kind of “compound MAC” problem, when the receiver has channel information (about the fading state for example). If the encoded information is “layered” via superposition coding, the decoder can opportunistically extract the data at a rate that the channel can support. By making each setting of the states into one virtual receiver, we get a broadcast version of the MAC. They look at two problems — the slow fading compound channel problem, and the “random access” model, where the number of users is variable. For the fading system, superposition is optimal for the sum rate, but for random access it is not.

On InterFerence Channel with Generalized Feedback (IFC-GF) (D. Tuninetti) : This looks at an interference channel with two additional outputs that are fed back to the transmitters. These outputs could be some internal part of the channel, or could be a noisy version of the channel outputs. For a Gaussian setting with the feedback being independently faded and noisy copies of the other transmitter’s signal, she exhibited a block Markov encoding scheme using backward decoding and showed that if there is a common message, it can get a power boost.

Bounds on the capacity region of a class of interference channels (I. Telatar and D. Tse) : Although this talk was the last talk of the conference and I was pretty exhausted, it was one of my favorites. The class of channels in question consists of those in which the interference for user 1 is user 2’s signal passed through a channel and then a deterministic function. For example the channel could be “fade and add noise.” They derive outer bounds that are quite similar in form to the inner bound due to Chong-Motani-Garg, and can give some bounds on the tightness of that achievable region in the flavor of the “within 1-bit” result of Etkin-Tse-Wang.

ISIT 2007 : security, wiretapping, and jamming

Fingerprinting Capacity Under the Marking Assumption (N. Prasanth Anthapadmanabhan, Alexander Barg, and Ilya Dumer) : This is a cool problem that was new to me and uses some AVC-like techniques so I was quite excited about it. Suppose we have an original version of some data. We would like to make many copies with fingerprints and distribute them such that if any t recipients collude (the “pirates”), they cannot make a fake new copy that will fool a validating agent. The marking assumption is that the pirates cannot change the fingerprint except in places where their copies differ. This is a tough combinatorial problem but is amenable to some random coding arguments. In particular, they get upper and lower bounds on the case of 2 or 3 pirates for binary alphabets.

On Multiterminal Secrecy Capacities (Imre Csiszár and Prakash Narayan) : I talked about some of this work from ITA — this was quite similar, but since it was my second time seeing some of the material, I got a bit more out of it. One basic problem from this paper is this : a public “server” terminal broadcasts a message to a bunch of receivers, who then engage in public discussion. How large a key can they generate, given that the public communication is subject to eavesdropping? The most interesting technical detail for me was the use of Markov trees for the case where some of the receivers are also wiretapped.

On Oblivious Transfer Capacity (Imre Csiszár and Rudolf Ahlswede) : This talk was the only one I saw given on transparencies. Csiszár apologized for giving a talk on “old technology” but noted that the “topic is quite new.” The Oblivious Transfer problem is this : Alice has two strings, X and Y, each of which is k bits long. Bob has a binary variable Z which tells him in which string he is interested (0 for X and 1 for Y). Bob wants to get his string, but doesn’t want to let Alice know which one he wanted, and Alice wants to tell him the correct string without revealing anything about the one he didn’t want. Alice can communicate over a DMC and there is also noiseless two-way communication. The capacity is defined as the limit of k/n, where n is the number of uses of the DMC and k is the largest string length achievable with those n uses. They get bounds on the OT capacity, which in some cases are tight, but under the assumption that Alice and Bob do not try to “cheat.”
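
To pin down the definition a little, here is a rough sketch in symbols. The notation is an assumption made for illustration: k*(n) stands for the largest string length for which some protocol using n DMC uses (plus the noiseless discussion) lets Bob recover his string while keeping Z hidden from Alice and the other string hidden from Bob. The paper’s formal definition, in terms of achievable rates with vanishing error and leakage, may differ in the details.

```latex
\[
  C_{\mathrm{OT}} \;=\; \lim_{n \to \infty} \frac{k^{*}(n)}{n}
\]
```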

An Achievable Region of Gaussian Wiretap Channel with Side Information (Yanling Chen and A.J. Han Vinck) : This paper looks at the wiretapping problem where an additive channel interference term is known at the encoder. In this setting, a generalized Costa “dirty paper” coding scheme can be shown to mask the source signal from a wiretapper (giving high equivocation) while communicating over the channel as if the interference were not present. They conjecture that this scheme is optimal, but as usual in more complicated coding scenarios, the converses are a little harder to come by.

Shannon’s Secrecy System With Informed Receivers and its Application to Systematic Coding for Wiretapped Channels (Neri Merhav) : This studies the case where an iid source vector to be encoded at the transmitter is observed in two ways by the decoder — firstly via an iid correlated sequence of variables, and secondly via an encoded message sent via the encoder over a DMC. The wiretapper gets a degraded version of both of these variables. One key message from the talk was that systematic coding is sometimes as good as the “optimal” scheme in an equivocation sense.

Smirnov’s Theorem : dependence flowchart

This last semester we had a reading group on percolation theory, using the new book by Bollobas and Riordan. The crowning moment of our discussions was a 3-week trek through the proof of Smirnov’s theorem, which shows the conformal invariance of crossing probabilities for the triangular lattice. The book apparently contains the first “complete” proof in print. It’s quite involved, and for my part of the presentation I made the following flowchart of the structure of the proof:

[Figure: Smirnov’s Theorem Flowchart]

ISIT 2007 : large random networks

Scaling Bounds for Function Computation over Large Networks (Sundar Subramanian, Piyush Gupta, and Sanjay Shakkottai) : This paper essentially looked at the scaling laws for the “refresh rate” of networks in which a collector node wants to compute a function f(x1, x2, …, xn) of measurements at the nodes of a network. The network has bounded degree and there is a difference in scaling between type-sensitive (mean, mode) and type-threshold (max, min) functions. They show that deterministic bounds on type-sensitive functions are not possible in general, but probabilistic bounds are possible. By using a joint source-channel coding strategy, for AWGN networks they obtain constant refresh rates for small path-loss exponents.

Characterization of the Critical Density for Percolation in Random Geometric Graphs (Zhenning Kong and Edmund M. Yeh) : Since we had a reading group on percolation theory this semester, this talk felt right up my alley. Although using Monte Carlo techniques we know the critical threshold (density) for percolation (formation of a giant connected component) to happen in random geometric graphs, the analytical bounds are quite loose. This paper gets tighter analytical bounds by doing some smarter bounding of the “cluster coefficients,” which come from looking at the geometry of the percolation model.

ISIT 2007 : feedback

Communication with Feedback via Posterior Matching (Ofer Shayevitz and Meir Feder) : This work was an attempt to come up with a common framework and generalization of the Horstein and Schalkwijk-Kailath feedback coding schemes, in which the encoder uses the feedback to track the decoder and “steer” it to the correct message. They come up with an idea called “posterior matching” and apply it to DMCs to show that a simple “steering” encoder can also achieve the empirical mutual information of the channel I(Q,W) using the posterior CDF at the decoder. It’s a pretty cool result, in the line of “A is like B in this way.”
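
As a rough sketch of what the “steering” looks like (the notation here is an assumption, not taken from the talk): think of the message as a point Θ uniform on [0,1), let Y^n be the channel outputs fed back so far, and let F_Q be the CDF of the desired input distribution Q, with F_Q^{-1} its generalized inverse. The encoder sends the posterior CDF of the message, evaluated at the true message and reshaped to have distribution Q:

```latex
\[
  X_{n+1} \;=\; F_Q^{-1}\!\Bigl( F_{\Theta \mid Y^n}\bigl(\Theta \mid Y^n\bigr) \Bigr)
\]
```

When the posterior is continuous, the inner term is uniform on [0,1) given the feedback, so the next input has distribution Q and is independent of what the decoder has already seen.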

Broadcasting with Feedback (Sibi Raj Bhaskaran) : This addresses the question of broadcasting when feedback is only available to one user. In a degraded Gaussian broadcast setting, if the feedback is from the strong user, you can use a Gaussian codebook for the weak user with a Schalkwijk-Kailath scheme for the strong user to get an increased capacity region. This is one of those papers that I’ll have to read a bit more carefully…

The Gaussian Channel with Noisy Feedback (Young-Han Kim, Amos Lapidoth, and Tsachy Weissman) : This talk was about error exponents for the AWGN channel with noisy feedback. By using some change-of-measure and genie arguments, they show a non-trivial upper bound, and a three-phase coding scheme can give a lower bound which scales like 1/(feedback noise variance). Unlike the perfect feedback case, where the exponent is infinite, both bounds are finite. Furthermore, they show that linear coding for noisy feedback will not get any rate.

Error Exponent for Gaussian Channels with Partial Sequential Feedback (Manish Agarwal, Dongning Guo, and Michael L. Honig) : In this talk, they take an AWGN channel in which only a fraction of the received symbols are fed back and ask for the error exponent. They propose an achievable scheme that uses the slots with feedback separately from the slots without. With feedback, they try to steer the decoder into a favorable position, and then use a forward error-correcting code to get the rest of the rate.

A Coding Theorem for a Class of Stationary Channels with Feedback (Young-Han Kim) : The class of channels under consideration are those in which there is order m memory, and in this case the capacity is given by a normalized directed mutual information. The main interest for me was in the proof strategy, which used Shannon strategies and something called the “Nedoma decomposition,” which I have to read about a bit more… except that it’s in German, so I have to brush up my German first.

Capacity of Markov Channels with Partial State Feedback (Serdar Yüksel and Sekhar Tatikonda) : This was about partial feedback in that the state is quantized and sent back to the encoder. For a Markov state transition with some technical mixing conditions, they find a single-letter expression for the capacity with feedback, and that a sliding-window encoder is “almost good enough,” which is good news for practical coding schemes. They also use some nice dynamic programming with the directed mutual information as the running costs, which merits some closer inspection, I think.

ISIT 2007 : source coding

Source Coding with Distortion through Graph Coloring (Vishal Doshi, Devavrat Shah, and Muriel Médard) : This paper looked at the rate-distortion function for reconstructing a function f(X,Y) with X at the encoder and Y as side information at the decoder. One can think of this as a “functional” Wyner-Ziv, or in some cases as a noisy Wyner-Ziv. In the lossless setting, the reconstruction function can be found using graph entropy, and minimum-entropy graph coloring turns out to be a way of obtaining a solution. For the lossy problem, they find a similar graph coloring scheme but can’t get a single-letter expression in all cases. What is interesting to me is the optimality of “preprocess + Slepian-Wolf,” which is similar in spirit to the work done by Wang, Wagner, Viswanath and others for Gaussian multiterminal source coding problems.

Compound conditional source coding, Slepian-Wolf list decoding, and applications to media coding (Stark C. Draper and Emin Martinian) : The main motivation here was that many multimedia applications (such as video coding) may more fruitfully be thought of as compound source coding problems with an unknown side information at the decoder. In this setting, we can imagine the side information as one of P different predictors of the source X available at the encoder. The encoder can use the different predictors to list decode the source message and send conditional list-disambiguation information in addition to the source encoding. It’s a neat scheme that seems quite related to some of my thesis work on list decoding for AVCs.

Correlated Sources over a Noisy Channel: Cooperation in the Wideband Limit (Chulhan Lee and Sriram Vishwanath) : I have to admit I didn’t fully get this talk, but it looked at wideband distributed source coding using “highly correlated sources.” They propose a modified PPM scheme which can exploit the correlation as if the encoding is joint with small bit error rate (but possibly larger block-error rate). What was unclear to me was why the modified error criterion was necessary, but it seems to be an artifact of the proposed scheme. The algorithm requires a sliding window decoder whose analysis seems a bit tricky.

Joint Universal Lossy Coding and Identification of Stationary Mixing Sources (Maxim Raginsky) : What is the loss in estimating the parameter of a source and doing universal lossy source coding? By using a competitive optimality framework and a Lagrangian formulation to trade off the parameter error and source distortion, Raginsky can bound the loss in performance. This falls into the category of the O(log n/sqrt(n)) results that I don’t know much about, but I will probably take a look at the full paper to get a better idea of how the codes work. He uses some ideas of VC dimension from learning theory, which I know a little about, so hopefully it will not be too hard going…

The source coding game with a cheating switcher (Hari Palaiyanur, Cheng Chang, Anant Sahai) : This is an extension of Berger’s source coding game, in which a switcher switches between k memoryless sources and you have to make a rate distortion code that can handle the worst case behavior of the switcher. Before, the switcher could not see the source outputs first. Here, he can (hence “cheating”). The main point is to figure out which iid distributions the switcher can emulate, and the worst one in that set gives the bound. The rest is a union bound over types and doesn’t affect the rate.

How much detail should a review version have?

One pesky problem that seems to pop up when I write or review papers is the “minor algebra error due to space constraints.” You have some theorem and then you go back and redefine x to be 2y instead of y/2 and then suddenly stuff is off by a factor of 4 everywhere, but that doesn’t matter for the result per se; it’s just some constant floating around. This is of course an issue with conference papers, since they have strict page limits, and you end up shortening proofs to sketches, but it also happens while revising journal papers. One of the jobs of the reviewer is to check that the algebra works out, which becomes tedious if all the algebra is in fact correct but the paper skips 5 lines of simplifications and you have to go and work it out yourself.

Which brings me to the modest proposal : when submitting a journal paper, put in all the algebra, with a little footnote saying that you’ll omit the intermediate steps in the final version. This way, the checking for correctness becomes almost mechanical. Sure, it may make the submitted manuscript look bloated, but then the time saved can be spent on checking the structure of the argument. As an added benefit, the writer will be forced to explore the full ramifications of changing the notation around. Of course, this wouldn’t be possible (probably) for conference papers, but would it help for the journal process?
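
As a toy illustration of the kind of slip described above (the term c x and the constant c are made up for the example), here is where the factor of 4 comes from when the definition of x changes but a downstream expression is not updated:

```latex
\[
  x = \frac{y}{2} \;\Rightarrow\; c\,x = \frac{c}{2}\,y ,
  \qquad
  x = 2y \;\Rightarrow\; c\,x = 2c\,y ,
  \qquad
  \frac{2c\,y}{(c/2)\,y} = 4 .
\]
```

With every intermediate line written out in the review version, a stale constant like this is visible at the step where it enters, rather than only in the final bound.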

resizing delimiters in LaTeX with line breaks

IEEE uses a two-column format that is a bit narrow for large formulae, and it makes parenthesis resizing a pain when you have to break lines, because LaTeX (apparently) will not match parenthesis sizes across lines. For example, consider

\mathbb{P}\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}(\mathbf{y}(Tc\in{D(Z_{i,c})})>G\right)

So if you have a \exp \left( followed by some tall expression, like - \sum_{i=1}^{n} \frac{1}{2^i} \int_{\mathbb{R}} \langle f_i(t), g(t) \rangle dt + \prod_{i=1}^{n} f_i(0) - \lim_{x \to \infty} \frac{g(t)}{2 \pi} you start to run into problems fitting the whole thing on the line so that the corresponding \right) fits within the page margin. Furthermore, if the equation has multiple opening brackets and different size elements, the opening and closing brackets may not match in size when you break the line.

My old hack for this was to manually resize the \left( by using \Big( or something like that, putting empty \right. commands before the line break, and then starting the next line with empty \left. commands. If you have multiple opening and closing brackets you have to futz around, putting a \Big or \Bigg in front of each delimiter to make it fit, but a (somewhat) easier hack is to insert a tall whitespace like this:

\rule{0pt}{15pt} \right. \right. \nonumber \\
&
\left. \left. \rule{0pt}{15pt}

This isn’t too great a savings, since I now have to resize 2 things instead of 4, but it’s something at least, and the delimiters end up the same size. I could probably write a macro to do this, but that seems like a waste of time.
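
For what it’s worth, here is a minimal sketch of how the strut trick might look in a complete two-line display. The macro name \tallstrut and the left-hand side h(n) are made up for illustration, and the 15pt height is just the value from the snippet above. The strut only forces matching sizes if it dominates everything on both lines, so it may need adjusting for a given formula.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}

% \tallstrut is a hypothetical helper name (not from the post): an invisible
% zero-width rule whose height sets a minimum size for \left/\right delimiters.
\newcommand{\tallstrut}{\rule{0pt}{15pt}}

\begin{document}
\begin{align}
  h(n) &= \exp \left( - \left[ \tallstrut
          \sum_{i=1}^{n} \frac{1}{2^i}
          \int_{\mathbb{R}} \langle f_i(t), g(t) \rangle \, dt
          \right. \right. \nonumber \\
       % Each alignment cell must balance its own \left/\right pairs, so the
       % first line closes with empty \right. and this one reopens with \left.
       &\qquad \left. \left. \tallstrut
          {} + \prod_{i=1}^{n} f_i(0) \right] \right)
\end{align}
\end{document}
```

Wrapping the rule in a macro means there is only one number to change when the formula gets taller, which is about as much of a macro as seems worth writing.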