ISIT 2010 : Abbas El Gamal and Te Sun Han

I seem to have gotten all behind on wrapping up the ISIT blogging, so the remainder may be more compressed takes on things. This is not compression in the compressed sensing sense, where the signals are sparse and my comments would let you reconstruct the talks, but more like lossy compression with D \to \sigma^2 (for the Gaussian case).

Abbas El Gamal gave a very nice plenary on “Coding for Noisy Networks” in which he brought together many different eras and streams of work on network information theory and tied them together in a single conceptual framework. There was a nice mix of older and newer results. The thing I liked best was that he was very optimistic about making progress on understanding how to communicate in networks from an information-theoretic perspective, which counteracts the sentiment I have heard that “well, it’s just too messy.”

Te Sun Han gave the Shannon Lecture, of course, and he used his time to give a tutorial on the information spectrum method. I had tried to read the book earlier and honestly found it a little impenetrable (or rather, I wasn’t sure what I was supposed to take away from it). The talk was more like reading the papers — concisely stated, but with a clear line of intuition. I know some people are not big fans of Shannon Lectures as tutorials, but there is also a case to be made that most people are unfamiliar with the information spectrum method. A nice example he gave was showing when the output of an optimal source coder looks “completely random.” Maybe this has been done already, but is there a connection between existing theories of pseudorandomness and the information spectrum method?

ISIT 2010 : talks about adversaries

My thesis work was about a particular adversarial model from Shannon theory (the arbitrarily varying channel), but I am more broadly interested in how adversarial assumptions can be used to capture uncertainty in modeling or add an element of robustness to designs. I’m not wedded to the AVC, so I like going to see other work that tries to address these issues. Among the talks I saw at ISIT, the following captured different aspects of these models.

POLYTOPE CODES AGAINST ADVERSARIES IN NETWORKS
(Oliver Kosut; Cornell University, Lang Tong; Cornell University, David Tse; University of California at Berkeley)
This talk used as an example the “cockroach network,” so named because it “does not look like a butterfly.” That line got a lot of laughs. In this model a subset of the nodes in the network are controlled by an adversary, who can introduce errors. They show that by using a certain structured code called a polytope code, together with some checking and filtering by internal nodes, the adversaries can be detected and avoided. For some networks (planar ones) they can show that their code achieves the cut-set bound. The key to making the codes work is that any adversarial action results in certain tuples of random variables becoming atypical, and hence detectable.

JAMMING GAMES FOR POWER CONTROLLED MEDIUM ACCESS WITH DYNAMIC TRAFFIC
(Yalin Sagduyu; Intelligent Automation Inc./University of Maryland, Randall Berry; Northwestern University, Anthony Ephremides; University of Maryland)
This work uses adversarial modeling (in the form of game theory) to capture how users in a multiaccess setting contend for resources in the presence of jammers. In the AVC context, La and Anantharam proved some early results on these models. Here the extension is that the jammer(s) do not know whether or not the transmitter will be sending anything in a given slot (hence dynamic traffic). In the case with a single transmitter and a single jammer, the model is that the transmitter has to minimize its energy subject to an average rate constraint (packets arrive at the transmitter at a certain rate and have to be passed along), while the jammer has a power constraint and wants to make the transmitter expend as much energy as possible. It turns out that if the jammer doesn’t know the queue state of the transmitter, then it has to be on all the time. They have more complicated extensions to multi-transmitter scenarios.

ON THE DETERMINISTIC CODE CAPACITY REGION OF AN ARBITRARILY VARYING MULTIPLE-ACCESS CHANNEL UNDER LIST DECODING
(Sirin Nitinawarat; University of Maryland)
Sirin talked in the session I was in. For point-to-point AVCs under average error, there is a finite list size L, called the symmetrizability of the channel, such that for list decoding with list sizes \le L the capacity is 0, while for larger list sizes the capacity is the randomized coding capacity of the AVC. This work extended this result to the multiaccess setting, which is a particularly tricky beast since the deterministic-code capacity region itself can be nonconvex. He showed that there is a U such that the capacity region has empty interior if the list size is \le U, and that for list sizes \ge (U+1)^2 you get back the randomized coding capacity region. What remains open is the gap in between, and whether the list sizes for the two users can be different; that problem seems pretty hard to tackle.

CODING AGAINST DELAYED ADVERSARIES
(Bikash Kumar Dey; Indian Institute of Technology, Bombay, Sidharth Jaggi; Chinese University of Hong Kong, Michael Langberg; Open University of Israel, Anand Sarwate; University of California, San Diego)
I guess I’ll be shameless and blog about my own paper — we looked at an adversarial setting where the adversary gets to see the transmitted symbols after a delay. We assumed that the delay grows linearly with the blocklength, so that the adversary gets to act at time i based on x_1, x_2, \ldots, x_{i - \Delta n}, where n is the blocklength and \Delta is the delay parameter. Suppose everything is binary and the adversary gets to flip a total of pn bits of the codeword but sees it causally with delay \Delta n. If \Delta = 1 the adversary sees nothing and the average-error capacity is 1 - h(p) bits. If \Delta = 0 the adversary can do a lot worse and the capacity is 1 - 4p bits. What we show is that for any \Delta > 0 we go back to 1 - h(p) bits (and there was much rejoicing). The key is allowing the encoder to randomize, which effectively prevents the adversary from learning anything. Luckily, the decoder can still figure out what is going on (as long as there is some additional slack in the rate), even though it does not share common randomness with the encoder.
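
Since the two capacity expressions are easy to evaluate, here is a minimal Python sketch (my own illustration, not from the paper) comparing 1 - h(p) with 1 - 4p:

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy h(p) in bits, with the convention h(0) = h(1) = 0."""
    p = np.asarray(p, dtype=float)
    out = np.zeros_like(p)
    mask = (p > 0) & (p < 1)
    pm = p[mask]
    out[mask] = -pm * np.log2(pm) - (1 - pm) * np.log2(1 - pm)
    return out

p = np.linspace(0.0, 0.25, 6)
delayed = 1 - binary_entropy(p)        # capacity 1 - h(p) for any delay Delta > 0
zero_delay = np.maximum(1 - 4 * p, 0)  # capacity 1 - 4p for Delta = 0
for pi, cd, cz in zip(p, delayed, zero_delay):
    print(f"p = {pi:.2f}: Delta > 0 gives {cd:.3f} bits, Delta = 0 gives {cz:.3f} bits")
```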

ISIT 2010 : databases

I attended one talk on databases and unfortunately missed the second one, so the writeup I have is based on the paper…

IDENTIFICATION AND LOSSY RECONSTRUCTION IN NOISY DATABASES
(Ertem Tuncel; University of California, Riverside and Deniz Gunduz; Centre Tecnologic de Telecomunicacions de Catalunya)
This was the talk I did attend. The goal was to information-theoretize the notion of compressing databases. You get M i.i.d. individuals with characteristics distributed according to P_X and features Y conditioned on the individuals according to P_{Y | X}. The identification problem is this: given a noisy feature vector Z generated from a particular individual X(i), retrieve the index of the user corresponding to Z. This paper looked at the case where you also want to recover a distorted version of Y(i), so they are trading off identification and distortion. The solution is Wyner-Ziv-like coding, and the results recover the existing results on the identification problem alone as well as on the Wyner-Ziv problem.
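
To make the identification side concrete, here is a toy Python sketch (my own; the paper's model is more general) with binary feature vectors, a BSC observation channel, and minimum-Hamming-distance retrieval:

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, q = 1000, 64, 0.1   # database size, feature length, observation noise

database = rng.integers(0, 2, size=(M, L))  # M individuals with binary features
i = int(rng.integers(M))                    # the individual actually queried
z = database[i] ^ (rng.random(L) < q)       # noisy observation through a BSC(q)

# Minimum-Hamming-distance identification (ML for a BSC with q < 1/2)
dists = (database ^ z).sum(axis=1)
i_hat = int(np.argmin(dists))
print("identified correctly:", i_hat == i)
```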

A THEORY OF UTILITY AND PRIVACY OF DATA SOURCES
(Lalitha Sankar; Princeton University, S. Raj Rajagopalan; HP Labs, H. Vincent Poor; Princeton University)
This talk I missed, which is all the more sad because I work on this stuff. But I did get to see a version of it at ITA and also at the IPAM workshop on privacy (as a poster). This is again an attempt to information-theoretize the problem of data disclosure: given a database d of n individuals, release a noisy version of it. In doing so, they assume that each individual has K features and that the n rows are generated i.i.d. according to some distribution P. So, as they put it in the paper, “the primary contribution is demonstrating the equivalence between the database privacy problem and a source coding problem with additional privacy constraints.” The notion of utility is captured by a distortion measure: the encoder maps each database to a string of bits W, and a user with side information Z^n can attempt to reconstruct the database using a lossy source decoder. The notion of privacy is captured by an equivocation constraint:
\frac{1}{n} H( X^n | W, Z^n) \ge E - \epsilon
They then set about finding the best utility-privacy (= distortion-equivocation) tradeoff. This is a different model than differential privacy, and it rests on a number of information-theoretic assumptions that may be relaxed in future work. For example, assuming that individuals are i.i.d., that the distributions are known, that the side information sequence is known, and so on would give pause to the differential privacy crowd. Much like the epsilon in differential privacy, the equivocation has a meaning, but its syntactic effect is hard to parse. I guess this is why syntactic definitions of privacy like k-anonymity are popular — it’s easy to see what they mean “on the ground.”
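
As a toy illustration of the equivocation metric (my own example, not from the paper), here is H(X | W, Z) computed for a single-bit record, binary side information, and a binary released message; the noise parameters p_z and p_w are purely illustrative:

```python
import numpy as np
from itertools import product

# Toy joint law: X is a uniform bit, side information Z is X through a
# BSC(p_z), and the released message W is X through a BSC(p_w), a crude
# stand-in for a lossy encoder's output.
p_z, p_w = 0.2, 0.3
joint = {(x, z, w): 0.5 * ((1 - p_z) if z == x else p_z)
                        * ((1 - p_w) if w == x else p_w)
         for x, z, w in product([0, 1], repeat=3)}

def equivocation(joint):
    """Conditional entropy H(X | W, Z) in bits for a joint pmf over (x, z, w)."""
    h = 0.0
    for z, w in product([0, 1], repeat=2):
        probs = [joint[(x, z, w)] for x in [0, 1]]
        p_zw = sum(probs)
        h -= sum(p * np.log2(p / p_zw) for p in probs if p > 0)
    return h

print(f"H(X | W, Z) = {equivocation(joint):.3f} bits")
```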

ISIT 2010 : error analysis and asynchrony

The remaining blogging from ISIT on my end will probably be delayed a bit longer since I have camera-ready deadlines for EVT and ITW, a paper with Can Aysal for the special SP gossip issue that is within epsilon of getting done (and hence keeps getting put off), and a paper for Allerton with Michele Wigger and Young-Han Kim that needs some TLC. Phew! Maybe Alex will pick up the slack… (hint, hint)

Asynchronous Capacity per Unit Cost (Venkat Chandar, Aslan Tchamkerten, David Tse)
This paper was the lone non-adversarial coding paper in the session my paper was in. Aslan talked about a model for coding in which you have a large time window [0,A] in which B bits arrive at the encoder at a random time. The bits have to be transmitted to the receiver with delay less than D. The decoder hears noise when the encoder is silent, and furthermore doesn’t know when the encoder starts transmitting. They both know A, however. They do a capacity-per-unit-cost analysis for this system and give capacity results in various cases, depending on whether or not there is a 0-cost symbol, whether or not the delay is subexponential in the number of bits to be sent, and so on.

Moderate Deviation Analysis of Channel Coding: Discrete Memoryless Case (Yucel Altug, Aaron Wagner)
This was a paper that looked at a kind of intermediate behavior in summing and normalizing random variables. If you sum up n iid variables and normalize by n you get a law of large numbers and large deviations analysis. If you normalize by \sqrt{n} you get a central limit theorem (CLT). What if you normalize by n^{2/3}, for example? It turns out you get something like
\mathbb{P}( \frac{1}{n^{\beta}} \sum_{i=1}^{n} X_i \ge \epsilon ) \le \exp( -n^{2 \beta - 1} I_{\mathcal{N}}(\epsilon) )
where I_{\mathcal{N}}(\epsilon) is the rate function for the Gaussian you get out of the CLT. This is useful for error exponent analysis because it lets you get a handle on how fast the error can decay if your sequence of codes has rate tending to capacity. They use these techniques to show that for a sequence of codes whose rates have gaps \epsilon_n to capacity, the error decays like \exp( - n \epsilon_n^2/(2 \sigma^2) ), where \sigma^2 is the dispersion of the channel.
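
Here is a quick Monte Carlo sanity check (my own) of the moderate-deviations scaling for Rademacher variables, where \sigma^2 = 1 and the Gaussian rate function is I_{\mathcal{N}}(\epsilon) = \epsilon^2 / (2 \sigma^2):

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, eps, trials = 1000, 2 / 3, 0.8, 200_000
sigma2 = 1.0  # variance of a Rademacher (+/-1) variable

# Sums of n Rademacher variables, generated via the binomial distribution
s = 2 * rng.binomial(n, 0.5, size=trials) - n

# Monte Carlo estimate of P( n^{-beta} * sum >= eps )
empirical = np.mean(s / n**beta >= eps)

# Moderate-deviations prediction exp(-n^{2 beta - 1} I_N(eps)); this captures
# the exponent but not the polynomial prefactor in front of it.
predicted = np.exp(-(n ** (2 * beta - 1)) * eps**2 / (2 * sigma2))

print(f"empirical = {empirical:.2e}, predicted scale = {predicted:.2e}")
```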

Minimum energy to send k bits with and without feedback (Yury Polyanskiy, H. Vincent Poor, Sergio Verdu)
This was on the energy needed to send k bits, but in the non-asymptotic regime (keeping philosophically with the authors’ earlier line of work), for the AWGN channel. Without feedback they show achievability and converse bounds for sending M codewords with error \epsilon and energy E. These give matching scalings up to the first three terms as E \to \infty. The first term is (E/N_0) \log e. Interestingly, with feedback, they show that the first term is 1/(1 - \epsilon) times the term without feedback, which makes sense since as the error goes to 0 the rates should coincide. They also have a surprising result showing that with energy E > k N_0 you can send 2^k messages with 0 error.
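
Just to make the leading-order terms concrete, a quick evaluation with made-up numbers (my own arithmetic, not from the paper):

```python
import numpy as np

E_over_N0 = 100.0  # illustrative energy budget E / N_0
eps = 0.01         # illustrative target error probability

# Leading term of log M without feedback: (E / N_0) * log2(e) bits
bits_no_fb = E_over_N0 * np.log2(np.e)
# With feedback, the leading term picks up a factor 1 / (1 - eps)
bits_fb = bits_no_fb / (1 - eps)

print(f"~{bits_no_fb:.1f} bits without feedback, ~{bits_fb:.1f} bits with feedback")
```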

ISIT 2010 : some talks on statistics and probability

For many of the talks I attended I didn’t take notes — partly because I didn’t feel expert enough to note things down correctly, and partly because I am…

RÉNYI DIVERGENCE AND MAJORIZATION (Tim van Erven; Centrum Wiskunde & Informatica, Peter Harremoës; Copenhagen Business College)
Peter gave a talk on some properties of the Rényi divergence

D_{\alpha}(P \| Q) = \frac{1}{\alpha - 1} \log \int p^{\alpha} q^{1 - \alpha} d \mu

where \mu is a dominating measure, and a related geometric object, the Lorenz diagram. This is the set of all points in \mathbb{R}^2 of the form (\int f dP, \int f dQ) for some measurable function f taking values in [0,1]. It turns out that subset relationships between Lorenz diagrams are equivalent to majorization relationships between the measures. This is related to something called Markov ordering from a recent paper by Gorban, Gorban, and Judge. This was one of those nice talks which pointed me towards results and tools that were new to me but that I could not fully digest during the talk.
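
For reference, a minimal sketch of D_{\alpha} for discrete distributions (my own code; it assumes \alpha \ne 1 and that Q > 0 wherever P > 0):

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """D_alpha(P || Q) = log( sum p^alpha q^(1-alpha) ) / (alpha - 1), in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return np.log(np.sum(p[mask] ** alpha * q[mask] ** (1 - alpha))) / (alpha - 1)

p = np.array([0.7, 0.2, 0.1])
q = np.array([1 / 3, 1 / 3, 1 / 3])
for a in [0.5, 0.9, 1.1, 2.0]:
    print(f"alpha = {a}: {renyi_divergence(p, q, a):.4f} nats")
# As alpha -> 1 this approaches the ordinary KL divergence D(P || Q).
```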

MINIMAX LOWER BOUNDS VIA F-DIVERGENCES (Adityanand Guntuboyina; Yale University)
This talk was on minimax lower bounds for parameter estimation that can be calculated in terms of generalized f-divergences, which have the form

D_f(P \| Q) = \int f \left( \frac{dP}{dQ} \right) dQ

Suppose T is an estimator for a parameter \theta taking one of N values. We want lower bounds on \mathbb{P}(T(X^n) \ne \theta) that hold universally over T. The bounds generalize Fano’s inequality, controlling this error probability in terms of the f-informativity

\inf_Q \frac{1}{N} \sum_{\theta} D_f(P_{\theta} \| Q)

which recovers the usual mutual information (and hence Fano’s inequality) when f(t) = t \log t.

This has a very geometric feel to it but I had a bit of a hard time following all of the material since I didn’t know the related literature very well (much of it was given in relation to Yang and Barron’s 1999 paper).
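
Here is a small numerical check (my own) of the f-informativity quantity above for f(t) = t \log t, where the minimizing Q is known to be the uniform mixture of the P_{\theta} and the value is a mutual information:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(P || Q) in nats on a finite alphabet (assumes q > 0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# A small family of binary-alphabet distributions P_theta
family = np.array([[0.1, 0.9], [0.5, 0.5], [0.8, 0.2]])

# Grid search for inf_Q (1/N) sum_theta KL(P_theta || Q) over Q = (q, 1-q)
qs = np.linspace(0.01, 0.99, 9801)
grid_min = min(np.mean([kl(p, np.array([q, 1 - q])) for p in family]) for q in qs)
print(f"grid minimum    : {grid_min:.6f} nats")

# For f = KL the minimizer is the mixture (1/N) sum_theta P_theta, and the
# value is the mutual information I(theta; X) under a uniform prior on theta.
mix = family.mean(axis=0)
print(f"value at mixture: {np.mean([kl(p, mix) for p in family]):.6f} nats")
```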

MUTUAL INFORMATION SADDLE POINTS IN CHANNELS OF EXPONENTIAL FAMILY TYPE (Todd Coleman; University of Illinois, Maxim Raginsky; Duke University)
In some cases where we have a class of channels and a class of channel inputs, there is a saddle point. The classic example is that Gaussian noise is the worst noise for a given power constraint and Gaussian inputs are the best inputs. Another example is the “bits through queues” work. This paper gave a more general class of channels of “exponential family type” and a more general condition under which you get a saddle point for the mutual information. The channel models are related to the Gibbs variational principle, and the arguments had a kind of “free energy” interpretation. Ah, statistical physics rears its head again.

INFORMATION-THEORETIC BOUNDS ON MODEL SELECTION FOR GAUSSIAN MARKOV RANDOM FIELDS (Wei Wang, Martin J. Wainwright, Kannan Ramchandran; University of California, Berkeley)
This was on two problems: inferring the edges in a graphical model from i.i.d. samples from that model, and inverse covariance estimation. They are similar, but not quite the same. The goal was to prove necessary conditions for solving these problems; these necessary conditions match the corresponding achievable rates from polytime algorithms. The main result is that you need n = \Omega( d^2 \log p) samples, where p is the number of variables and d is the maximum degree in the graphical model. The proof approach models graph selection as channel coding: the channel outputs samples from a graphical model chosen by the encoder, and if the decoder can identify the model then the problem is solvable. This is analogous to the talk Sriram Vishwanath gave today on matrix completion.
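
The n = \Omega(d^2 \log p) result is a fundamental limit, but as a crude illustration of the forward direction, here is a toy numpy sketch (my own, not the paper's analysis) that recovers a chain graph's edges by thresholding the inverse sample covariance as n grows; the threshold 0.2 is ad hoc:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 20

# Sparse precision matrix: a chain graph with edge weight 0.4 (diagonally
# dominant, hence positive definite)
Theta = np.eye(p)
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.4
Sigma = np.linalg.inv(Theta)

true_edges = set(zip(*np.nonzero(np.triu(Theta, k=1))))

for n in [50, 200, 1000, 5000]:
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    Theta_hat = np.linalg.inv(np.cov(X, rowvar=False))
    est = set(zip(*np.nonzero(np.triu(np.abs(Theta_hat) > 0.2, k=1))))
    print(f"n = {n:5d}: edge set recovered exactly: {est == true_edges}")
```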

ISIT 2010 : Anthony Ephremides

“The first phase [of life] you believe in Santa Claus, the second you don’t believe in Santa Claus, and in the third you become Santa Claus.” — Tony Ephremides

Prof. Anthony Ephremides gave the third plenary at ISIT, which included an interim report on the consummation between information theory and network coding and many connections between opera and research. I think he did a great job of explaining the difference between the throughput region, the stability region, and the capacity region (under bursty vs. non-bursty use); these are in increasing order of inclusion. Some interesting tidbits (some new, some not):

  • He complained about the way networking handles fading: simply saying that the success probability for a packet being received is just \mathbb{P}( SNR > \gamma) (see the sketch below the list).
  • Under contention, the capacity region may not be convex, unlike in information theory where you can do time sharing.
  • For wireless network coding it is important to connect the MAC protocol and scheduling issues as well as change the notion of cut capacities. That is, you shouldn’t replace edges with hyper-edges, because that’s not a good model.
  • The information rate is the rate from the payload plus the information in the idleness and the information from the identity of the transmitter. That is, you get data from the actual packet, when the packet was sent, and who is sending the packet.
  • Extending many analyses to more than 2 users has not been done.
  • Can bursty traffic capacity be larger than non-bursty? The NNN (*) community says that would be a contradiction, because you can always then emulate bursty traffic. But this is unfair because idling the transmitter to build up packets and hence create artificial bursts is not allowed in the problem formulation, so there is no contradiction.
  • Relaying is good from a network perspective because it can partially enable a first-come first-serve (FCFS) discipline. So relays bridge a gap to the optimal scheduling policy. This is different than the information theory notion of cooperation yielding diversity gain.
  • Multicast throughput is an interesting thing to think about more rigorously for the future.
  • His prescription: information theorists should apply their tools more broadly to networking problems and to unorthodox problems, and the networking community should use more rigorous modeling and analysis methods.

(*) NNN = Nagging nabobs of negativism (really it should be “nattering nabobs of negativism,” Safire’s famous alliterative phrase wielded with such gusto by Spiro Agnew).
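
Regarding the first bullet above, here is a small sketch (my own, not from the talk) of the standard computation that \mathbb{P}(SNR > \gamma) abstracts away: under Rayleigh fading the instantaneous SNR is exponentially distributed, so the success probability has a closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
mean_snr, gamma = 10.0, 5.0  # illustrative average SNR and decoding threshold

# Under Rayleigh fading the instantaneous SNR is exponential with mean
# mean_snr, so P(SNR > gamma) = exp(-gamma / mean_snr).
analytic = np.exp(-gamma / mean_snr)
empirical = np.mean(rng.exponential(mean_snr, size=1_000_000) > gamma)
print(f"analytic = {analytic:.4f}, empirical = {empirical:.4f}")
```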

CTW 2010

Playa Norte on Isla Mujeres

I was lucky enough to get invited to present at the Communications Theory Workshop (CTW) in Cancun last month. Since the conference started on a Monday I took the weekend before to do a little sightseeing, hang out on the beach (see above), and visit Chichen Itza, which was pretty amazing. CTW is a small conference — single track, 2.5 days, and plenty of time for panels, discussions, etc. I had a great time there, and it was really nice to meet new people who were working on interesting problems that I hadn’t even heard about.

The first day was focused on wireless data services, the explosion thereof, and the lack of capacity to support it all. Proposed fixes included adding more relays and femtocells, managing and aligning interference, and adding more antennas for MIMO gains. Reinaldo Valenzuela was pretty pessimistic — according to him we are pretty much at the point where service providers lose money by providing data over wireless. He seemed to think the only way out now is to change pricing models or really try to exploit network MIMO.

The second day was on cognitive radio, where the picture also seemed a bit pessimistic. Michael Marcus kind of laid into academics for not participating in the FCC comment process. The criticism is valid — if we care about cognitive radio and its potential, we should try to persuade the regulators to do smart things and not piecemeal things. As things stand now, it’s unclear how much additional capacity is available in the TV bands, given the current requirements for operating there. The sensing requirements are nigh unreasonable, and if you do detect a primary user you have to stay silent for 30 minutes. That makes it kind of hard to run a commercial service.

Another thing I learned that day was that wireless microphones in the US operate in the TV bands and are grandfathered into the existing rules for cognitive radio. This means you have to detect whether you will interfere with a wireless mic, which seems quite tough. Steve Shellhammer talked about a contest Qualcomm ran to build a detector for this problem. In addition, a large fraction of wireless microphone users are not licensed, and hence their use is illegal, but nobody’s enforcing that.

The second half of the day was on emerging topics, followed by a panel called “Is Communication Theory Dead?” I think the pessimism and despair of the first two days needed to be let out a little, and there was a pretty spirited discussion of whether communication theorists were answering the right questions or not. It was like group therapy or something. Luckily there was the banquet after the panel so people could forget their woes…

The last day of the workshop included the session I spoke in: five talks on wireless networks, but from vastly different perspectives. Tara Javidi talked about routing, Daniela Tuninetti about information theory and cognitive interference channels, Elza Erkip about game-theoretic models for the interference channel, and Anna Scaglione about cooperative information flooding. For my part, I talked about (very) recent work I have done with Tara on distributed consensus with quantized messages. We have some nice first-order results, but I’m still hammering away at better bounds on the convergence rate.
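
Since the post ends on quantized consensus, here is a minimal sketch of one standard integer-valued pairwise gossip scheme (a toy in the spirit of quantized gossip algorithms, not necessarily the algorithm in our paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, steps = 10, 5000
x = rng.integers(0, 100, size=n)  # integer-valued initial node states
total = int(x.sum())

# Randomized pairwise gossip with integer averaging: the chosen pair replaces
# its values by the floor and ceiling of their average, preserving the sum.
for _ in range(steps):
    i, j = rng.choice(n, size=2, replace=False)
    s = x[i] + x[j]
    x[i], x[j] = s // 2, s - s // 2

print("final states:", x)  # states typically end up within 1 of each other
print("sum preserved:", int(x.sum()) == total)
```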