# A teaser for ITAVision 2015

As part of ITAVision 2015 we are soliciting individuals and groups to submit videos documenting their love of information theory and/or its applications. During ISIT we put together a little example with our volunteers (it sounded better in rehearsal than at the banquet, alas). The song was Entropy is Awesome based on this, obviously. If you want to sing along, here is the Karaoke version:

The lyrics (so far) are:

Entropy is awesome!
Entropy is sum minus p log p
Entropy is awesome!
When you work on I.T.

Blockwise error vanishes as n gets bigger
Maximize I X Y
Polarize forever
Let’s party forever

I.I.D.
I get you, you get me
Communicating at capacity

Entropy is awesome…

This iteration of the lyrics is due to a number of contributors — truly a group effort. If you want to help flesh out the rest of the song, please feel free to email me and we’ll get a group effort going.

More details on the contest will be forthcoming!

# ISIT 2014: a few more talks

Annina Bracher (ETH Zurich, Switzerland); Amos Lapidoth (ETHZ, Switzerland)
The title pretty much describes it — there are two receivers which are both looking out for a particular message. This is the identification problem, in which the receiver only cares about a particular message (but we don’t know which one) and we have to design a code such that they can detect the message. The number of messages is $2^{2^{nC}}$ where $C$ is the Shannon capacity of the DMC. In the broadcast setting we run into the problem that the errors for the two receivers are entangled. However, their message sets are disjoint. The way out is to look at the average error for each (averaged over the other user’s message). The main result is that the rates only depend on the conditional marginals, and they have a strong converse.

Efficient compression of monotone and m-modal distributions
Jayadev Acharya (University of California, San Diego, USA); Ashkan Jafarpour (University of California, San Diego, USA); Alon Orlitsky (University of California, San Diego, USA); Ananda Theertha Suresh (University of California, San Diego, USA)
A monotone distribution is a distribution on $\mathbb{N}$ such that the probabilities are non-increasing. The redundancy for this class is infinite, alas, so they restrict the support to size $k$ (where $k$ can be large). They propose a two-step compression scheme in which the first step is to approximate the true distribution with a piecewise constant step distribution, and then use a compression scheme for step distributions.
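
To make the first step concrete, here is a minimal sketch of a piecewise-constant approximation of a monotone distribution. The doubling interval lengths and the function name `step_approximation` are my own assumptions for illustration and are not taken from the paper; the only point is that averaging a non-increasing pmf over consecutive intervals yields a step distribution that is still a valid, non-increasing pmf.

```python
def step_approximation(p):
    """Replace a non-increasing pmf by its average over consecutive intervals.

    Interval lengths double (1, 2, 4, ...); this choice is an assumption
    made for illustration, not the construction from the paper.
    """
    k = len(p)
    q = [0.0] * k
    start, width = 0, 1
    while start < k:
        end = min(start + width, k)
        avg = sum(p[start:end]) / (end - start)  # average mass on this interval
        for i in range(start, end):
            q[i] = avg
        start, width = end, 2 * width
    return q


# Example: a monotone pmf on a support of size 10, proportional to 1/i.
weights = [1.0 / i for i in range(1, 11)]
total = sum(weights)
p = [w / total for w in weights]
q = step_approximation(p)  # constant on [1], [2,3], [4..7], [8..10]
```

The step distribution is described by one number per interval, so with doubling intervals a support of size $k$ needs only about $\log_2 k$ values.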

Writing on a Dirty Paper in the Presence of Jamming
Amitalok J Budkuley (Indian Institute of Technology, Bombay, India); Bikash K Dey (Indian Institute of Technology Bombay, India); Vinod M Prabhakaran (Tata Institute of Fundamental Research, India)
Ahh, jamming. A topic near and dear to my heart. This paper takes a game-theoretic approach to jamming in a DPC setup: “the capacity of the channel in the presence of the jammer is the unique Nash equilibrium utility of the zero sum communication game between the user and the jammer.” This is a mutual information game, and they show that i.i.d. Gaussian jamming and dirty paper coding are a Nash equilibrium. I looked at an AVC version of this problem in my thesis, and the structure is quite a bit different, so this was an interesting different take on the same problem — how can we use the state information to render adversarial interference as harmless as noise?

Stable Grassmann Manifold Embedding via Gaussian Random Matrices
Hailong Shi (Tsinghua University & Department of Electronic Engineering, P.R. China); Hao Zhang (Tsinghua University, P.R. China); Gang Li (Tsinghua University, P.R. China); Xiqin Wang (Tsinghua University, P.R. China)
This was in the session I was chairing. The idea is that you are given a subspace (i.e., a point on the Grassmann manifold) and you want to see what happens when you randomly project it into a lower-dimensional space using an i.i.d. Gaussian matrix à la Johnson-Lindenstrauss. The JL Lemma says that such projections approximately preserve lengths. Do they also approximately preserve volumes? It turns out that they do (no surprise). The main tools are measure concentration results together with a union bound over a covering set.
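
As a quick illustration of the length-preservation half of the story (not the volume result from the paper), here is a small sketch that projects a vector with an i.i.d. Gaussian matrix and compares norms. The dimensions, the seed, and the helper names are assumptions chosen for the example.

```python
import math
import random


def gaussian_projection(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


rng = random.Random(0)
d, k = 1000, 200                       # ambient and projected dimensions
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
P = gaussian_projection(k, d, rng)
ratio = norm(apply(P, x)) / norm(x)    # concentrates around 1 as k grows
```

With the $1/k$ variance scaling, the squared norm of the projection is the original squared norm times a chi-square variable divided by $k$, which is why the ratio concentrates near 1.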

Is “Shannon capacity of noisy computing” zero?
Pulkit Grover (Carnegie Mellon University, USA)
Yes. I think. Maybe? Pulkit set up a physical model for computation and used a cut-set argument to show that the total energy expenditure is high. I started looking at the paper in the proceedings and realized that it’s significantly different than the talk though, so I’m not sure I really understood the argument. I should read the paper more carefully. You should too, probably.

# ISIT 2014: two more plenaries

As I wrote before, I took pretty woeful notes during ISIT this year, so I don’t have much to write about. Andrea Goldsmith’s plenary was about how we always say IT/Comm is dead, but she thinks we should be more sanguine about it. She presented a glimpse of some recent work with Stefano Rini on a unified approach for deriving achievability results for single-hop networks, using a graph to represent superposition coding and binning operations among the auxiliary variables. If it is actually as easy to use as advertised, it might save us from wading through the 23+ rate inequalities defining some achievable rate regions. The moral of the story is that it’s sometimes better to clean up our existing results a bit. I think the El Gamal and Kim book did a great job of this for basic multiterminal IT, for example.

Vijay Kumar’s plenary was on codes for distributed storage and repair-bandwidth tradeoffs, focusing on extensions of the model. There was a lot of discussion of other code constructions, and how asking for certain properties (such as “locality”) can cost you something in the tradeoff. This is important when you can’t repair a code from arbitrary nodes in the network/data center — because there’s an underlying network which supplies the data for repair, codes should probably respect that network. At least that was the moral I took from this talk. Since I don’t work on coding, some things were a little over my head, but I thought he did an excellent job of keeping it accessible with nice concrete examples.

# ISIT 2014: Janos Körner’s Shannon Lecture

Janos Körner’s Shannon lecture was “On the Mathematics of Distinguishable Difference.” He began by remarking how information theory in Hungary was really a branch of mathematics — at least that’s how Rényi viewed it — and how collaborations made him appreciate some of the operational significance of information problems. On the other hand, he also made a strong case for understanding the combinatorial structure of many problems abstractly. A simple motivating example was something he called “trifference.” Basically he is asking for

$T(n) = \max \{ |C| : C \subseteq \{ 0, 1, 2 \}^n,$
$\forall \{x,y,z\}\ \mathrm{distinct}$
$\hspace{1in} \exists i\ \mathrm{s.t.} \{x_i, y_i, z_i\} = \{0, 1, 2\} \}$

That’s a mouthful! We want a set of trinary codewords $C$ such that for any three distinct codewords there is at least one position in which all three differ. What’s the maximum size of such a set? That’s $T(n)$. More specifically, we want the rate of growth of this thing. What we have are upper and lower bounds:

$\frac{1}{4} \log \frac{9}{5} \le \limsup_{n \to \infty} \frac{1}{n} \log T(n) \le \log \frac{3}{2}$.

It turns out that simple i.i.d. random coding is not going to give you a good set — the lower bound comes from a non-uniform random codebook. The upper bound is actually a capacity of a hypergraph. This led him to his second topic, which was on hypergraph entropy, a generalization of graph entropy. This is connected to Sperner families of subsets: collections such that for any pair of subsets, neither contains the other. The rate of growth of Sperner families is also related to the hypergraph entropy.
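
Going back to the trifference definition, the condition is easy to check by brute force for tiny $n$. The sketch below is only meant to make the definition concrete; the function names are mine, the exhaustive search is hopeless beyond very small $n$, and taking logarithms base 2 in the bounds is an assumption since the base is not specified above.

```python
from itertools import combinations, product
import math


def is_trifferent(code):
    """True if every three distinct codewords take all of 0, 1, 2 in some coordinate."""
    for x, y, z in combinations(code, 3):
        if not any({a, b, c} == {0, 1, 2} for a, b, c in zip(x, y, z)):
            return False
    return True


def max_trifferent_size(n):
    """Brute-force T(n) by trying subsets from largest to smallest."""
    words = list(product(range(3), repeat=n))
    for size in range(len(words), 0, -1):
        if any(is_trifferent(c) for c in combinations(words, size)):
            return size
    return 0


# The two bounds on the growth rate, evaluated with base-2 logarithms.
lower = 0.25 * math.log2(9 / 5)
upper = math.log2(3 / 2)

example = [(0, 0), (0, 1), (1, 2), (2, 2)]  # a trifferent code of length 2
```

For example, `is_trifferent(example)` holds, while adding more codewords of length 2 quickly breaks the condition, which gives a feel for why the growth rate is hard to pin down.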

I didn’t really manage to take as good notes as I might have wanted, but I really enjoyed the lecture, and you can too, now that the video has been posted on the IT Society website. I heard some people complaining that the talk was a little too technical while walking out of the plenary hall, but for me, it was quite clear and quite interesting, even at 8:30 in the morning. You can’t please everyone I guess!

# ISIT 2014: how many samples do we need?

Due to jetlag, my CAREER proposal deadline, and perhaps a bit of general laziness, I didn’t take as many notes at ISIT as I would have liked, so my posting will be somewhat light (in addition to being almost a month delayed). If someone else took notes on some talks and wants to guest-post on it, let me know!

Strong Large Deviations for Composite Hypothesis Testing
Yen-Wei Huang (Microsoft Corporation, USA); Pierre Moulin (University of Illinois at Urbana-Champaign, USA)
This talk was actually given by Vincent Tan since neither of the authors could make it (this seems to be a theme of talks I’ve attended this summer). The paper was about testing a simple hypothesis $H_1$ versus a composite hypothesis $H_0$ where under $H_0$ the observations are i.i.d. with respect to one of possibly $k$ different distributions. There are therefore $k$ different errors and the goal is to characterize these errors when we ask for the probability of true detection to be greater than $1 - \epsilon$. This is a sort of generalized Neyman-Pearson setup. They look at the vector of log-likelihood ratios and show that a threshold test is nearly optimal. At the time, I understood the idea of the proof, but I think it’s one of those things where you need to really read the paper.

Randomized Sketches of Convex Programs with Sharp Guarantees
Mert Pilanci (University of California, Berkeley, USA); Martin J. Wainwright (University of California, Berkeley, USA)
This talk was about using random projections to lower the complexity of solving a convex program. Suppose we want to minimize $\| Ax - y \|^2$ over $x$ given $y$. A sketch would be to minimize $\| SAx - Sy \|^2$ instead, where $S$ is a random projection. One question is how to choose $S$. They show that if $S$ is a randomized Hadamard matrix (the paper studies Gaussian matrices), then the objective value of the sketched program is at most $(1 + \epsilon)^2$ times the value of the original program as long as the number of rows of $S$ is larger than $O( \epsilon^{-2} \mathbb{W}^2(A \mathcal{K}))$, where $\mathbb{W}(A \mathcal{K})$ is the Gaussian width of the tangent cone of the constraint set at the optimum. For more details look at their preprint on arXiv.
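
Here is a toy version of the sketching idea for unconstrained least squares, using a Gaussian $S$ and a two-column $A$ so that everything fits in plain Python. The problem sizes, the noise level, and the helper names are all assumptions for illustration; this is not the authors' code and it does not handle a constraint set.

```python
import random


def solve_least_squares_2col(A, y):
    """Minimize ||Ax - y||^2 for a two-column A via the 2x2 normal equations."""
    g11 = sum(r[0] * r[0] for r in A)
    g12 = sum(r[0] * r[1] for r in A)
    g22 = sum(r[1] * r[1] for r in A)
    b1 = sum(r[0] * v for r, v in zip(A, y))
    b2 = sum(r[1] * v for r, v in zip(A, y))
    det = g11 * g22 - g12 * g12
    return [(g22 * b1 - g12 * b2) / det, (g11 * b2 - g12 * b1) / det]


def objective(A, y, x):
    """Original (unsketched) cost ||Ax - y||^2."""
    return sum((r[0] * x[0] + r[1] * x[1] - v) ** 2 for r, v in zip(A, y))


def gaussian_sketch(A, y, m, rng):
    """Return (SA, Sy) for an m x n matrix S with i.i.d. N(0, 1/m) entries."""
    SA, Sy = [], []
    for _ in range(m):
        s = [rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(len(A))]
        SA.append([sum(si * r[j] for si, r in zip(s, A)) for j in range(2)])
        Sy.append(sum(si * v for si, v in zip(s, y)))
    return SA, Sy


rng = random.Random(1)
n, m = 500, 100                                  # original rows, sketched rows
A = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
y = [2.0 * r[0] - r[1] + rng.gauss(0.0, 0.5) for r in A]

x_opt = solve_least_squares_2col(A, y)           # solves the full problem
SA, Sy = gaussian_sketch(A, y, m, rng)
x_sketch = solve_least_squares_2col(SA, Sy)      # solves the sketched problem
ratio = objective(A, y, x_sketch) / objective(A, y, x_opt)
```

The quantity `ratio` compares the original cost at the sketched solution to the true optimum; it can never be below 1, and the guarantee described above says how few rows of $S$ are needed to keep it below $(1 + \epsilon)^2$.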

On Efficiency and Low Sample Complexity in Phase Retrieval
Youssef Mroueh (MIT-IIT, USA); Lorenzo Rosasco (DIBRIS, Unige and LCSL – MIT, IIT, USA)
This was another talk not given by the authors. The problem is recovery of a complex vector $x_0 \in \mathbb{C}^n$ from phaseless measurements of the form $b_i = |\langle a_i, x_0 \rangle|^2$ where $a_i$ are complex spherically symmetric Gaussian vectors. Recovery from such measurements is nonconvex and tricky, but an alternating minimization algorithm can reach a local optimum, and if you start it in a “good” initial position, it will find a global optimum. The contribution of this paper is to provide such a smart initialization. The idea is to “pair” the measurements to create new measurements $y_i = \mathrm{sign}( b_i^{(1)} - b_i^{(2)} )$. This leads to a new problem (with half as many measurements) which is still hard, so they find a convex relaxation of that. I had thought briefly about such sensing setups a long time ago (and by thought, I mean puzzled over it at a coffee shop once), so it was interesting to see what was known about the problem.
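
To make the measurement model and the pairing step concrete, here is a small sketch that generates phaseless measurements and forms the sign measurements $y_i$. It only covers data generation; the convex relaxation and the recovery step are not shown, and the function names, the seed, and the tie-breaking rule (ties map to $+1$) are assumptions.

```python
import random


def complex_gaussian_vector(n, rng):
    """Vector with i.i.d. complex Gaussian entries."""
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def intensity(a, x):
    """Phaseless measurement b = |<a, x>|^2."""
    return abs(sum(ai.conjugate() * xi for ai, xi in zip(a, x))) ** 2


def paired_signs(x0, num_pairs, rng):
    """Form y_i = sign(b_i^(1) - b_i^(2)) from independent measurement pairs."""
    signs = []
    for _ in range(num_pairs):
        b1 = intensity(complex_gaussian_vector(len(x0), rng), x0)
        b2 = intensity(complex_gaussian_vector(len(x0), rng), x0)
        signs.append(1 if b1 >= b2 else -1)
    return signs


rng = random.Random(2)
x0 = complex_gaussian_vector(8, rng)     # unknown signal, here drawn at random
y = paired_signs(x0, 20, rng)            # 40 intensities become 20 signs
```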

Sorting with adversarial comparators and application to density estimation
Jayadev Acharya (University of California, San Diego, USA); Ashkan Jafarpour (University of California, San Diego, USA); Alon Orlitsky (University of California, San Diego, USA); Ananda Theertha Suresh (University of California, San Diego, USA)
Ashkan gave this talk on a problem where you have $m$ samples from an unknown distribution $p$ and a set of distributions $\{q_1, q_2, \ldots, q_n\}$ to compare against. You want to find the distribution that is closest in $\ell_1$. One way to do this is via a Scheffé tournament that compares all pairs of distributions, which takes $O(n^2)$ time. They show a method that runs in $O(n)$ time by studying the structure of the comparators used in the sorting method. The motivation is that running comparisons can be expensive (especially if they involve human decisions) so we want to minimize the number of comparisons. The paper is significantly different than the talk, but I think it would definitely be interesting to those interested in discrete algorithms. The density estimation problem is really just a motivator; the sorting problem is far more general.
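
For reference, here is a minimal sketch of the quadratic-time baseline, the all-pairs Scheffé tournament, for distributions on a small finite alphabet. This is the textbook pairwise test as I understand it, not the faster method from the paper; the function names, the tie-breaking, and the example data are assumptions.

```python
from collections import Counter


def scheffe_winner(qi, qj, empirical):
    """Return 0 if qi wins the Scheffé test against qj, else 1.

    The test looks at the set A where qi > qj and picks whichever
    candidate's mass on A is closer to the empirical mass on A.
    """
    A = [x for x in range(len(qi)) if qi[x] > qj[x]]
    emp_mass = sum(empirical[x] for x in A)
    err_i = abs(sum(qi[x] for x in A) - emp_mass)
    err_j = abs(sum(qj[x] for x in A) - emp_mass)
    return 0 if err_i <= err_j else 1


def scheffe_tournament(candidates, samples):
    """Run all pairwise tests and return the index with the most wins."""
    size = len(candidates[0])
    counts = Counter(samples)
    empirical = [counts[x] / len(samples) for x in range(size)]
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if scheffe_winner(candidates[i], candidates[j], empirical) == 0:
                wins[i] += 1
            else:
                wins[j] += 1
    return max(range(len(candidates)), key=lambda i: wins[i])


# Example: samples whose empirical distribution matches the first candidate.
candidates = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5], [0.1, 0.8, 0.1]]
samples = [0] * 50 + [1] * 30 + [2] * 20
best = scheffe_tournament(candidates, samples)
```

The double loop over pairs is exactly the $n^2$ cost mentioned above, which is what a smarter choice of comparisons is meant to avoid.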

# Aloha from ISIT 2014!

I still owe a post from ICML and I am supposedly writing a proposal now but some blogging will happen soon (probably as a procrastination technique).