SPCOM 2014: tutorials

I just attended SPCOM 2014 at the Indian Institute of Science in Bangalore — many thanks to the organizers for the invitation! SPCOM 2014 happens every two years and is a mix of invited and submitted papers (much like Allerton). This year they mixed the invited talks with the regular talks which I thought was a great idea — since invited papers were not for specific sessions, it makes a lot more sense to do it that way, plus it avoids a sort of “two-tier” system.

I arrived early enough to catch the tutorials on the first day. There was a 3 hour session in the morning and another on the in afternoon. For the morning I decided to expand my horizons by attending Manoj Gopalkrishnan‘s tutorial on the physics of computation. Manoj focused on the question of how much energy it takes to erase or copy a bit of information. He started with some historical context via von Neumann, Szilard, and Landauer to build a correspondence between familiar information theoretic concepts and their physical counterparts. So in this correspondence, relative entropy is the same as free energy. He then turned to look at what one might call “finite time” thermodynamics. Suppose that you have to apply a control that operates in finite time in order to change a bit. One way to look at this is through controlling the transition probabilities in a two-state Markov chain representing the value of the bit you want to fix. You want to drive the resting state (with stationary distribution $(1/2,1/2)$ to something like $(\epsilon, 1 - \epsilon)$ within time $T$. At this level I more or less understood what was going on, but since my physics background is pretty poor, I think I missed out on how the physical intuition/constraints impact what control strategies you can choose.

Prasad Santhanam gave the other tutorial, which was a bit more solid ground for me. This was not quite a tutorial on large-alphabet probability estimation, but more directly on universal compression and redundancy calculations. The basic setup is that you have a family of distributions $\mathcal{P}$ and you don’t know which distribution $p \in \mathcal{P}$ will generate your data. Based on the data sample you want to do something: estimate some property of the distribution, compress the sample to a size close to its entropy, etc. A class can be weakly or strongly compressible, or insurable (which means being able to estimate quantiles), and so on. These problems turn out to be a bit different from each other depending on some topological features of the class. One interesting thing to consider for the machine learners out there this stopping time that you need in some analyses. As you are going along, observing the data and doing your task (estimation, compression, etc) can you tell from the data that you are doing well? This has major implications for whether or not an online algorithm can even work the way we want it to, and is something Prasad calls “data-driven compressible.”

I’ll try to write another post or two about the talks I saw as well!

ISIT 2014: a few more talks

Annina Bracher (ETH Zurich, Switzerland); Amos Lapidoth (ETHZ, Switzerland)
The title pretty much describes it — there are two receivers which are both looking out for a particular message. This is the identification problem, in which the receiver only cares about a particular message (but we don’t know which one) and we have to design a code such that they can detect the message. The number of messages is $2^{2^{nC}}$ where $C$ is the Shannon capacity of the DMC. In the broadcast setting we run into the problem that the errors for the two receivers are entangled. However, their message sets are disjoint. The way out is to look at the average error for each (averaged over the other user’s message). The main result is that the rates only depend on the conditional marginals, and they have a strong converse.

Efficient compression of monotone and m-modal distributions
Jayadev Acharya (University of California, San Diego, USA); Ashkan Jafarpour (University of California, San Diego, USA); Alon Orlitsky (University of California, San Diego, USA); Ananda Theertha Suresh (University of California, San Diego, USA)
A monotone distribution is a distribution on $\mathbb{N}$ such that the probabilities are non-increasing. The redundancy for this class is infinite, alas, so they restrict the support to size $k$ (where $k$ can be large). They propose a two-step compression scheme in which the first step is to approximate the true distribution with a piecewise constant step distribution, and then use a compression scheme for step distributions.

Writing on a Dirty Paper in the Presence of Jamming
Amitalok J Budkuley (Indian Institute of Technology, Bombay, India); Bikash K Dey (Indian Institute of Technology Bombay, India); Vinod M Prabhakaran (Tata Institute of Fundamental Research, India)
Ahh, jamming. A topic near and dear to my heart. This paper takes a game-theoretic approach to jamming in a DPC setup: “the capacity of the channel in the presence of the jammer is the unique Nash equilibrium utility of the zero sum communication game between the user and the jammer.” This is a mutual information game, and they show that i.i.d. Gaussian jamming and dirty paper coding are a Nash equilibrium. I looked at an AVC version of this problem in my thesis, and the structure is quite a bit different, so this was an interesting different take on the same problem — how can we use the state information to render adversarial interference as harmless as noise?

Stable Grassmann Manifold Embedding via Gaussian Random Matrices
Hailong Shi (Tsinghua University & Department of Electronic Engineering, P.R. China); Hao Zhang (TsinghuaUniversity, P.R. China); Gang Li (Tsinghua University, P.R. China); Xiqin Wang (Tsinghua University, P.R. China)
This was in the session I was chairing. The idea is that you are given a subspace (e.g., a point on the Grassman manifold) and you want to see what happens when you randomly project this into a lower-dimensional subspace using an i.i.d. Gaussian matrix a la Johnson-Lindenstrauss. The JL Lemma says that projections are length-preserving. Are they also volume-preserving? It turns out that they are (no surprise). The main tools are measure concentration results together with a union bound over a covering set.

Is “Shannon capacity of noisy computing” zero?
Pulkit Grover (Carnegie Mellon University, USA)
Yes. I think. Maybe? Pulkit set up a physical model for computation and used a cut-set argument to show that the total energy expenditure is high. I started looking at the paper in the proceedings and realized that it’s significantly different than the talk though, so I’m not sure I really understood the argument. I should read the paper more carefully. You should too, probably.

ISIT 2014: two more plenaries

As I wrote before, I took pretty woeful notes during ISIT this year, so I don’t have much to write about. Andrea Goldsmith’s plenary was about how we always say IT/Comm is dead, but she thinks we should be more sanguine about it. She presented a glimpse of some recent work with Stefano Rini on a unified approach for providing achievable results for single-hop networks using a graph to represent superposition coding and binning operations among the auxiliary variables. If it is actually as easy to use as advertised, it might save over the 23+ rate inequalities defining some achievable rate regions. The moral of the story is that it’s sometimes better to clean up our existing results a bit. I think the El-Gamal and Kim book did a great job of this for basic multiterminal IT, for example.

Vijay Kumar’s plenary was on codes for distributed storage and repair-bandwidth tradeoffs, focusing on extensions of the model. There was a lot of discussion of other code constructions, and how asking for certain properties (such as “locality”) can cost you something in the tradeoff. This is important when you can’t repair a code from arbitrary nodes in the network/data center — because there’s an underlying network which supplies the data for repair, codes should probably respect that network. At least that was the moral I took from this talk. Since I don’t work on coding, some things were a little over my head, but I thought he did an excellent job of keeping it accessible with nice concrete examples.

ISIT 2014: Janos Körner’s Shannon Lecture

Janos Körner’s Shannon lecture was “On the Mathematics of Distinguishable Difference.” He began by remarking how information theory in Hungary was really a branch of mathematics — at least that’s how Rényi viewed it — and how collaborations made him appreciate some of the operational significance of information problems. On the other hand, he also made a strong case for understanding the combinatorial structure of many problems abstractly. A simple motivating example was something he called “trifference.” Basically he is asking for

$T(n) = \max \{ |C| : C \subseteq \{ 0, 1, 2 \}^n,$
$\forall \{x,y,z\}\ \mathrm{distinct}$
$\hspace{1in} \exists i\ \mathrm{s.t.} \{x_i, y_i, z_i\} = \{0, 1, 2\} \}$

That’s a mouthful! We want a set of trinary codewords $C$ such that any triple of codewords differs in at a least one position. What’s the maximum size of such a set? That’s $T(n)$. More specifically, we want the rate of growth of this thing. That we have are upper and lower bounds:

$\frac{1}{4} \log \frac{9}{5} \le \lim \sup \frac{1}{n} \log T(n) \le \log \frac{3}{2}$.

It turns out that simple i.i.d. random coding is not going to give you a good set — the lower bound comes from a non-uniform random codebook. The upper bound is actually a capacity of a hypergraph. This led him to his second topic, which was on hypergraph entropy, a generalization of graph entropy. This is connected to Sperner families of subsets: collections such that for any pair of subsets, neither contains the other. The rate of growth of Sperner families is also related to the hypergraph entropy.

I didn’t really manage to take as good notes as I might have wanted, but I really enjoyed the lecture, and you can too, now that the video has been posted on the IT Society website. I heard some people complaining that the talk was a little too technical while walking out of the plenary hall, but for me, it was quite clear and quite interesting, even at 8:30 in the morning. You can’t please everyone I guess!

Some LaTeX hacks for writing proposals

I just submitted my CAREER proposal on Monday, and now that the endless revision process is over, I wanted to write a post about what I learned about proposal writing. But that seems like hubris, since I have no idea if the thing will get funded, so what do I know? Besides, everything I learned was from the valuable feedback I got from those who very kindly read my previous drafts. So instead of writing about How To Sell Your Idea, I figured I would write about some cute LaTeX hacks that I came across, most of them via the very useful TeX-LaTeX Stack Exchange.

If anyone else has some useful hacks, feel free to leave them in the comments!

Saving Space

One of the big problems in NSF proposal writing is that there’s a hard limit on the number of pages (not the number of words), so if you’re at the edge, there’s a lot of “oops, two lines over” hacking to be done towards the end.

• \usepackage{times} \usepackage{mathptmx}: The typeface for your proposal makes a big difference in space. Computer Modern is a bit easier to read since there’s more whitespace, but Times shaved a whole page off of my proposal. The NSF Grant Proposal Guidelines has the list of approved formatting. It seems standard for NIH proposals to use 11pt Arial but that makes me want to gouge my eyes out. Know thy reviewers, is what I would say: keep in mind what’s standard for the solicitation and don’t make the proposal so dense as to be unreadable. NB: Apparently the times package is deprecated (see comments).
• \usepackage{titlesec}. This package lets you control the spacing around your titles and subtitles like this:

 \titlespacing\section{0pt}{10pt plus 2pt minus 2pt}{2pt plus 2pt minus 2pt} \titlespacing\subsection{0pt}{8pt plus 2pt minus 2pt}{2pt plus 2pt minus 2pt} 

See this post for more details, but basically it’s \titlespacing{command}{left spacing}{before spacing}{after spacing}. This is handy because there’s a lot of empty space around titles/subtitles and it’s an easy way to trim a few lines while making sure things don’t get too cramped/ugly.

• \usepackage{enumitem}: This package lets you control the spacing around your enumerate lists. The package has a lot of options but one that may be handy is \setlist{nosep} which removes the space around the list items. This actually makes things a little ugly, I think, but bulleted lists are helpful to the reviewer and they also take a little more space, so this lets you control the tradeoff. Another thing that is handy to control is the left margin: \setlist[itemize,1]{leftmargin=20pt}.
• \usepackage{savetrees}: Prasad says it’s great, but I didn’t really use it. YMMV.

Customizations

• Sometimes it’s handy to have a new theorem environment for Specific Aims or Open Problems or what-have-you. The problem is (as usual) that the theorem environment by itself puts in extra space and isn’t particularly customizable. So one option is to define a new theorem style:

 \newtheoremstyle{mystyle}% name {5pt}%Space above {5pt}%Space below {\itshape}% Body font {5pt}%Indent amount {\bfseries}% Theorem head font {:}%Punctuation after theorem head {4pt}%Space after theorem head 2 {}%Theorem head spec (can be left empty, meaning ‘normal’)

 

\theoremstyle{mystyle} \newtheorem{specaim}{Specific Aim} 

• Another handy hack is to make a different citation command to use for your own work that will then appear in a different color than normal citations if you use \usepackage[colorlinks]{hyperref}. I learned how to do this by asking a question on the stack exchange.


• The hyperref package also creates internal links to equations and Figures (if you label them) and so on, but the link is usually just the number of the label, so you have to click on “1” instead of “Figure 1” being the link. One way to improve this is to make a custom reference command:

 \newcommand{\fref}[2]{\hyperref[#2]{#1 \ref*{#2}}} 

So now you can write \fref{Figure}{fig:myfig} to get “Figure 1” to be clickable.

• You can also customize the colors for hyperlinks:

 \hypersetup{ colorlinks, citecolor=blue, linkcolor=magenta, urlcolor=MidnightBlue} 

• Depending on your SRO, they may ask you to deactivate URLs in the references section. I had to ask to figure this out, but basically putting \let\url\nolinkurl before the bibliography seemed to work…

ISIT 2014: how many samples do we need?

Due to jetlag, my CAREER proposal deadline, and perhaps a bit of general laziness, I didn’t take as many notes at ISIT as I would have, so my posting will be somewhat light (in addition to being almost a month delayed). If someone else took notes on some talks and wants to guest-post on it, let me know!

Strong Large Deviations for Composite Hypothesis Testing
Yen-Wei Huang (Microsoft Corporation, USA); Pierre Moulin (University of Illinois at Urbana-Champaign, USA)
This talk was actually given by Vincent Tan since neither of the authors could make it (this seems to be a theme of talks I’ve attended this summer. The paper was about testing a simple hypothesis $H_1$ versus a composite hypothesis $H_0$ where under $H_0$ the observations are i.i.d. with respect to one of possibly $k$ different distributions. There are therefore $k$ different errors and the goal is to characterize these errors when we ask for the probability of true detection to be greater than $1 - \epsilon$. This is a sort of generalized Neyman-Pearson setup. They look at the vector of log-likelihood ratios and show that a threshold test is nearly optimal. At the time, I understood the idea of the proof, but I think it’s one of things where you need to really read the paper.

Randomized Sketches of Convex Programs with Sharp Guarantees
Mert Pilanci (University of California, Berkeley, USA); Martin J. Wainwright (University of California, Berkeley, USA)
This talk was about using random projections to lower the complexity of solving a convex program. Suppose we want to minimize $\| Ax - y \|^2$ over $x$ given $y$. A sketch would be to solve $\| SAx - Sy \|^2$ where $S$ is a random projection. One question is how to choose $A$. They show that choosing $S$ to be a randomized Hadamard matrix (the paper studies Gaussian matrices), then the objective value of the sketched program is at most $(1 + \epsilon)^2$ times the value of the original program as long as the the number of rows of $S$ is larger than $O( \epsilon^{-2} \mathbb{W}^2(A \mathcal{K}))$, where $\mathbb{W}(A \mathcal{K})$ is the Gaussian width of the tangent cone of the contraint set at the optimum value. For more details look at their preprint on ArXiV.

On Efficiency and Low Sample Complexity in Phase Retrieval
Youssef Mroueh (MIT-IIT, USA); Lorenzo Rosasco (DIBRIS, Unige and LCSL – MIT, IIT, USA)
This was another talk not given by the authors. The problem is recovery of a complex vector $x_0 \in \mathbb{C}^n$ from phaseless measurements of the form $b_i = |\langle a_i, x_0 \rangle|^2$ where $a_i$ are complex spherically symmetric Gaussian vectors. Recovery from such measurements is nonconvex and tricky, but an alternating minimizing algorithm can reach a local optimum, and if you start it in a “good” initial position, it will find a global optimum. The contribution of this paper is provide such a smart initialization. The idea is to “pair” the measurements to create new measurements $y_i = \mathrm{sign}( b_i^{(1)} - b_i^{(2)} )$. This leads to a new problem (with half as many measurements) which is still hard, so they find a convex relaxation of that. I had thought briefly about such sensing setups a long time ago (and by thought, I mean puzzled over it at a coffeshop once), so it was interesting to see what was known about the problem.

Sorting with adversarial comparators and application to density estimation
Jayadev Acharya (University of California, San Diego, USA); Ashkan Jafarpour (University of California, San Diego, USA); Alon Orlitsky (University of California, San Diego, USA); Ananda Theertha Suresh (University of California, San Diego, USA)
Ashkan gave this talk on a problem where you have $m$ samples from an unknown distribution $p$ and a set of distributions $\{q_1, q_2, \ldots, q_n\}$ to compare against. You want to find the distribution that is closest in $\ell_1$. One way to do this is via Scheffe tournament tht compares all pairs of distributions — this runs in time $n^2$ time. They show a method that runs in $O(n)$ time by studying the structure of the comparators used in the sorting method. The motivation is that running comparisons can be expensive (especially if they involve human decisions) so we want to minimize the number of comparisons. The paper is significantly different than the talk, but I think it would definitely be interesting to those interested in discrete algorithms. The density estimation problem is really just a motivator — the sorting problem is far more general.