### July 2012

I usually tag things from ArXiV that I want to skim later, so I figured I would post the ones I found interesting in the last month or so. Some of them may deserve their own post later. Jet lag is not letting me take the nap I need right now so I figured I’d post it before conking out.

Convergence Rates for Differentially Private Statistical Estimation
K. Chaudhuri and D. Hsu
ArXiV 1206.6395
This paper is about statistical estimation under privacy constraints. In differential privacy there are a fair number of constraints requiring the data to be well behaved; boundedness is a typical assumption. One way around this is to move to the relaxed notion of $(\alpha,\delta)$-differential privacy, but what this paper shows is that under the stricter $\alpha$-differential privacy definition, the convergence rate of statistical estimation is governed by the gross error sensitivity (GES) of the estimator.
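As a toy illustration (not the estimators analyzed in the paper), here's the standard Laplace mechanism for releasing an $\alpha$-differentially private mean of bounded data; `private_mean` and its parameters are my own naming:

```python
import numpy as np

def private_mean(x, alpha, lo=0.0, hi=1.0, rng=None):
    """alpha-differentially private mean of data clipped to [lo, hi].

    The global sensitivity of the mean of n points in [lo, hi] is
    (hi - lo) / n, so adding Laplace noise with scale sensitivity / alpha
    gives alpha-differential privacy.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(x)
    return float(x.mean() + rng.laplace(scale=sensitivity / alpha))
```

Note that the clipping step is exactly the kind of boundedness assumption mentioned above; without it the sensitivity, and hence the noise scale, is unbounded.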

The Greedy Miser: Learning under Test-time Budgets
Z. Xu, K. Weinberger, and O. Chapelle
ArXiV 1206.6451
This paper is about incorporating the cost of extracting features in classification or regression. For a data point $\mathbf{x}_i \in \mathbb{R}^d$, the feature $\alpha$ costs $c_{\alpha}$ to compute and there is a total budget for the computation. The main approach is to minimize a linear combination of the loss and the cost iteratively, where each iteration consists of building a regression tree.
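The paper's actual method builds regression trees stage-wise; as a rough sketch of the loss-versus-cost tradeoff it optimizes, here is a hypothetical greedy linear-regression version (all names are mine, assuming squared loss and additive per-feature extraction costs):

```python
import numpy as np

def greedy_budgeted_features(X, y, costs, budget):
    """Toy stage-wise selection: repeatedly add the feature giving the best
    squared-loss reduction per unit extraction cost, until the budget runs
    out.  (Illustrative only; the paper's algorithm uses regression trees.)"""
    n, d = X.shape
    chosen, spent = [], 0.0

    def loss(S):
        # Residual sum of squares of the least-squares fit on features S.
        if not S:
            return float(np.sum((y - y.mean()) ** 2))
        beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
        return float(np.sum((y - X[:, S] @ beta) ** 2))

    current = loss(chosen)
    while True:
        best, best_gain = None, 0.0
        for j in range(d):
            if j in chosen or spent + costs[j] > budget:
                continue
            gain = (current - loss(chosen + [j])) / costs[j]
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            return chosen
        chosen.append(best)
        spent += costs[best]
        current = loss(chosen)
```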

Sequential detection of multiple change points in networks: a graphical model approach
A.A. Amini and X. Nguyen
ArXiV 1207.1687
This looks at the change point detection problem in the network setting, where each of the $d$ nodes in a network has a change point $\lambda_j$. Data are observed on vertices and edges, and the nodes have to pass messages along the graph. There’s a prior distribution on the change point times, and after observing the data the nodes can use message passing to do belief propagation, estimate the posterior distribution, and define a stopping time. This scheme turns out to be asymptotically optimal, in that it achieves the minimum expected delay in detecting the change points.

The Price of Privacy in Untrusted Recommendation Engines
S. Banerjee, N. Hegde, and L. Massoulié
ArXiV 1207.3269
This studies the effect on recommendation engines of individuals masking their data due to privacy concerns. The technical issues are that the dimension is high and each user only controls a small amount of data (e.g. the movies that they rented or have seen). The question is one of sample complexity: how many users do we need in order to cluster them successfully? If users are not too sparse, a sketch plus spectral techniques can achieve the Fano lower bound (order-wise), but the dependence is like $N \log N$, where $N$ is the number of items (e.g. movies). It’s worse if the users are sparse: the scaling is with $N^2$, which is also achievable in some settings. The analysis approach using Fano is analogous to the “information leakage” connections with differential privacy.

Surrogate Losses in Passive and Active Learning
S. Hanneke and L. Yang
ArXiV 1207.3772
This paper looks at how to use surrogate losses (which are computationally tractable) in active learning problems. “We take as our starting point that we have already committed to use a given surrogate loss, and we restrict our attention to just those scenarios in which this heuristic actually does work. We are then interested in how best to make use of the surrogate loss toward the goal of producing a classifier with relatively small error rate.” The central insight in their algorithm is that since the “real loss” is (say) the 0-1 loss, for the purposes of active learning one only really needs to get the sign right, not the magnitude. So the process will minimize the real loss but not necessarily the surrogate.
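To see the sign-versus-magnitude point concretely, here is a minimal comparison of the 0-1 loss with the hinge surrogate (my own toy functions, not from the paper):

```python
def zero_one(y, f):
    """Real loss: only the sign of the prediction f matters (y is +/-1)."""
    return 1.0 if y * f <= 0 else 0.0

def hinge(y, f):
    """Convex surrogate: the magnitude of f matters too."""
    return max(0.0, 1.0 - y * f)
```

Predictions 0.1 and 5.0 on a positive example incur the same 0-1 loss but very different hinge losses, which is why minimizing the real loss need not minimize the surrogate.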

Well, SPCOM 2012 is over now — it was a lot of fun and a really nice-sized conference. I missed the first day of tutorials, which I heard were fantastic. Qing Zhao couldn’t make it due to visa issues but gave her tutorial over Skype. Hooray for technology!

After a fun trip to Mysore this weekend, I have landed at the Indian Institute of Science in Bangalore (Bengaluru for those more PC) for SPCOM 2012. This is a 4-day conference (yesterday was tutorials) that has been held 8 times before (sometimes skipping years) here at IISc. I’ve never visited here before, but my father, grandfather, and great-grandfather are all graduates, so it feels pretty special to finally see the place. My talk is dead last, so we’ll see how many people are still here by the time it rolls around.

Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent
William H. Press and Freeman J. Dyson
PNAS 109(26), June 26, 2012

This was an interesting little paper that I am still digesting; as I’ve said before, I’m not a game theory guy. They start out with a 2×2 game, the classical Prisoner’s Dilemma with players X and Y: if both cooperate (C), the payoffs are $(R,R)$; if the first (resp. second) defects (D), they are $(T,S)$ (resp. $(S,T)$); and if they both defect, then $(P,P)$. The usual conditions are $T > R > P > S$ to keep it interesting.

In an iterated game the players get to play over and over again and try to develop strategies to win more than their opponent. The first interesting (to me) observation in this paper is that long memory buys you nothing against a short-memory player: if X plays a short-memory strategy, then any long-memory strategy of Y earns the same payoff as a certain short-memory strategy of Y. So if your opponent is playing a short-memory strategy, you may as well play a short-memory strategy too.

So with that idea in mind they look at first-order Markov strategies — given the past state (CC, CD, DC, DD) player X randomly chooses whether to cooperate or not in the next round. Thus the problem is specified by 4 conditional probabilities of cooperating conditioned on the previous state, and you can compute a 4×4 Markov matrix of transition probabilities out of the randomized strategies of each player.
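Here's a quick sketch of that computation (my own notation; the payoff values are just the usual $T=5$, $R=3$, $P=1$, $S=0$ example): build the 4×4 matrix from the two players' conditional cooperation probabilities and read off the long-run payoffs from the stationary distribution.

```python
import numpy as np

# Prisoner's Dilemma payoffs with T > R > P > S.
T_, R_, P_, S_ = 5.0, 3.0, 1.0, 0.0

def markov_matrix(p, q):
    """4x4 transition matrix over states (CC, CD, DC, DD), first letter
    being X's previous move.  p[s] is X's probability of cooperating from
    state s; q is Y's strategy indexed from Y's own point of view, so
    states CD and DC are swapped when applying it."""
    p = np.asarray(p, float)
    qy = np.asarray(q, float)[[0, 2, 1, 3]]   # Y sees (CC, DC, CD, DD)
    return np.column_stack([p * qy, p * (1 - qy),
                            (1 - p) * qy, (1 - p) * (1 - qy)])

def long_run_payoffs(p, q):
    """Stationary distribution of the chain and the resulting long-run
    average payoffs (to X, to Y)."""
    M = markov_matrix(p, q)
    # Solve v M = v with sum(v) = 1 by least squares.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    b = np.concatenate([np.zeros(4), [1.0]])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    Sx = np.array([R_, S_, T_, P_])   # X's payoff per state
    Sy = np.array([R_, T_, S_, P_])   # Y's payoff per state
    return float(v @ Sx), float(v @ Sy)
```

Sanity checks: two always-cooperators each earn $R$ in the long run, and two always-defectors each earn $P$.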

Now the game becomes about how the payoffs behave under different choices of strategy, which boils down to spectral / structural properties of this Markov matrix. To be honest, I found the discussion in the rest of the paper more confusing than enlightening because the mathematics was a little obscured by the discussion of implications. I realize this is a PNAS paper but it felt a little discursive at times and the mathematical intuition was not entirely clear to me.

Still, it’s a short read and probably worth a peruse.

In my previous post I made some vague comments about maximal and average error which I think are not complicated but don’t often arise when we think about plain vanilla information theory problems. These come up in arbitrarily varying channels, but also help motivate some interesting alternative models.

We most often think of error criteria as being about the desiderata for communication: asking for the maximal error over messages to be small is more stringent than asking for the average error. However, if we have a code with small average error, we can expurgate the bad codewords to get a code with small maximal error. With that meta-argument in hand, at a first pass we’re done: average error and maximal error are pretty much the same.

However, choosing an error criterion can also be a choice about the communication model. If the noise can somehow depend on what is being transmitted, then this distinction can come up. If the error criterion is maximal error and the interference is arbitrary, then for each message and each interference signal, I need to guarantee that probability of error is small. Implicitly, this allows the interference to be chosen as a function of the message. Under average error, the interference cannot be a function of the message.

If we allow for common randomness, the transmitter can mask the dependence of the codeword on the message. Suppose we have a deterministic code with $N$ codewords which has small average error. Then we can create a randomized code by choosing a random permutation $\pi$ of the $N$ messages and encoding message $m$ with the $\pi(m)$-th codeword of the original code. In this way the maximal error becomes small, since each message’s error probability is averaged over the common randomness.
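A small simulation of this permutation trick (my own toy setup): take a deterministic code whose codewords have unequal error probabilities, permute uniformly at random, and check that the largest per-message error is essentially the average error of the original code.

```python
import numpy as np

def randomized_max_error(e, trials=5000, seed=0):
    """Maximal per-message error of the permuted code, by Monte Carlo.

    e[m] is the error probability of the m-th codeword of a deterministic
    code with small *average* error.  Encoder and decoder share a uniform
    random permutation pi and encode message m with codeword pi(m); each
    message's error then averages e over all codewords, so the *maximal*
    error of the randomized code matches the average error of the original.
    """
    rng = np.random.default_rng(seed)
    e = np.asarray(e, float)
    errs = np.zeros(len(e))
    for _ in range(trials):
        pi = rng.permutation(len(e))
        errs += e[pi]
    return float((errs / trials).max())
```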

From this it’s clear that the choice of error criterion is also a modeling choice, especially when there is no common randomness allowed. However, this is all looking at the case where the interference can depend on the message. The trickier situation is when the interference can depend on the transmitted codeword, which is what I discussed in the previous post. In that setting using common randomness to mask the message-codeword dependence doesn’t buy that much. What I was working on recently was how much it buys you when the dependence is weakened.

As a final note, this model of interference depending on the transmitted codeword appears in a stochastic form under the name “correlated jamming” and has been worked on by several researchers, including Muriel Médard, Shabnam Shafiee, and Sennur Ulukus. My goal was to get an AVC and geometric perspective on these kinds of communication models.

I have an iPad now (hooray research funds) and I’ve been trying to figure out how to make it more useful for me research-wise. Here are some apps I’ve found, but I would love recommendations on others that are useful for academics who do math-y stuff.

OmniFocus is a task management app — it’s a little heavy-featured and expensive, but I’ve been using it to help myself reorganize (and sync) when on planes and the like.

iAnnotate PDF is a PDF annotation tool that syncs to Dropbox (among other things). I now drop the papers I have to review into a folder there and then read and mark them up. Given that my review load has doubled recently, being able to make notes without printing is nice. I think it will get even better once I get a stylus.

In terms of media, IEEE Spectrum is available, as is Communications magazine (but not Signal Processing).

I use Prompt to SSH into machines on the few times that I need to do that from the iPad. It has saved my butt once or twice.

PDF Presenter is supposed to be nice for doing slides, but I haven’t used it yet.

There’s an old post on LaTeX for the iPad, but nothing really looked that appealing.

Anyone else have some recommendations?

I am pleased as punch that Katalin Marton won the 2013 Shannon Award. She made seminal contributions to information theory in its earlier years and has been active in combinatorics and stochastic processes since. Many people start doing information theory because of an interest in digital communications (my own academic background reflects this), so we get some blinders on regarding the role information theory plays in other fields, such as statistics (c.f. Rissanen) and combinatorics (c.f. Ahlswede). It’s great to see Marton’s work recognized, and I hope she talks about some of her more recent interests. Finally, I find it inspirational that a woman has finally won the Shannon Award; it’s a sign, I hope, of the change that is happening in engineering more broadly.

I’m writing this (but not posting it) from somewhere above Belarus, on my way to Delhi and then on to Bangalore for SPCOM 2012. I was extended a very kind invitation to give a talk there, and I decided to present some work related to my thesis research on AVCs. I’m still working out my one-sentence summaries, but I figured I could use the old blog to work out some thoughts. Plus I don’t think I am going to be able to sleep on this flight any more.

The motivation for this work is the question of how to model uncertain interference. The Shannon-theoretic approach to this is to say that a communication channel is subject to stochastic noise and then look at how properties of the noise (memory, burstiness) affect the capacity. The coding-theoretic approach is to treat (usually discrete) noise as adversarial — I need to design a code that corrects all error patterns of $pn$ bits. This is a stronger requirement, and in particular, I can even allow the noise to depend on the transmitted codeword. Of course, this leads to a lower capacity in general.

Can we close the gap between these two models? One idea is to allow the encoder and decoder to jointly randomize (common randomness). This doesn’t buy anything in the Shannon-theoretic case, but for the case of codeword-dependent worst-case noise, the capacity turns out to be $1 - h( p )$ for the binary channel (see Langberg 2004). Unfortunately, this does not hold for more general channels, which is part of what my thesis is about.
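For reference, a two-line sketch of that binary capacity (rates in bits; `binary_entropy` and `binary_avc_capacity` are my names):

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def binary_avc_capacity(p):
    """Capacity 1 - h(p) of the binary channel with codeword-dependent
    worst-case noise of weight pn, given common randomness (Langberg 2004)."""
    return 1.0 - binary_entropy(p)
```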

What happens when the channel has continuous inputs and outputs, with the input power limited to $P$ and the interference power limited to $N$? Shannon says the capacity is $\frac{1}{2} \log(1 + P/N)$. The corresponding worst-case channel is the Gaussian arbitrarily varying channel (GAVC). But there’s a new twist here: can the noise depend on the transmitted codeword?

Suppose that it can’t and we care about the average error. Without common randomness, Csiszár and Narayan showed (under technical conditions) that the capacity is $\frac{1}{2} \log(1 + P/N)$ if $P > N$, and is 0 otherwise. This is because if the interference has more power than the transmitter, it can spoof the transmitter.

However, if there is common randomness and the interference can depend on the transmitted codeword, Agarwal, Sahai, and Mitter showed that the capacity is actually $\frac{1}{2} \log ( P/N )$, which is like a rate distortion function. This makes sense — if $P < N$, the interference can simply cancel the transmitted codeword.
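To make the comparison concrete, here are the three formulas side by side (my function names; I'm using log base 2, so rates are in bits per channel use):

```python
import math

def shannon_awgn(P, N):
    """Classical capacity when the interference is i.i.d. Gaussian noise."""
    return 0.5 * math.log2(1 + P / N)

def gavc_no_cr(P, N):
    """GAVC, average error, codeword-independent interference, no common
    randomness (Csiszar-Narayan): the AWGN formula if P > N, else 0."""
    return 0.5 * math.log2(1 + P / N) if P > N else 0.0

def gavc_codeword_dependent(P, N):
    """Codeword-dependent interference with common randomness
    (Agarwal-Sahai-Mitter): (1/2) log(P/N), and zero once P <= N since
    the interference can cancel the codeword outright."""
    return 0.5 * math.log2(P / N) if P > N else 0.0
```

For any $P > N$ the three rates are ordered `gavc_codeword_dependent <= gavc_no_cr = shannon_awgn`, which matches the intuition that letting the noise see the codeword can only hurt.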

The model I am going to talk about is one in which the interference cannot depend on the transmitted codeword, but on a noisy version of the transmitted codeword. I call this coding against myopic adversaries (it sounds nicer than “four-eyed”). In a later post I’ll talk about some of the geometric intuition of this case.

This is an awesome approach to getting consensus on neighborhood boundaries in Boston. They should do that for Chicago!

A history of currywurst.

Classical Movies in Miniature Style. I like the horses in the Terminator II picture.

I have a rather long-ish commute on public transit, and sometimes it’s hard to get a seat on the train/bus, so I’ve been listening to a lot more podcasts. Here are a few which I’ve been enjoying recently:

• 99% Invisible, which is a design podcast. I’ve been catching up from the beginning, but this little bit on flags may appeal to Chicagoans and San Franciscans.
• Backstory is a podcast about American history. They usually take a theme (e.g. “national monuments,” “birth,” “booze”) and do a number of segments running through different centuries.
• Story Collider: storytelling about science(-ish).

Music with giant Tesla coils.

Dogs and cats and babies can get along.

The Life of Olaudah Equiano, or Gustavus Vassa the African (Olaudah Equiano) — a classic autobiography of an 18th-century African who was captured and sold into slavery, managed to save enough to buy his freedom, and then had a number of adventures in the Caribbean, Mediterranean, and Arctic. It’s a real window into the time period, as well as one of the oldest textual accounts of Europeans through African eyes.

That Awful Mess on Via Merulana (Carlo Emilio Gadda) — an allegorical murder mystery / police investigation. The use of language is dazzling, but the plot, as such, is a bit flat. Recommended for those who like florid prose, extreme vocabulary, and political satire; in this case Gadda is mocking fascism. It may appeal to those who liked The Master and Margarita.

Red Plenty (Francis Spufford) — a lightly fictionalized history of the moment when it looked like central planning was going to work in the Soviet Union, and how it all then fell apart (later to be buoyed by oil prices). A tremendously engaging book about a moment in history that we never hear about in the US due to our stupid jingoistic Cold War know-nothing attitude. There was also a Crooked Timber event on it, which I am going to read now.

The Girl of Fire and Thorns — an engaging fantasy set in a pseudo-Iberian (or colonial South American?) land. It’s a bit heavy on the religion aspect, but that’s the world she builds, at least. It has some of the endemic race issues that much fantasy suffers from, but that might get mixed up in later books.

Re-imagining Milk (Andrea S. Wiley) — a short monograph on the history, biology, and anthropology of milk consumption. It’s full of interesting analysis and critique of how the purported health benefits of drinking milk after weaning intersect with policy. For example, why push milk consumption as a public health issue in parts of Africa where many (or most) people cannot process lactose? Why do we label the (normal) process by which the body stops producing lactase with the pejorative “lactose intolerance”? See also this piece by Mark Bittman on the NY Times site.

A Conspiracy of Paper (David Liss) — a mystery novel set before the South Sea Bubble, full of gritty and grimy London locales, brawling, the rampant anti-Semitism of the time, and some discussions of probability. The book grew out of dissertation research, so it’s meticulously researched, although some of the probability discussion seems ahistorical (c.f. Ian Hacking).
