I’m on the program committee for the Cyber-Security and Privacy symposium, so I figured I would post this here to make more work for myself.

GlobalSIP 2013 – Call for Papers
IEEE Global Conference on Signal and Information Processing
December 3-5, 2013 | Austin, Texas, U.S.A.

GlobalSIP: IEEE Global Conference on Signal and Information Processing is a new flagship IEEE Signal Processing Society conference. The focus of this conference is on signal and information processing and up-and-coming signal processing themes.

GlobalSIP is composed of symposia on hot topics related to signal and information processing, selected based on responses to the call-for-symposia proposals.

The selected symposia are:

Paper submission will be online only through the GlobalSIP 2013 website. Papers should be in IEEE two-column format. The maximum length varies among the symposia; be sure to check each symposium’s information page for details. Authors of Signal Processing Letters papers will be given the opportunity to present their work at GlobalSIP 2013, subject to space availability and approval by the Technical Program Chairs of GlobalSIP 2013. The authors need to specify in which symposium they wish to present their paper. Please check the conference webpage for details.

Important Dates:
*New* Paper Submission Deadline – June 15, 2013
Review Results Announced – July 30, 2013
Camera-Ready Papers Due – September 7, 2013
*New* SPL request for presentation – September 7, 2013

I’m at the Bellairs Research Institute for a workshop this week and I’ll blog a bit later about some of the interesting talks here. We give the talks on the balcony of one of the buildings, projected on the wall. Unfortunately, we are facing west, which means talks have to end at around 2:30 before people start baking to death. After all that superheated research the only thing to do, really, is cool off in the ocean next door…

The beach at Bellairs

Again a caveat — these are the talks in which I took reasonable enough notes to write anything coherent.

Green Communication: From Maxwell’s Demon to “Informational Friction”
Pulkit Grover
Pulkit talked about trying to tie a physical interpretation to the energy used in communication during computation. Physicists might argue that reversible computation costs nothing, but this ignores friction and noise. Pulkit discussed a simple network model to account for an “informational friction” that penalizes the bit-distance product when communicating on a chip. See also Pulkit’s short video on the topic.

Hajar Mahdavi-Doost, Roy Yates
Roy talked about a model in which receivers have to harvest the energy they need for sampling/buffering/decoding the transmissions. These three tasks cost different amounts, and in particular, the rate at which the receiver samples the output dictates the other parameters. The goal is to choose a rate which helps meet the decoder energy requirements. Because the receiver has to harvest the energy it needs, it has to design a policy to switch between the three operations while harvesting the (time-varying) energy available to it.

Multiple Access and Two-way Channels with Energy Harvesting and Bidirectional Energy Cooperation
Kaya Tutuncuoglu, Aylin Yener
Unlike the previous talk, this was about encoders which have to transmit energy to the receivers — there’s a tradeoff between transmitting data and energy, and in the MAC and TWC there is yet another dimension in how the two users can cooperate. For example, they can cooperate in energy transmission but not in data transmission. There were a lot of results in here, but there was also a discussion of policies for the users. In particular, a “procrastination” strategy turns out to work well (rejoice!).

An equivalence between network coding and index coding
Michelle Effros, Salim El Rouayheb, Michael Langberg
The title says it all! For every network coding problem (multiple unicast, multicast, whatever), there exists a corresponding index coding problem (constructed via a reduction) such that a solution to the latter can be easily translated to a solution for the former. This equivalence holds for all network coding problems, not just linear ones.

Crowd-sourcing epidemic detection
Constantine Caramanis, Chris Milling, Shie Mannor, Sanjay Shakkottai
Suppose we have a graph and we can see that some nodes are infected. This paper was on trying to distinguish whether the infected nodes came from a single-point infection spread via an SI model, or just from a random pattern of infection. They provide two algorithms for doing this and then address how to deal with false positives using ideas from robust statistics.
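To make the two hypotheses concrete, here is a minimal Python sketch — my own illustration of the two generative models being distinguished, not the authors’ detection algorithms:

```python
import random

def si_epidemic(adj, n_infected, seed=None):
    """Grow an infected set from a single random seed via an SI process:
    at each step a uniformly chosen susceptible neighbor of the infected
    set becomes infected."""
    rng = random.Random(seed)
    infected = {rng.randrange(len(adj))}
    while len(infected) < n_infected:
        frontier = {v for u in infected for v in adj[u] if v not in infected}
        if not frontier:
            break  # the infection cannot spread any further
        infected.add(rng.choice(sorted(frontier)))
    return infected

def random_infection(n_nodes, n_infected, seed=None):
    """Null model: the same number of infected nodes, chosen uniformly."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_nodes), n_infected))

# Toy graph: a cycle on 20 nodes.
n = 20
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(sorted(si_epidemic(adj, 6, seed=1)))     # a contiguous arc
print(sorted(random_infection(n, 6, seed=1)))  # scattered nodes
```

On the cycle the epidemic produces a contiguous arc while the null model scatters infections; that kind of structural difference is what a detection algorithm has to pick up on.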

I promised some ITA blogging, so here it is. Maybe Alex will blog a bit too. These notes will by necessity be cursory, but I hope some people will find some of these papers interesting enough to follow up on them.

A Reverse Pinsker Inequality
Daniel Berend, Peter Harremoës, Aryeh Kontorovich
Aryeh gave this talk on what we can say about bounds in the reverse direction of Pinsker’s inequality. Of course, in general you can’t say much, but what they do is show an expansion of the KL divergence in terms of the total variation distance, with coefficients depending on the balance coefficient of the distribution, $\beta = \inf \{ P(A) : P(A) \ge 1/2 \}$.
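For context, the forward direction — a standard fact — with $\delta(P,Q)$ denoting the total variation distance:

```latex
% Pinsker's inequality: KL divergence controls total variation.
D(P \| Q) \ge 2\, \delta(P,Q)^2
% No reverse bound can hold in general (D(P||Q) can be infinite while
% \delta(P,Q) is small), which is why an upper bound on the KL
% divergence has to bring in extra structure like the balance
% coefficient \beta above.
```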

Unfolding the entropy power inequality
Mokshay gave a talk on the entropy power inequality. Given vector random variables $X_1$ and $X_2$, we know that $h(X_1 + X_2) \ge h(Z_1 + Z_2)$, where $Z_1$ and $Z_2$ are isotropic Gaussian vectors with the same differential entropy as $X_1$ and $X_2$, respectively. The question in this paper is: can we insert a term between the two sides of this inequality? The answer is yes! They define a spherical rearrangement of the densities of $X_1$ and $X_2$ into variables $X_1^{\ast}$ and $X_2^{\ast}$ with spherically symmetric decreasing densities, and show that the differential entropy of their sum lies between the two terms in the regular EPI.
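Schematically, with $X_1^{\ast}$ and $X_2^{\ast}$ as above, the result sandwiches a new term into the EPI:

```latex
h(X_1 + X_2) \;\ge\; h(X_1^{\ast} + X_2^{\ast}) \;\ge\; h(Z_1 + Z_2)
```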

Improved lower bounds on the total variation distance and relative entropy for the Poisson approximation
Igal Sason
The previous lower bounds, mentioned in the title, were based on the Chen-Stein method; this paper strengthens them by sharpening the analysis in that method.

Fundamental limits of caching
This talk was on tradeoffs in caching. If there are $N$ files, $K$ users, and a size-$M$ cache at each user, how should the users cache files so that a broadcaster can serve their requests with the least bandwidth? More simply, suppose there are three people who may want to watch one of three different TV shows, and each can buffer the content of one TV show. Since a priori you don’t know which show they want to watch, the natural idea is to buffer/cache the first third of each show at each user. They show that this is highly suboptimal. Because the content provider can XOR parts of the content to the users, the caching strategy should not be the same at each user, and the real benefit comes from the global cache size.
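The standard two-user example makes the XOR trick concrete; here is a toy Python sketch of it (my own illustration, not the paper’s general scheme). Each of two files is split in half, user 1 caches the first halves, and user 2 caches the second halves:

```python
def xor(x, y):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))

# Two files A and B, each split into two halves.
A1, A2 = b"AAAA", b"aaaa"
B1, B2 = b"BBBB", b"bbbb"

cache1 = {"A1": A1, "B1": B1}  # user 1 caches the first half of each file
cache2 = {"A2": A2, "B2": B2}  # user 2 caches the second half of each file

# User 1 requests file A (missing A2); user 2 requests file B (missing B1).
broadcast = xor(A2, B1)  # a single coded transmission serves both users

assert xor(broadcast, cache1["B1"]) == A2  # user 1 recovers A2
assert xor(broadcast, cache2["A2"]) == B1  # user 2 recovers B1
```

One coded half-file replaces the two uncoded half-files a naive scheme would broadcast, and the gain grows with the aggregate cache size across users.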

Simple outer bounds for multiterminal source coding
This was a very cute result on using the HGR maximal correlation to get outer bounds for multiterminal source coding without first deriving a single-letterization of the outer bound. The main ideas are to use two properties of the HGR correlation: it tensorizes (which handles the multi-letter part), and it satisfies the strong data processing inequality from Elza Erkip and Tom Cover’s paper.
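For reference, tensorization — a classical fact about maximal correlation going back to Witsenhausen — is what takes care of the multi-letter part:

```latex
% For independent pairs (X_i, Y_i), i = 1, ..., n:
\rho_{\mathrm{HGR}}\big((X_1,\dots,X_n);(Y_1,\dots,Y_n)\big)
  = \max_{1 \le i \le n} \rho_{\mathrm{HGR}}(X_i; Y_i)
```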

As I’ve gotten farther along in this whole research career, I’ve found it more and more difficult to figure out the optimal way to balance the different things one does at a conference:

• Going to talks. This is ostensibly the point of the conference. It’s impossible to read all of the papers that are out there and a talk is a fast way to get the gist of a bunch of papers or learn about a new problem in less time than it takes to really read and digest the paper. We’re social creatures so it’s more natural to get information this way.
• Meeting collaborators to talk about research problems. I have lots of collaborators who are outside TTI and a conference is a good chance to catch up with them face-to-face, actually sit down and hammer out some details of a problem, or work on a new problem with a (potential) new collaborator. Time sitting over a notepad is time not spent in talks, though.
• Professional networking. I’m on the job market now, and it’s important to at least chat casually with people about your research, what you think is exciting, your future plans, and the like. This is sometimes the “real” point of conferences.
• Social networking. Sometimes conferences are the only times I get to see my friends from grad school, and in a sense your professional peers are the only people who “get” your crazy obsession with esoteric problem $P$ and like to get a beer with you.

So the question for the readership: how do you decide the right balance for yourself? Do you go in with a plan to see at least N talks or a certain set $S$ of talks, or are you open to just huddling in the corner with a notepad?

I wrote this post in an attempt to procrastinate about ITA blogging, which I will get to in a bit. I went to far fewer talks than I expected to this year, but I’ll write about ‘em later.

If you, like me, tend to cart around old ISIT papers and just gut them to put in the new content for this year’s paper, don’t do it. Instead, download the new template, because the page size has changed from letter to A4.

Also, as a postscript to Sergio’s note that eqnarray is OVER, apparently Stefan recommends we use IEEEeqnarray instead of align.
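For the curious, IEEEeqnarray comes with the IEEEtran tools; a minimal usage sketch (the {rCl} column specification gives the usual right/center/left alignment around the relation symbol):

```latex
\usepackage{IEEEtrantools} % provides the IEEEeqnarray environment

\begin{IEEEeqnarray}{rCl}
  I(X;Y) & = & H(X) - H(X \mid Y) \\
         & = & H(Y) - H(Y \mid X)
\end{IEEEeqnarray}
```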

A few weeks (!) ago I was talking with an anthropologist friend of mine about how different fields have different modes of communicating research “findings” in the conference setting. Some places people just read their paper out loud, others have slide presentations, yet others have posters, and I imagine some people do blackboard talks. Of course, conferences have many purposes — schmoozing, job hunting, academic political wrangling, and so on. What is unclear to me is why particular academic communities have settled on particular norms for presenting their work.

One axis along which to understand this might be the degree to which the presentation of the paper is an advertisement for the written paper. In many humanities conferences, people simply read their paper out loud. You’d think that theater researchers would be able to make a more… dramatic reading of their work, but you’d be wrong much of the time. It’s very hard to sit and listen to and follow a jargon-heavy analysis of something that you have probably never read about (e.g. turn-of-the-century commercial theater in Prague), and in some sense I feel that the talk’s role as an advertisement for the paper is minimal here.

On the other hand, a poster session maximizes the “advertisement of the paper” aspect. People stand there for 5 minutes while you explain the ideas in the paper, and if it seems sufficiently interesting then they will go and read the actual paper. A difference from the humanities model is that there is a paper in the proceedings, which is not necessarily the case at humanities conferences.

Slide presentations are somewhere in the middle — I often go to a talk at a conference and think “well, now I don’t need to read the paper.” These are the trickiest because the audience is captive but you cannot give them the full story. It’s more of a method for luring already-interested people into wanting to read the paper rather than the browsing model of a poster session.

However, even this “advertisement” categorization raises the question of why we have poster sessions, slide presentations, and paper readings. Are these the best way to present the research in those fields? Should we have more posters at ISIT and fewer talks (more like NIPS)? Should NIPS have more parallel sessions to reflect the spread of interest in the “community?” Should anthropology conferences have each panelist give an 8 minute slide presentation followed by real discussion?

I missed ITW in Lausanne this year, but I heard that they mixed up the format to great success. More posters and fewer talks meant more interaction and more discussion. I think more experimenting could be good — maybe some talks should be given as chalk talks with no slides!

I took it a bit easy today at the conference and managed to spend some time talking to collaborators about work, so perhaps I wasn’t 100% all-in on the talks and posters. In general I find that for many posters it’s hard to figure out what the motivating problem is — it’s not clear from the poster, and it’s not always clear from the explanation. Here are a few papers which I thought were interesting:

W. Koolen, D. Adamskiy, M. Warmuth
Putting Bayes to sleep
Some signals look sort of jump-Markov — the distribution of the data changes over time, so that there are segments which have distribution A, then it switches to B, then perhaps back to A, and so on. A prediction procedure which “mixes past posteriors” works well in this setting, but it was not clear why. This paper provides a Bayesian interpretation for the predictor as mixing in a “sleeping experts” setting.
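As a rough sketch of the flavor of this kind of predictor — my own simplified rendering, not the paper’s exact algorithm — each round does an exponential-weights update and then mixes the new posterior with an average of past posteriors, so an expert that was good in an old segment can “wake up” quickly:

```python
import numpy as np

def mix_past_posteriors(losses, eta=1.0, alpha=0.1):
    """Simplified 'mixing past posteriors'-style predictor.

    losses: (T, K) array of per-round losses for K experts.
    Each round: exponential-weights posterior update, then mix the new
    posterior with the uniform average of all past posteriors.
    Returns the (T, K) array of weights used in each round."""
    T, K = losses.shape
    w = np.full(K, 1.0 / K)
    past = []                 # posteriors from previous rounds
    used = np.empty((T, K))
    for t in range(T):
        used[t] = w
        v = w * np.exp(-eta * losses[t])  # Bayesian/exponential update
        v /= v.sum()
        past.append(v)
        w = (1 - alpha) * v + alpha * np.mean(past, axis=0)
    return used

# Expert 0 is best early, expert 1 in the middle, expert 0 again late.
loss = np.ones((30, 2))
loss[:10, 0] = loss[10:20, 1] = loss[20:, 0] = 0.0
print(mix_past_posteriors(loss)[-1])  # weight snaps back to expert 0
```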

J. Duchi, M. Jordan, M. Wainwright, A. Wibisono
Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
This paper looked at stochastic gradient descent when function evaluations are cheap but gradient evaluations are expensive. The idea is to compute an unbiased approximation to the gradient by evaluating the function at $\theta_t$ and at $\theta_t + \mathrm{noise}$, and then taking a discrete approximation to the gradient. Some of the attendees claimed this is similar to an approach proposed by Nesterov, but the distinction was unclear to me.
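A hedged sketch of the generic two-point scheme (my own rendering, assuming a smooth objective $f$; not necessarily the exact estimator in the paper):

```python
import numpy as np

def zeroth_order_sgd(f, theta0, steps=2000, eta=0.1, mu=1e-4, seed=0):
    """Zeroth-order stochastic optimization: estimate the gradient from
    two function evaluations along a random direction, then step."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for t in range(steps):
        u = rng.standard_normal(theta.size)          # random direction
        g = (f(theta + mu * u) - f(theta)) / mu * u  # two-point estimate
        theta -= eta / np.sqrt(t + 1) * g            # decaying step size
    return theta

# Toy quadratic with minimizer (1, -2).
f = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
print(zeroth_order_sgd(f, [0.0, 0.0]))  # approaches (1, -2)
```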

J. Lloyd, D. Roy, P. Orbanz, Z. Ghahramani
Random function priors for exchangeable graphs and arrays
This paper looked at Bayesian modeling for structures like undirected graphs which may represent interactions, like protein-protein interactions. Infinite random graphs whose distributions are invariant under permutations of the vertex set can be associated to a structure called a graphon. Here they put a prior on graphons, namely a Gaussian process prior, and then try to do inference on real graphs to estimate the kernel function of the process, for example.
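The generative picture is easy to write down: a graphon is a symmetric function $W$ on $[0,1]^2$, each vertex gets a uniform latent label, and edges appear independently with probability $W(U_i, U_j)$. A minimal sketch with a fixed illustrative $W$ (the paper puts a GP prior on $W$ instead):

```python
import numpy as np

def sample_graphon(W, n, seed=0):
    """Sample an n-vertex exchangeable random graph from graphon W."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(size=n)              # latent vertex labels
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = rng.random() < W(U[i], U[j])
    return A

# Illustrative kernel: vertices with nearby labels connect more often.
W = lambda u, v: 0.9 * np.exp(-3.0 * abs(u - v))
print(sample_graphon(W, 10))
```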

N. Le Roux, M. Schmidt, F. Bach
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets
This was a paper marked for oral presentation — the idea is that in gradient descent it is expensive to evaluate gradients if your objective function looks like $\sum_{i=1}^{n} f(\theta, x_i)$, where the $x_i$ are your data points and $n$ is huge, because you have to evaluate $n$ gradients per step. On the other hand, stochastic gradient descent can be slow because at each iteration it picks a single $i$ and does a gradient step on $f(\theta_t, x_i)$ alone. Here what they do at step $t$ is pick a random point $j$ and evaluate its gradient, but then take a gradient step on all $n$ points; for points $i \ne j$ they just reuse the gradient from the last time $i$ was picked. Let $T_i(t)$ be the last time $i$ was picked before time $t$, with $T_j(t) = t$; then the step is along $\sum_{i = 1}^{n} \nabla f(\theta_{T_i(t)}, x_i)$. This works surprisingly well.
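In code the bookkeeping is just a table of stale per-point gradients; here is a minimal sketch of the idea (least-squares loss as a stand-in, not the paper’s experiments):

```python
import numpy as np

def sag(X, y, steps=2000, eta=0.01, seed=0):
    """Stochastic average gradient: refresh one per-point gradient each
    step, but descend along the average of all stored gradients."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    grads = np.zeros((n, d))  # last-seen gradient for each data point
    total = np.zeros(d)       # running sum of the stored gradients
    for _ in range(steps):
        j = rng.integers(n)
        g_new = (X[j] @ theta - y[j]) * X[j]  # grad of (x_j . theta - y_j)^2 / 2
        total += g_new - grads[j]             # swap in point j's new gradient
        grads[j] = g_new
        theta -= eta * total / n
    return theta

# Toy noiseless regression: should recover theta = [2, -1].
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = X @ np.array([2.0, -1.0])
print(sag(X, y))
```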

Stephane Mallat
Classification with Deep Invariant Scattering Networks
This was an invited talk — Mallat was trying to explain why deep networks seem to do learning well (it all seems a bit like black magic), though his explanation felt a bit heuristic to me. His first main point was that wavelets are good at capturing geometric structure like translation and rotation, and appear to have favorable properties with respect to “distortions” in the signal. The notion of distortion is a little vague, but the idea is that if two signals (say images) are similar but one is slightly distorted, they should map to representations which are close to each other. The mathematics behind his analysis framework was group-theoretic — he wants to estimate the group of actions which manipulate images. In a sense, this is a control-theory view of the problem (or at least it seemed that way to me). The second point that I understood was that sparsity in representation has a big role to play in building efficient and layered representations. I’d have to see the talk again to understand it better; in the end I wasn’t sure I understood why deep networks are good, but I did learn some more interesting things about wavelet representations, which is cool.

I am attending NIPS this year for the first time, so I figured it would be good to blog about some of it here. I totally dropped the ball on Allerton, so maybe I’ll make up for it by writing more about the actual talks here. Fortunately, or unfortunately, most of the conference is about things I have almost no experience with, so I am facing a bit of an explore/exploit tradeoff in my selection process.

Every day of the conference has a poster session from 7 to midnight — there are 90+ posters in a single room, and people drift in and out, hanging out with friends and looking at posters. My poster (a paper with Kamalika Chaudhuri and Kaushik Sinha on differentially private approximations to PCA) was last night, so I was on the presenting end of things. I gave up at 10:30 because I was getting hoarse and tired, but even then there were a fair number of people milling about. Since I was (mostly) at my poster I missed out on the other works.

During the day the conference is a single-track affair with invited and highlighted talks. There are two kinds of highlighted talks — some papers are marked for oral presentation, and some are marked as “spotlights,” which means that the authors get to make a 5-minute elevator pitch for their poster in front of the whole conference. Those start today, and I’m looking forward to it.

In the meantime, here is a picture from the hike I took yesterday with Erin:

Mountain Range on a hike near Lake Lily.

I’m headed to NIPS next week. Via my (soon to be ex-) colleague Dhruv Batra comes this fun visualization by Andrej Karpathy of the topics of papers, clustered by an LDA model.