### Uncategorized

I always end up bookmarking a bunch of papers from ArXiV and then looking at them a bit later than I want. So here are a few notes on some papers from the last month. I have a backlog of reading to catch up on, so I’ll probably split this into a couple of posts.

arXiv:1403.3465v1 [cs.LG]: Analysis Techniques for Adaptive Online Learning
H. Brendan McMahan
This is a nice survey on online learning/optimization algorithms that adapt to the data. These are all variants of the Follow-The-Regularized-Leader algorithms. The goal is to provide a more unified analysis of online algorithms where the regularization is data dependent. The intuition (as I see it) is that you’re doing a kind of online covariance estimation and then regularizing with respect to the distribution as you are learning it. Examples include the McMahan and Streeter (2010) paper and the Duchi et al. (2011) paper. Such adaptive regularizers also appear in dual averaging methods, where they are called “prox-functions.” This is a useful survey, especially if, like me, you’ve kind of checked in and out with the online learning literature and so may be missing the forest for the trees. Or is that the FoReL for the trees?

arXiv:1403.4011 [cs.IT]: Whose Opinion to follow in Multihypothesis Social Learning? A Large Deviation Perspective
Wee Peng Tay
This is a sort of learning from expert advice problem, though not in the setting that machine learners would consider it. The more control-oriented folks would recognize it as a multiple-hypothesis test. The model is that there is a single agent (agent $0$) and $K$ experts (agents $1, 2, \ldots, K$). The agent is trying to do an $M$-ary hypothesis test. The experts (and the agent) have access to local (private) observations $Y_k[1], Y_k[2], \ldots, Y_k[n_k]$ for $k \in \{0,1,2,\ldots,K\}$. The observations come from a family of distributions determined by the true hypothesis $m$. The agent $0$ needs to pick one of the $K$ experts to hire — the analogy is that you are an investor picking an analyst to hire. Each expert has its own local loss function $C_k$ which is a function of the amount of data it has as well as the true hypothesis and the decision it makes. This is supposed to model a “bias” for the expert — for example, they may not care to distinguish between two hypotheses. The rest of the paper looks at finding policies/decision rules for the agents that optimize the exponents with respect to their local loss functions, and then looking at how agent $0$ should act to incorporate that advice. This paper is a little out of my wheelhouse, but it seemed interesting enough to take a look at. In particular, it might be interesting to some readers out there.

arXiv:1403.3862 [math.OC] Asynchronous Stochastic Coordinate Descent: Parallelism and Convergence Properties
Ji Liu, Stephen J. Wright
This is another paper on lock-free optimization (c.f. HOGWILD!). The key difference, as stated in the introduction, is that they “do not assume that the evaluation vector $\hat{x}$ is a version of $x$ that actually existed in the shared memory at some point in time.” What does this mean? It means that a local processor, when it reads the current state of the iterate, may be performing an update with respect to a point not on the sample path of the algorithm. They do assume that the delay between reading and updating the common state is bounded. To analyze this method they need to use a different analysis technique. The analysis is a bit involved and I’ll have to take a deeper look to understand it better, but from a birds-eye view this would make sense as long as the step size is chosen properly and the “hybrid” updates can be shown to be not too far from the original sample path. That’s the stochastic approximator in me talking though.

I meant to blog about this a while back, but somehow starting a new job/teaching are very time consuming (who knew?). Luckily, it’s about an older result of Banaszczyk (pronounced bah-nahsh-chik, I think):

Wojciech Banaszczyk. Balancing vectors and gaussian measures of n-dimensional convex bodies. Random Structures & Algorithms, 12(4):351–360, 1998.

This result came to my attention from a talk given by Sasho Nikolov here at Rutgers on his paper with Kunal Talwar on approximating hereditary discrepancy (see Kunal’s post from last year). The result is pretty straightforward to state.

Banaszczyk’s Theorem. There exists a universal constant $C$ such that the following holds. Let $A = [a_1\ a_2\ \cdots \ a_n]$ be an $m \times n$ real matrix such that the $i$-th column $a_i$ satisfies $\|a_i\|_2 \le 1$ for $i = 1, 2, \ldots, n$, and let $\mathcal{K}$ be a convex body in $\mathbb{R}^m$ such that $\mathbb{P}( G \in K ) \ge 1/2$, where $G \sim \mathcal{N}(0, I_m)$. Then there exists a vector $x \in \{-1,1\}^n$ such that $Ax \in C \cdot \mathcal{K}$.

This is a pretty cool result! Basically, if your convex body $\mathcal{K}$ is big enough to capture half of the probability of a standard Gaussian, then if you blow it up by $C$ to get $C \cdot \mathcal{K}$, then for any arbitrary collection of sub-unit-norm vectors $\{a_i\}$, I can find a way to add and subtract them from each other so that the result ends up in $C \cdot \mathcal{K}$.

I haven’t found a use for this result, but it’s a neat fact to keep in the bucket. Maybe it would be useful in alignment/beamforming schemes? Unfortunately, as far as I can tell he doesn’t tell you how to find this mysterious $x$, so…

For those readers of the blog who have not submitted papers to machine learning (or related) conferences, the conference review process is a bit like a mini-version of a journal review. You (as the author) get the reviews back and have to write a response and then the reviewers discuss the paper and (possibly, but in my experience rarely) revise their reviews. However, they generally are supposed to take into account the response in the discussion. In some cases people even adjust their scores; when I’ve been a reviewer I often adjust my scores, especially if the author response addresses my questions.

This morning I had the singular experience of having a paper rejected from ICML 2014 in which all of the reviewers specifically marked that they did not read and consider the response. Based on the initial scores the paper was borderline, so the rejection is not surprising. However, we really did try to address their criticisms in our rebuttal. In particular, some misunderstood what our claims were. Had they bothered to read our response (and proposed edits), perhaps they would have realized this.

Highly selective (computer science) conferences often tout their reviews as being just as good as a journal, but in both outcomes and process, it’s a pretty ludicrous claim. I know this post may sound like sour grapes, but it’s not about the outcome, it’s about the process. Why bother with the facade of inviting authors to rebut if the reviewers are unwilling to read the response?

S. Raj Rajagopalan and collaborators at Honeywell are doing some security research on making better passwords. They are looking for some people to do a quick study on password design.

Along with a couple of Honeywell security researchers I am running a study on a rather familiar problem for most of us – creating memorable but secure passwords, i.e. how to generate passwords that are both suitably random and memorable. We have just launched a simple user study that asks volunteers to participate in an interactive session that lets them choose password candidates and see how well they remember them. Needless to say, these are not actual passwords used by any system, only strings that could be used as passwords.

No personal information is collected in the study and the system only stores the data that is actually provided by the user. To that end, you may choose to not provide any bit of information as you choose. The study takes only a couple of minutes to finish. You may run it multiple times if you wish (and you will likely get different use cases) but you will have to clear the cache on your browsers to get a fresh configuration.

We need at least 300 participants to get statistical significance, so we would appreciate it if you could participate in the study.

Thanks for your help. Any questions on the study may be directed to me.

Raj

Via Cynthia, here is a column by James Mickens about how horrible the web is right now:

Computer scientists often look at Web pages in the same way that my friend looked at farms. People think that Web browsers are elegant computation platforms, and Web pages are light, fluffy things that you can edit in Notepad as you trade ironic comments with your friends in the coffee shop. Nothing could be further from the truth. A modern Web page is a catastrophe. It’s like a scene from one of those apocalyptic medieval paintings that depicts what would happen if Galactus arrived: people are tumbling into fiery crevasses and lamenting various lamentable things and hanging from playground equipment that would not pass OSHA safety checks.

It’s a fun read, but also a sentiment that may echo with those who truly believe in “clean slate networking.” I remember going to a tutorial on LTE and having a vision of what 6G systems will look like. One thing that is not present, though, is the sense that the system is unstable, and that the introduction of another feature in communication systems will cause the house of cards to collapse. Mickens seems to think the web is nearly there. The reason I thought of this is the recent fracas over the US ceding control of ICANN, and the sort of doomsdaying around that. From my perspective, network operators are sufficiently conservative that they can’t/won’t willy-nilly introduce new features that are only half-supported, like the in Web. The result is a (relatively) stable networking world that appears to detractors as somewhat Jurassic.

I’d argue (with less hyperbole) that some of our curriculum ideas also suffer from the accretion of old ideas. When I took DSP oh-so-long ago (13 years, really?) we learned all of this Direct Form Transposed II blah blah which I’m sure was useful for DSP engineers at TI to know at some point, but has no place in a curriculum now. And yet I imagine there are many places that still teaching it. If anyone reads this still, what are the dinosaurs in your curriculum?

1. Song For The Sold – Kishi Bashi
2. Calling – Snorri Helgason
3. Heart Beats – Hey Marseilles
4. We Get Along – Sharon Jones & The Dap-Kings
5. 1904 Twin Peaks – Alice Russel
6. We Don’t Sleep – Har Mar Superstar
7. Digital Witness – St. Vincent
8. Nothing You Could Say – Big Eyes
9. Jump And Shout – The Dirtbombs
10. Diversionary (Do the Right Thing) – Ages and Ages
11. Another Ten Reasons – Young Fresh Fellows
12. Where Can I Go? – Laura Marling

My college friend Ann Marie Thomas has a post up on the problematic use of the word “failure” in the discourse around education, technology, and design. I was listening to Dyson extoll the virtues of failure Science Friday recently, and it also made me cringe. Ann talks about the problems with “failure” from an education and design point of view, but I think it’s also problematic from teaching/training students to be researchers. One of the most normalizing things I heard in graduate school from my advisor was “well, that’s research for you” after I told him I had found a counterexample to everything I had “proved” in the previous week. I don’t think of that as “embracing failure” but rather a recognition that the process is not one of continuous forward progress.

The sound-bite nature of the word does a disservice to the valuable concept, which is, as Ann says, to “try something.” I think it’s not (often) true that students are afraid to try things because they are afraid to fail. It’s far more likely that they are unsure of how to try things, or what to try. The problem is too abstract and it’s hard to find any sort of inroad that might make sense. Or they have thought they have an inroad and it’s absolutely not working and they are frustrated because they can’t step back and say “this approach is bad.”

I can’t help but think that this talk of “failure” is somehow leaking in from positive psychology. I think it treats us like children who may be afraid to go down some stairs because they are too tall, or afraid to try the new food because it looks funny. It obscures the really difficult part, which is about where to start, not how you end.

Via Serdar Yüksel and the IT Society, the 2014 IEEE North American School on Information Theory will be held at the Fields Institute in Toronto this June. The lecturers for this school are:

My friend Ranjit is working on this Crash Course in Psychology. Since I’ve never taken psychology, I am learning a lot!

Apparently the solution for lax editorial standards is to scrub away the evidence. (via Kevin Chen).

Some thoughts on high performance computing vs. Map Reduce. I think about this a fair bit, since some of my colleagues work on HPC, which feels like a different beast than a lot of the problems I’ve been thinking about.

A nice behind-the-scenes on Co-Op Sauce, a staple at Chicagoland farmers’ markets.

Some of my office furniture is on backorder, like the standing desk unit and my actual desktop, but in the meantime I have found a use for the hardcopy IEEE Transactions that I’ve been carting around with me from job to job:

A way to use IEEE Transactions

Next Page »