ISIT 2014: how many samples do we need?

Due to jetlag, my CAREER proposal deadline, and perhaps a bit of general laziness, I didn’t take as many notes at ISIT as I would have, so my posting will be somewhat light (in addition to being almost a month delayed). If someone else took notes on some talks and wants to guest-post on it, let me know!

Strong Large Deviations for Composite Hypothesis Testing
Yen-Wei Huang (Microsoft Corporation, USA); Pierre Moulin (University of Illinois at Urbana-Champaign, USA)
This talk was actually given by Vincent Tan since neither of the authors could make it (this seems to be a theme of talks I’ve attended this summer. The paper was about testing a simple hypothesis H_1 versus a composite hypothesis H_0 where under H_0 the observations are i.i.d. with respect to one of possibly k different distributions. There are therefore k different errors and the goal is to characterize these errors when we ask for the probability of true detection to be greater than 1 - \epsilon. This is a sort of generalized Neyman-Pearson setup. They look at the vector of log-likelihood ratios and show that a threshold test is nearly optimal. At the time, I understood the idea of the proof, but I think it’s one of things where you need to really read the paper.

Randomized Sketches of Convex Programs with Sharp Guarantees
Mert Pilanci (University of California, Berkeley, USA); Martin J. Wainwright (University of California, Berkeley, USA)
This talk was about using random projections to lower the complexity of solving a convex program. Suppose we want to minimize \| Ax - y \|^2 over x given y. A sketch would be to solve \| SAx - Sy \|^2 where S is a random projection. One question is how to choose A. They show that choosing S to be a randomized Hadamard matrix (the paper studies Gaussian matrices), then the objective value of the sketched program is at most (1 + \epsilon)^2 times the value of the original program as long as the the number of rows of S is larger than O( \epsilon^{-2} \mathbb{W}^2(A \mathcal{K})), where \mathbb{W}(A \mathcal{K}) is the Gaussian width of the tangent cone of the contraint set at the optimum value. For more details look at their preprint on ArXiV.

On Efficiency and Low Sample Complexity in Phase Retrieval
Youssef Mroueh (MIT-IIT, USA); Lorenzo Rosasco (DIBRIS, Unige and LCSL – MIT, IIT, USA)
This was another talk not given by the authors. The problem is recovery of a complex vector x_0 \in \mathbb{C}^n from phaseless measurements of the form b_i = |\langle a_i, x_0 \rangle|^2 where a_i are complex spherically symmetric Gaussian vectors. Recovery from such measurements is nonconvex and tricky, but an alternating minimizing algorithm can reach a local optimum, and if you start it in a “good” initial position, it will find a global optimum. The contribution of this paper is provide such a smart initialization. The idea is to “pair” the measurements to create new measurements y_i = \mathrm{sign}( b_i^{(1)} - b_i^{(2)} ). This leads to a new problem (with half as many measurements) which is still hard, so they find a convex relaxation of that. I had thought briefly about such sensing setups a long time ago (and by thought, I mean puzzled over it at a coffeshop once), so it was interesting to see what was known about the problem.

Sorting with adversarial comparators and application to density estimation
Jayadev Acharya (University of California, San Diego, USA); Ashkan Jafarpour (University of California, San Diego, USA); Alon Orlitsky (University of California, San Diego, USA); Ananda Theertha Suresh (University of California, San Diego, USA)
Ashkan gave this talk on a problem where you have m samples from an unknown distribution p and a set of distributions \{q_1, q_2, \ldots, q_n\} to compare against. You want to find the distribution that is closest in \ell_1. One way to do this is via Scheffe tournament tht compares all pairs of distributions — this runs in time n^2 time. They show a method that runs in O(n) time by studying the structure of the comparators used in the sorting method. The motivation is that running comparisons can be expensive (especially if they involve human decisions) so we want to minimize the number of comparisons. The paper is significantly different than the talk, but I think it would definitely be interesting to those interested in discrete algorithms. The density estimation problem is really just a motivator — the sorting problem is far more general.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s