ISIT 2015 : statistics and learning

The advantage of flying to Hong Kong from the US is that the jet lag was such that I was actually more or less awake in the mornings. I didn’t take such great notes during the plenaries, but they were rather enjoyable, and I hope that the video will be uploaded to the ITSOC website soon.

There were several talks on entropy estimation in various settings that I did not take great notes on, to wit:

  • DOES DIRICHLET PRIOR SMOOTHING SOLVE THE SHANNON ENTROPY ESTIMATION PROBLEM? (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)
  • ADAPTIVE ESTIMATION OF SHANNON ENTROPY (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)

I would highly recommend taking a look for those who are interested in this problem. In particular, it looks like we’re getting towards more efficient entropy estimators in difficult settings (online, large alphabet), which is pretty exciting.

Javad Heydari, Ali Tajer, Rensselaer Polytechnic Institute, United States
This talk was about hypothesis testing where the observer can control the samples being taken by traversing a graph. We have an n-node graph (c.f. a graphical model) representing the joint distribution on n variables. The data generated is i.i.d. across time according to either F_0 or F_1. At each time you get to observe the data from only one node of the graph. You can either observe the same node as before, explore by observing a different node, or make a decision about whether the data from from F_0 or F_1. By adopting some costs for different actions you can form a dynamic programming solution for the search strategy but it’s pretty heavy computationally. It turns out the optimal rule for switching has a two-threshold structure and can be quite a bit different than independent observations when the correlations are structured appropriately.

Yanting Ma, Dror Baron, North Carolina State University, United States; Ahmad Beirami, Duke University, United States
The mismatch studied in this paper is a mismatch in the prior distribution for a sparse observation problem y = Ax + \sigma_z z, where x \sim P (say a Bernoulli-Gaussian prior). The question is what happens when we do estimation assuming a different prior Q. The main result of the paper is an analysis of the excess MSE using a decoupling principle. Since I don’t really know anything about the replica method (except the name “replica method”), I had a little bit of a hard time following the talk as a non-expert, but thankfully there were a number of pictures and examples to help me follow along.

Yonatan Kaspi, University of California, San Diego, United States; Ofer Shayevitz, Tel-Aviv University, Israel; Tara Javidi, University of California, San Diego, United States
This was another search paper, but this time we have, say, K targets W_1, W_2, \ldots, W_K uniformly distributed in the unit interval, and what we can do is query at each time n a set S_n \subseteq [0,1] and get a response Y_n = X_n \oplus Z_n where X_n = \mathbf{1}( \exists W_k \in S_n ) and Z_n \sim \mathrm{Bern}( \mu(S_n) + b ) where \mu is the Lebesgue measure. So basically you can query a set and you get a noisy indicator of whether you hit any targets, where the noise depends on the size of the set you query. At some point \tau you stop and guess the target locations. You are (\epsilon,\delta) successful if the probability that you are within \delta of each target is less than \epsilon. The targeting rate is the limit of \log(1/\delta) / \mathbb{E}[\tau] as \epsilon,\delta \to 0 (I’m being fast and loose here). Clearly there are some connections to group testing and communication with feedback, etc. They show there is a significant gap between the adaptive and nonadaptive rate here, so you can find more targets if you can adapt your queries on the fly. However, since rate is defined for a fixed number of targets, we could ask how the gap varies with K. They show it shrinks.

Varun Jog, University of California, Berkeley, United States; Po-Ling Loh, University of Pennsylvania, United States
The graphical model for jointly Gaussian variables has no edge between nodes i and j if the corresponding entry (\Sigma^{-1})_{ij} = 0 in the inverse covariance matrix. They show a relationship between the KL divergence of two distributions and their corresponding graphs. The divergence is lower bounded by a constant if they differ in a single edge — this indicates that estimating the edge structure is important when estimating the distribution.

Aolin Xu, Maxim Raginsky, University of Illinois at Urbana–Champaign, United States
Max gave a nice talk on the problem of minimizing an expected loss \mathbb{E}[ \ell(W, \hat{W}) ] of a d-dimensional parameter W which is observed noisily by separate encoders. Think of a CEO-style problem where there is a conditional distribution P_{X|W} such that the observation at each node is a d \times n matrix whose columns are i.i.d. and where the j-th row is i.i.d. according to P_{X|W_j}. Each sensor gets independent observations from the same model and can compress its observations to b bits and sends it over independent channels to an estimator (so no MAC here). The main result is a lower bound on the expected loss as s function of the number of bits latex b, the mutual information between W and the final estimate \hat{W}. The key is to use the strong data processing inequality to handle the mutual information — the constants that make up the ratio between the mutual informations is important. I’m sure Max will blog more about the result so I’ll leave a full explanation to him (see what I did there?)

More on Shannon theory etc. later!


ISIT Deadline Extended to Monday

Apparently not everyone got this email, so here it is. I promise this blog will not become PSA-central.

Dear ISIT-2015-Submission Reviewers:

In an effort to ensure that each paper has an appropriate number of reviews, the deadline for the submission of all reviews has been extended to March 2nd. If you have not already done so, please submit your review by March 2nd as we are working to a very tight deadline.

In filling out your review, please keep in mind that

(a) all submissions are eligible to be considered for presentation in a semi-plenary session — Please ensure that your review provides an answer to Question 11
(b) in the case of a submission that is eligible for the 2015 IEEE Jack Keil Wolf ISIT Student Paper Award, the evaluation form contains a box at the top containing the text:
Notice: This paper is to be considered for the 2015 IEEE Jack Keil Wolf ISIT Student Paper Award, even if the manuscript itself does not contain a statement to that effect.
– Please ensure that your review provides an answer to Question 12 if this is the case.

Thanks very much for helping out with the review process for ISIT, your inputs are of critical importance in ensuring that the high standards of an ISIT conference are maintained. We know that reviewing a paper takes much effort and we are grateful for all the time you have put in!

With regards,

Pierre, Suhas and Vijay
(TPC Co-Chairs, ISIT 2015)