# ISIT Blogging, part 3

I’ll round out the end of my ISIT blogging with very brief takes on a few more papers. I took it pretty casually this year in terms of note taking, and while I attended many more talks, my notes for most of them consist of a title and a star next to the ones where I want to look at the paper more closely. That’s probably closer to how most people attend conferences, only they probably use the proceedings book. I actually ended up shredding the large book of abstracts to use as bedding for my vermicompost (I figured they might appreciate eating a little Turkish paper for a change of diet).

On Connectivity Thresholds in Superposition of Random Key Graphs on Random Geometric Graphs
B Santhana Krishnan (Indian Institute of Technology, Bombay, India); Ayalvadi Ganesh (University of Bristol, United Kingdom); D. Manjunath (IIT Bombay, India)
This looked at a model where you have a random geometric graph (RGG) together with a uniformly chosen random subset $S_i$ of $\{ 1, 2, \ldots, P_n\}$ of size $K_n$ at each node. The subset is the set of keys available at each node; two nodes can talk (securely) if they share a key in common. We keep the edge in the RGG is if the link can be secured. The question is whether the secure-link graph is connected. It turns out that the important scaling is in terms of $r_n^2 K_n^2/P_n$, where $r_n$ is the connectivity radius of the RGG. This sort of makes sense, as the threshold is more or less $\Theta(\log n/n)$, so the keys provide a kind of discount factor on effective radius needed for connectivity — if the number of keys per node is small then you need a larger radius to compensate.

Secure Network Coding for Distributed Secret Sharing with Low Communication Cost
Nihar B Shah (University of California at Berkeley, USA); K. v. Rashmi (University of California at Berkeley, USA); Kannan Ramchandran (University of California at Berkeley, USA)
This paper was on secret sharing — a dealer wants to distribute $n$ shares of a secret such that any $k$ of them can be used to reconstruct the secret but $k-1$ or fewer cannot. The idea here is that the dealer has to distribute these shares over the network, which means that if a receiver is not connected directly to the dealer then the share will be passed insecurely through another node. Existing approaches based on pairwise agreement protocols are communication intensive. The idea here is use ideas from network coding to share masked versions of shares so that intermediate nodes will not get valid shares from others. To do this the graph needs to satisfy a particular condition ($k$-propagating), which is defined in the paper. A neat take on the problem, and worth looking at if you’re interested in that sort of thing.

Conditional Equivalence of Random Systems and Indistinguishability Proofs
Ueli Maurer (ETH Zurich, Switzerland)
This was scheduled to be in the same session as my paper with Vinod, but was moved to an earlier session. Maurer’s “programme” as it were, is to think about security via three kinds of systems — real systems with real protocols and pseudorandomness, idealized systems with real protocols but real randomness, and perfect systems which just exist on paper. The first two are trivially indistinguishable from a computational perspective, and the goal is to show that the last two are information-theoretically indistinguishable. This conceptual framework is actually useful for me to separate out the CS and IT sides of the security design question. This paper tried to set up a framework in which there is a distinguisher D which tries to make queries to two systems and based on the answers has to decide if they are different or not. I think if you’re interested in sort of a systems-theoretic take on security you should take a look at this.

Tight Bounds for Universal Compression of Large Alphabets
Jayadev Acharya (University of California, San Diego, USA); Hirakendu Das (University of California San Diego, USA); Ashkan Jafarpour (UCSD, USA); Alon Orlitsky (University of California, San Diego, USA); Ananda Theertha Suresh (University of California, San Diego, USA)
The main contribution of this paper was to derive bounds on compression of patterns of sequences over unknown/large alphabets. The main result is that the worst case pattern redundancy for i.i.d. distributions is basically $n^{1/3}$ where $n$ is the blocklength. The main result is a new upper bound which uses some tricks like sampling a random number of points, where the number of samples is Poisson distributed, and a partition of the set of distributions induced by Poisson sampling.

To Surprise and Inform
Lav R. Varshney (IBM Thomas J. Watson Research Center, USA)
Lav talked about communication over a channel where the goal is to communicate subject to a constraint on the Bayesian surprise $s(x) = D( p(Y|x) \| P(Y) )$ where $X$ and $Y$ are the input and output of the channel. He gets a single-letter expression for the capacity under a bound on the max surprise and gives an example for which the same distribuion maximizes mutual information and achieves the minimax surprise. The flip side is to ask for capacity when each output should be surprising (or “attention seeking”). He gets a single letter capacity here as well, but the structure of the solution seems to be a bit more complicated.

# ISIT Blogging, part 2

Logarithmic Sobolev inequalities and strong data processing theorems for discrete channels
Maxim Raginsky (University of Illinois at Urbana-Champaign, USA)
Max talked about how the strong data processing inequality (DPI) is basically a log-Sobolev inequality (LSI) that is used in measure concentration. The strong DPI says that
$D(QW \| PW) \le \alpha D(Q \| P)$
for some $\alpha < 1$, so the idea is to get bounds on
$\delta^*(P,W) = \sup_{Q} \frac{D(QW \| PW)}{D(Q \| P)}$.
What he does is construct a hierarchy of LSIs in which the strong DPI fits and then gets bounds on this ratio in terms of best constants for LSIs. The details are a bit hairy, and besides, Max has his own blog so he can write more about it if he wants…

Building Consensus via Iterative Voting
Farzad Farnoud (University of Illinois, Urbana-Champaign, USA); Eitan Yaakobi (Caltech, USA); Behrouz Touri (University of Illinois Urbana-Champaign, USA); Olgica Milenkovic (University of Illinois, USA); Jehoshua Bruck (California Institute of Technology, USA)
This paper was about rank aggregation, or how to take a bunch of votes expressed as permutations/rankings of options to produce a final option. The model is one in which people iteratively change their ranking based on the current ranking. For example, one could construct the pairwise comparison graph (a la Condorcet) and then have people change their rankings when they disagree with the majority on an edge. They show conditions under which this process converges (the graph should not have a cycle) and show that if there is a Condorcet winner, then after this process everyone will rank the Condorcet winner first. They also look at a Borda count version of this problem but to my eye that just looked like an average consensus method, but it was at the end of the talk so I might have missed something.

Information-Theoretic Study of Voting Systems
Eitan Yaakobi (Caltech, USA); Michael Langberg (Open University of Israel, Israel); Jehoshua Bruck (California Institute of Technology, USA)
Eitan gave this talk and the preceding talk. This one was about looking at voting through the lens of coding theory. The main issue is this — what sets of votes or distribution of vote profiles will lead to a Condorcet winner? Given a set of votes, one could look at the fraction of candidates who rank candidate j in the i-th position and then try to compute entropies of the resulting distributions. The idea is somehow to characterize the existence or lack of a Condorcet winner in terms of distances (Kendall tau) and these entropy measures. This is different than looking at probability distributions on permutations and asking about the probability of there existing a Condorcet cycle.

Brute force searching, the typical set and Guesswork
Mark Chirstiansen (National University of Ireland Maynooth, Ireland); Ken R Duffy (National University of Ireland Maynooth, Ireland); Flávio du Pin Calmon (Massachusetts Institute of Technology, USA); Muriel Médard (MIT, USA)
Suppose an item $X$ is chosen $\sim P$ from a list and we want to guess the choice that is made. We’re only allowed to ask questions of the form “is the item $Y$?” Suppose now that the list is a list of codewords of blocklength $k$ drawn i.i.d. according to $Q$. This paper looks at the number of guesses one needs if $P$ is uniform on the typical set according to $Q$ versus when $P$ is distributed according the distribution $Q^k$ conditioned on $X$ being in the typical set. The non-uniformity of the latter turns out to make the guessing problem a lot easier.

Rumor Source Detection under Probabilistic Sampling
Nikhil Karamchandani (University of California Los Angeles, USA); Massimo Franceschetti (University of California at San Diego, USA)
This paper looked at an SI model of infection on a graph — nodes are either Susceptible (S) or Infected (I), and there is a probability of transitioning from S to I based on your neighbors’ states. There in exponential waiting time $\tau_{ij}$ for the $i$ to infect $j$ if $i$ is infected. The idea is that the rumor starts somewhere and infects a bunch of people and then you get to observe/measure the network. You want to find the source. This was studied by Zaman and Shah under the assumption of perfect observation of all nodes. This work looked at the case where nodes randomly report their infection state, so you only get an incomplete picture of the infection state. They characterize the effect of the reporting probability on the excess error and show that for certain tree graphs, incomplete reporting is as good as full reporting.

/

# UCI Repository Citation Change

When making an editing pass over a bibliography today, I noticed that the citation for the UC Irvine Machine Learning Repository has changed. It used to be
 @misc{Bache+Lichman:2013 , author = "A. Asuncion and D.H. Newman", year = "2007", title = "{UCI} Machine Learning Repository", url = "http://archive.ics.uci.edu/ml", institution = "University of California, Irvine, School of Information and Computer Sciences" } 
But now it’s this:
 @misc{Bache+Lichman:2013 , author = "K. Bache and M. Lichman", year = "2013", title = "{UCI} Machine Learning Repository", url = "http://archive.ics.uci.edu/ml", institution = "University of California, Irvine, School of Information and Computer Sciences" } 
Also, the KDD repository has been merged in with the main repository, so the above is now the citation for both.

Update your BibTeX accordingly! (You too Kunal, but I bet you don’t cite this repo that much).

# ISIT Plenary Lectures on ITSOC website

I added the slides from the ISIT plenaries to the ITSOC website. We’re still not sure when the video will arrive — figuring out media storage is one of the goals of the online committee (as reported to me by Matthieu.

# ISIT Blogging, part 1

Here are my much-belated post-ISIT notes. I didn’t do as good a job of taking notes this year, so my points may be a bit cursory. Also, the offer for guest posts is still open! On a related note the slides from the plenary lectures are now available on Dropbox, and are also linked to from the ISIT website.

From compression to compressed sensing
Shirin Jalali (New York University, USA); Arian Maleki (Rice University, USA)
The title says it, mostly. Both data compression and compressed sensing use special structure in the signal to achieve a reduction in storage, but while all signals can be compressed (in a sense), not all signals can be compressively sensed. Can one get a characterization (with an algorithm) that can take a lossy source code/compression method, and use it to recover a signal via compressed sensing? They propose an algorithm called compressible signal pursuit to do that. The full version of the paper is on ArXiV.

Dynamic Joint Source-Channel Coding with Feedback
Tara Javidi (UCSD, USA); Andrea Goldsmith (Stanford University, USA)
This is a JSSC problem with a Markov source, which can be used to model a large range of problems, including some sequential search and learning problems (hence the importance of feedback). The main idea is to map the problem in to a partially-observable Markov decision problem (POMDP) and exploit the structure of the resulting dynamic program. They get some structural properties of the solution (e.g. what are the sufficient statistics), but there are a lot of interesting further questions to investigate. I usually have a hard time seeing the difference between finite and infinite horizon formulations, but here the difference was somehow easier for me to understand — in the infinite horizon case, however, the solution is somewhat difficult to compute.

Unsupervised Learning and Universal Communication
Vinith Misra (Stanford University, USA); Tsachy Weissman (Stanford University, USA)
This paper was about universal decoding, sort of. THe idea is that the decoder doesn’t know the codebook but it knows the encoder is using a random block code. However, it doesn’t know the rate, even. The question is really what can one say in this setting? For example, symmetry dictates that the actual message label will be impossible to determine, so the error criterion has to be adjusted accordingly. The decoding strategy that they propose is a partition of the output space (or “clustering”) followed by a labeling. They claim this is a model for clustering through an information theoretic lens, but since the number of clusters is exponential in the dimension of the space, I think that it’s perhaps more of a special case of clustering. A key concept in their development is something they call the minimum partition information, which takes the place of the maximum mutual information (MMI) used in universal decoding (c.f. Csiszár and Körner).

Farzin Haddadpour (Sharif University of Technology, Iran); Mahdi Jafari Siavoshani (The Chinese University of Hong Kong, Hong Kong); Mayank Bakshi (The Chinese University of Hong Kong, Hong Kong); Sidharth Jaggi (Chinese University of Hong Kong, Hong Kong)
Of course I had to go to this paper, since it was on AVCs. The main result is that if one considers maximal error but allow the encoder only to randomize, then one can achieve the same rates over the Gaussian AVC as one can with average error and no randomization. That is, allowing encoder randomization can move from average error to max error. An analogous result for discrete channels is in a classic paper by Csiszár and Narayan, and this is the Gaussian analogue. The proof uses a similar quantization/epsilon-net plus union bound that I used in my first ISIT paper (also on Gaussian AVCs, and finally on ArXiV), but it seems that the amount of encoder randomization needed here is more than the amount of common randomness used in my paper.

Coding with Encoding Uncertainty
Jad Hachem (University of California, Los Angeles, USA); I-Hsiang Wang (EPFL, Switzerland); Christina Fragouli (EPFL, Switzerland); Suhas Diggavi (University of California Los Angeles, USA)
This paper was on graph-based codes where the encoder makes errors, but the channel is ideal and the decoder makes no errors. That is, given a generator matrix $G$ for a code, the encoder wiring could be messed up and bits could be flipped or erased when parities are being computed. The resulting error model can’t just be folded into the channel. Furthermore, a small amount of error in the encoder (in just the right place) could be catastrophic. They focus just on edge erasures in this problem and derive a new distance metric between codewords that helps them characterize the maximum number of erasures that an encoder can tolerate. They also look at a random erasure model.

/

David McAllester, my department chair at TTI, has a started a new blog.

I thought it was pretty well known that people are fairly unique by ZIP code, but Forbes has an article about it now (h/t Raj). Of course, stores can also ping a smartphone’s WiFi to get more accurate location information about your activity within the store — when you check out they can tag your the MAC address of your device to all the other information about you. Creeptastic!

Bradley Efron’s perspective on the impact of Bayes’ Theorem from Science (h/t Kevin).

Some discussion on what makes a popular philosophy book. I wonder what, if anything, transfers over to a popular mathematical book?

Some thoughts from Larry Laudan on the mathematization of the presumption of innocence.

/

# De-anonymized reviewers in discussion

One big difference between reviewing for conferences like NIPS/ICML and ISIT is that there is a “discussion” period between the reviewers and the Area Chair. These discussions are not anonymized, so you know who the other reviewers are and you can also read their reviews. This leads to a little privacy problem — A and B may be reviewing the same paper P, but A may be an author on a paper Q which is also being reviewed by B. Because A will have access to the text of B’s reviews on P and Q, they can (often) unmask B’s authorship of the review on Q simply by looking at the formatting of the reviews (are bullet points dashes or asterisks, do they give numbered points, are there “sections” to the review, etc). This seems to violate the spirit of anonymous review, which is perhaps why some have suggested that reviewing be unblinded (at least after acceptance).

The extent to which all of this matter is of course a product of the how fast the machine learning literature has grown and the highly competitive nature of the “top tier conferences.” Because the acceptance rate is so low, the reviewing process can appear to be “arbitrary” (read: subjective) and so questions of both review quality and author/review anonymity impact possible biases. However, if aim of double-blind reviewing is to reduce bias, then shouldn’t the discussions also be anonymized?

# a cute paper: finding a most biased coin with fewest flips

I saw this paper on ArXiV a while back and figured it would be a fun read, and it was. Post-ISIT blogging may have to wait for another day or two.

Finding a most biased coin with fewest flips
Karthekeyan Chandrasekaran, Richard Karp
arXiv:1202.3639 [cs.DS]

The setup of the problem is that you have $n$ coins with biases $\{p_i : i \in [n]\}$. For some given $p \in [\epsilon,1-\epsilon]$ and $\epsilon \in (0,1/2)$, each coin is “heavy” ($p_i = p + \epsilon$) with probability $\alpha$ and “light” ($p_i = p - \epsilon$) with probability $1 - \alpha$. The goal is to use a sequential flipping strategy to find a heavy coin with probability at least $1 - \delta$.

Any such procedure has three components, really. First, you have to keep track of some statistics for each coin $i$. On the basis of that, you need a rule to pick which coin to flip. Finally, you need a stopping criterion.

The algorithm they propose is a simple likelihood-based scheme. If I have flipped a particular coin $i$ a bunch of times and gotten $h_i$ heads and $t_i$ tails, then the likelihood ratio is
$L_i = \left( \frac{p+\epsilon}{p - \epsilon} \right)^{h_i} \left( \frac{ 1 - p - \epsilon }{ 1 -p + \epsilon} \right)^{t_i}$
So what the algorithm does is keep track of these likelihoods for the coins that it has flipped so far. But what coin to pick? It is greedy and chooses a coin $i$ which has the largest likelihood $L_i$ so far (breaking ties arbitrarily).

Note that up to now the prior probability $\alpha$ of a coin being heavy has not been used at all, nor has the failure probability $\delta$. These appear in the stopping criterion. The algorithm keeps flipping coins until there exists at least one $i$ for which
$L_i \ge \frac{1 - \alpha}{\alpha} \cdot \frac{ 1 - \delta }{\delta}$
It then outputs the coin with the largest likelihood. It’s a pretty quick calculation to see that given $(h_i, t_i)$ heads and tails for a coin $i$,
$\mathbb{P}(\mathrm{coin\ }i\mathrm{\ is\ heavy}) = \frac{\alpha L_i}{ \alpha L_i + (1 - \alpha) }$,
from which the threshold condition follows.

This is a simple-sounding procedure, but to analyze it they make a connection to something called a “multitoken Markov game” which models the corresponding mutli-armed bandit problem. What they show is that for the simpler case given by this problem, the corresponding algorithm is, in fact optimal in the sense that it makes the minimum expected number of flips:
$\frac{16}{\epsilon^2} \left( \frac{1 - \alpha}{\alpha} + \log\left( \frac{(1 -\alpha)(1 - \delta)}{\alpha \delta} \right) \right)$

The interesting thing here is that the prior distribution on the heavy/lightness plays a pretty crucial role here in designing the algorithm. part of the explore-exploit tradeoff in bandit problems is the issue of hedging against uncertainty in the distribution of payoffs — if instead you have a good handle on what to expect in terms of how the payoffs of the arms should vary, you get a much more tractable problem.

/

# ISIT highlights request

ISIT 2013 is over, and there were lots of good talks. Just as a few quick impressions — the conference felt a bit more fragmented to me, topic-wise. I almost didn’t see many of the coding theorists because their sessions were physically separated from the Shannon theory/estimation sessions that I normally find myself in.

However, perhaps the theme of the conference for me this year was “measure concentration.” Between Marton’s Shannon Lecture, Lugosi’s tutorial, and Max’s talk on data processing, I felt like very measured, and yet full of concentration.

I’ll blog about a few of them here, but if any readers want to write a guest post about some interesting talks they saw, please feel free to contact me. I’ll likely post some recaps later in the week after I get back from Asia/Europe.