HGR maximal correlation revisited: a corrected reverse inequality

Sudeep Kamath sent me a note about a recent result he posted on the arXiv. It relates to an earlier post of mine on the HGR maximal correlation and an inequality by Erkip and Cover for Markov chains U -- X -- Y, which I had found interesting:
I(U ; Y) \le \rho_m(X,Y)^2 I(U ; X).
Since learning about this inequality, I’ve seen a few talks which have used the inequality in their proofs, at Allerton in 2011 and at ITA this year. Unfortunately, the stated inequality is not correct!
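
To make the quantities concrete, here is a minimal sketch that evaluates both mutual informations on a toy Markov chain. The chain (uniform U, then two binary symmetric channels with crossover 0.2 and 0.1) and all function names are my own assumptions for illustration, not taken from the paper. In this doubly symmetric binary case the bound with \rho_m(X,Y)^2 happens to hold, so the sketch only shows how the two sides are computed; it is not the counterexample.

```python
import math

def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a list of rows joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, p in enumerate(row)
        if p > 0
    )

def bsc(eps):
    """Transition matrix of a binary symmetric channel with crossover eps."""
    return [[1 - eps, eps], [eps, 1 - eps]]

def joint_through(p_in, channel):
    """Joint pmf of (input, output) for input pmf p_in and a channel matrix."""
    return [[p_in[a] * channel[a][b] for b in range(len(channel[a]))]
            for a in range(len(p_in))]

def compose(ch1, ch2):
    """Channel obtained by feeding the output of ch1 into ch2."""
    return [[sum(ch1[a][k] * ch2[k][b] for k in range(len(ch2)))
             for b in range(len(ch2[0]))] for a in range(len(ch1))]

# Hypothetical chain: U uniform, U -> X is BSC(0.2), X -> Y is BSC(0.1).
p_u = [0.5, 0.5]
u_to_x, x_to_y = bsc(0.2), bsc(0.1)

i_ux = mutual_information(joint_through(p_u, u_to_x))
i_uy = mutual_information(joint_through(p_u, compose(u_to_x, x_to_y)))

# X is uniform here, and a uniform-input BSC(eps) has rho_m = |1 - 2 * eps|.
rho_sq = (1 - 2 * 0.1) ** 2
ratio = i_uy / i_ux

print("I(U;Y) / I(U;X) =", ratio)
print("rho_m(X,Y)^2    =", rho_sq)
```

The printed ratio stays below \rho_m(X,Y)^2 for this particular chain; the point of the paper is that this need not happen for other joint distributions of (X, Y).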

On Maximal Correlation, Hypercontractivity, and the Data Processing Inequality studied by Erkip and Cover
Venkat Anantharam, Amin Gohari, Sudeep Kamath, Chandra Nair

What this paper shows is that the inequality does not hold in general with \rho_m(X,Y)^2 as the constant, but it does hold with another quantity:
I(U ; Y) \le s^*(X;Y) I(U ; X)
where s^*(X;Y) is given by the following definition.

Let X and Y be random variables with joint distribution (X, Y) \sim p(x, y). We define
s^*(X;Y) = \sup_{r(x) \ne p(x)} \frac{ D( r(y) \| p(y) ) }{ D( r(x) \| p(x) ) },
where r(y) denotes the y-marginal distribution of r(x, y) := r(x)p(y|x) and the supremum on the right hand side is over all probability distributions r(x) that are different from the probability distribution p(x). If either X or Y is a constant, we define s^*(X; Y) to be 0.
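
The definition can be explored numerically. The following is a rough sketch, assuming a binary X so that r(x) can be swept over a one-dimensional grid; the function names and the example channel are hypothetical. A grid search only ever gives a lower bound on the supremum, since the supremum may be approached but not attained (for instance as r(x) tends to p(x)).

```python
import math

def kl_bits(r, p):
    """D(r || p) in bits; assumes p[i] > 0 wherever r[i] > 0."""
    return sum(ri * math.log2(ri / pi) for ri, pi in zip(r, p) if ri > 0)

def output_marginal(p_in, channel):
    """y-marginal of the joint pmf p_in(x) * channel[x][y]."""
    return [sum(p_in[x] * channel[x][y] for x in range(len(p_in)))
            for y in range(len(channel[0]))]

def s_star_lower_bound(p_x, channel, grid=999):
    """Largest divergence ratio over a grid of binary r(x).

    This is only a lower bound on s^*(X;Y): the grid is finite and the
    supremum need not be attained.
    """
    p_y = output_marginal(p_x, channel)
    best = 0.0
    for k in range(1, grid + 1):
        r_x = [k / (grid + 1), 1 - k / (grid + 1)]
        d_x = kl_bits(r_x, p_x)
        if d_x < 1e-12:  # skip r(x) = p(x), where the ratio is 0/0
            continue
        d_y = kl_bits(output_marginal(r_x, channel), p_y)
        best = max(best, d_y / d_x)
    return best

# Hypothetical example: uniform X through a binary symmetric channel
# with crossover probability 0.1 (not the channel from the paper).
p_x = [0.5, 0.5]
channel = [[0.9, 0.1], [0.1, 0.9]]
bound = s_star_lower_bound(p_x, channel)
print("grid lower bound on s^*:", bound)
```

For this symmetric example the best ratio on the grid comes from the r(x) closest to p(x) and sits just below 0.64, which is also \rho_m(X,Y)^2 for this channel. The asymmetric case is where the two quantities separate.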

Suppose (X,Y) have joint distribution P_{XY} (I know I am changing notation here but it’s easier to explain). The key to their result is a pair of variational characterizations of \rho_m and s^* in terms of the function
t_{\lambda}( P_X ) := H( P_Y ) - \lambda H( P_X ).
Rather than getting into that in the blog post, I recommend reading the paper.

The implication of this result is that the inequality of Erkip and Cover is not correct: not only is \rho_m(X,Y)^2 not the supremum of the ratio I(U ; Y) / I(U ; X), there are distributions for which it is not even an upper bound. The counterexample in the paper is the following: X \sim \mathsf{Bernoulli}(1/2), and Y is generated via this asymmetric erasure channel:

Joint distribution counterexample (Fig. 2 of the paper)

How can we calculate \rho_m(X,Y)? If either X or Y is binary-valued, then
\rho_m(X,Y)^2 = -1 + \sum_{x,y} \frac{ p(x,y)^2 }{ p(x) p(y) }
So for this example \rho_m(X,Y)^2 = 0.6. However, s^*(X;Y) = \frac{1}{2} \log_2(12/5) \approx 0.63 > 0.6, and there exists a sequence of variables U_i satisfying the Markov chain U_i -- X -- Y such that the ratio I(U_i ; Y) / I(U_i ; X) approaches s^*(X;Y).
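
The formula for the binary case is easy to evaluate in code. Below is a small sketch with a hypothetical helper name; the joint distribution it is applied to is an assumption for illustration (uniform X through a binary symmetric channel), not the channel of Fig. 2, whose transition probabilities are not reproduced here. The last lines just check the arithmetic for the value of s^* quoted above.

```python
import math

def rho_m_squared_binary(joint):
    """-1 + sum_{x,y} p(x,y)^2 / (p(x) p(y)).

    Equals rho_m(X,Y)^2 only when X or Y is binary-valued.
    joint[x][y] is the joint pmf p(x, y).
    """
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    total = sum(
        p * p / (p_x[x] * p_y[y])
        for x, row in enumerate(joint)
        for y, p in enumerate(row)
        if p > 0
    )
    return total - 1.0

# Hypothetical joint pmf: uniform X, binary symmetric channel with
# crossover 0.1, for which rho_m = 1 - 2 * 0.1 = 0.8.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
rho_sq = rho_m_squared_binary(joint)
print("rho_m^2 =", rho_sq)

# Arithmetic check of the quoted value (1/2) * log2(12/5) against 0.6.
s_star_quoted = 0.5 * math.log2(12 / 5)
print("(1/2) log2(12/5) =", s_star_quoted, "> 0.6:", s_star_quoted > 0.6)
```

The gap between 0.6 and (1/2) \log_2(12/5) is small, about 0.03, but it is enough to show that \rho_m(X,Y)^2 cannot be an upper bound on the ratio in general.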

So where is the error in the original proof? Anantharam et al. suggest that the Taylor series expansion used in the proof of the inequality with \rho_m(X,Y)^2 may not be valid at all points.

This seems to just be the start of a longer story, which I look forward to reading in the future!


2 thoughts on “HGR maximal correlation revisited : a corrected reverse inequality”

  1. This is very interesting. I guess it remains to be seen if the results where this inequality was used can be salvaged.
    Some typos: 1. The corrected statement of the result should be an inequality.
    2. The word “latex” appears in the quote.

    • Thanks! I was in a bit of a rush when I wrote this, so I didn’t get a chance to catch the typos. The way WordPress works is that you have to type “latex” before any LaTeX commands, which is what caused these issues.
