27 | April | 2013 | An Ergodic Walk

Sudeep Kamath sent me a note about a recent result he posted on the arXiv. It relates to an earlier post of mine on the HGR maximal correlation and an inequality by Erkip and Cover for Markov chains $U -- X -- Y$, which I had found interesting:
$I(U ; Y) \le \rho_m(X,Y)^2 I(U ; X)$ .
Since learning about this inequality, I’ve seen a few talks which have used the inequality in their proofs, at Allerton in 2011 and at ITA this year. Unfortunately, the stated inequality is not correct!

On Maximal Correlation, Hypercontractivity, and the Data Processing Inequality studied by Erkip and Cover
Venkat Anantharam, Amin Gohari, Sudeep Kamath, Chandra Nair

What this paper shows is that the inequality holds not with $\rho_m(X,Y)^2$ but with another quantity:
$I(U ; Y) \le s^*(X;Y) I(U ; X)$
where $s^*(X;Y)$ is given by the following definition.

Let $X$ and $Y$ be random variables with joint distribution $(X, Y) \sim p(x, y)$ . We define
$s^*(X;Y) = \sup_{r(x) \ne p(x)} \frac{ D( r(y) \| p(y) ) }{ D( r(x) \| p(x) ) }$ ,
where $r(y)$ denotes the $y$ -marginal distribution of $r(x, y) := r(x)p(y|x)$ and the supremum on the right hand side is over all probability distributions $r(x)$ that are different from the probability distribution $p(x)$ . If either $X$ or $Y$ is a constant, we define $s^*(X; Y)$ to be 0.

Suppose $(X,Y)$ have joint distribution $P_{XY}$ (I know I am changing notation here but it’s easier to explain). The key to their result is deriving variational characterizations of $\rho_m$ and $s^*$ in terms of the function
$t_{\lambda}( P_X ) := H( P_Y ) - \lambda H( P_X )$
Rather than getting into that in the blog post, I recommend reading the paper.

The implication of this result is that the inequality of Erkip and Cover is not correct: not only is $\rho_m(X,Y)^2$ not the supremum of the ratio, there are distributions for which it is not even an upper bound. The counterexample in the paper is the following: $X \sim \mathsf{Bernoulli}(1/2)$, and $Y$ is generated via this asymmetric erasure channel:

Joint distribution counterexample (Fig. 2 of the paper)

How can we calculate $\rho_m(X,Y)$ ? If either $X$ or $Y$ is binary-valued, then
$\rho_m(X,Y)^2 = -1 + \sum_{x,y} \frac{ p(x,y)^2 }{ p(x) p(y) }$
So for this example $\rho_m(X,Y)^2 = 0.6$. However, $s^*(X;Y) = \frac{1}{2} \log_2(12/5) \approx 0.63 > 0.6$, and there exists a sequence of variables $U_i$ satisfying the Markov chain $U_i -- X -- Y$ such that the ratio $I(U_i ; Y) / I(U_i ; X)$ approaches $s^*(X;Y)$.
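
As a sanity check on these numbers, here is a minimal Python sketch. The figure with the joint distribution is not reproduced here, so the erasure probabilities used below ($1/2$ when $X = 0$ and $1/3$ when $X = 1$) are an assumption on my part, chosen because they reproduce both quoted values; the function names are purely illustrative. The code evaluates the binary-case formula for $\rho_m(X,Y)^2$ and the divergence ratio from the definition of $s^*$ at a single choice of $r(x)$, the point mass at $x = 0$, which gives a lower bound on $s^*$.

```python
import math

# ASSUMED joint distribution p(x, y): X ~ Bernoulli(1/2), Y in {"0", "e", "1"}.
# Erasure probabilities 1/2 (X = 0) and 1/3 (X = 1) are an assumption chosen
# to match the values quoted in the post, not copied from the paper's figure.
p_xy = {
    (0, "0"): 1 / 4, (0, "e"): 1 / 4,
    (1, "e"): 1 / 6, (1, "1"): 1 / 3,
}

def marginals(p_xy):
    """Return the marginals p(x) and p(y) as dictionaries."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y

def rho_m_squared(p_xy):
    """-1 + sum p(x,y)^2 / (p(x) p(y)); valid when X or Y is binary."""
    p_x, p_y = marginals(p_xy)
    return -1.0 + sum(p * p / (p_x[x] * p_y[y]) for (x, y), p in p_xy.items())

def kl_bits(r, p):
    """Relative entropy D(r || p) in bits, with 0 log 0 taken as 0."""
    return sum(rv * math.log2(rv / p[k]) for k, rv in r.items() if rv > 0)

def divergence_ratio(p_xy, r_x):
    """D(r(y) || p(y)) / D(r(x) || p(x)) with r(x, y) = r(x) p(y | x)."""
    p_x, p_y = marginals(p_xy)
    r_y = {}
    for (x, y), p in p_xy.items():
        r_y[y] = r_y.get(y, 0.0) + r_x.get(x, 0.0) * p / p_x[x]
    return kl_bits(r_y, p_y) / kl_bits(r_x, p_x)

rho2 = rho_m_squared(p_xy)
ratio = divergence_ratio(p_xy, {0: 1.0, 1: 0.0})  # r(x) = point mass at x = 0

print(f"rho_m^2          = {rho2:.4f}")   # 0.6000
print(f"ratio at delta_0 = {ratio:.4f}")  # 0.6315
```

Under the assumed distribution, the divergence ratio at the point mass equals $\frac{1}{2} \log_2(12/5)$, which already exceeds $\rho_m(X,Y)^2 = 0.6$.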

So where is the error in the original proof? Anantharam et al. point to an explanation that the Taylor series expansion used in the proof of the inequality with $\rho_m(X,Y)^2$ may not be valid at all points.

This seems to just be the start of a longer story, which I look forward to reading in the future!



HGR maximal correlation revisited : a corrected reverse inequality