HGR maximal correlation and the ratio of mutual informations

From one of the presentations by Zhao and Chia at Allerton this year, I learned of a paper by Elza Erkip and Tom Cover, “The efficiency of investment information,” which uses one of my favorite quantities, the Hirschfeld–Gebelein–Rényi maximal correlation; I first discovered it in this gem of a paper by Witsenhausen.

The Hirschfeld–Gebelein–Rényi maximal correlation $\rho_m(X,Y)$ between two random variables $X$ and $Y$ is

$\sup_{f \in \mathcal{F}_X, g \in \mathcal{G}_Y} \mathbb{E}[ f(X) g(Y) ]$

where $\mathcal{F}_X$ is the set of all real-valued functions $f$ such that $\mathbb{E}[ f(X) ] = 0$ and $\mathbb{E}[ f(X)^2 ] = 1$, and $\mathcal{G}_Y$ is the set of all real-valued functions $g$ such that $\mathbb{E}[ g(Y) ] = 0$ and $\mathbb{E}[ g(Y)^2 ] = 1$. It’s a cool measure of dependence that covers both discrete and continuous variables, since they all get passed through these “normalizing” functions $f$ and $g$.
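For finite alphabets the supremum can be computed directly. A standard fact (not stated in the post itself) is that $\rho_m(X,Y)$ equals the second-largest singular value of the matrix with entries $P(x,y)/\sqrt{P(x)P(y)}$; the largest singular value is always 1 and corresponds to constant functions, which the zero-mean constraint rules out. Below is a minimal sketch of that computation. The function name `hgr_maximal_correlation` and the example pmf `dsbs` are illustrative choices, not anything from the papers.

```python
import numpy as np

def hgr_maximal_correlation(joint):
    """Maximal correlation of a finite joint pmf, given as joint[x, y]."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    # Drop symbols with zero probability so the normalization is well defined.
    joint = joint[px > 0][:, py > 0]
    px, py = px[px > 0], py[py > 0]
    # Normalized matrix B[x, y] = P(x, y) / sqrt(P(x) P(y)).
    B = joint / np.sqrt(np.outer(px, py))
    s = np.linalg.svd(B, compute_uv=False)  # sorted in descending order
    # s[0] is 1 (constant functions); s[1] is the maximal correlation.
    return float(s[1]) if len(s) > 1 else 0.0

# Illustrative example: X uniform on {0, 1}, Y = X flipped with probability p.
p = 0.1
dsbs = 0.5 * np.array([[1 - p, p], [p, 1 - p]])
rho = hgr_maximal_correlation(dsbs)

# Independent variables should give (numerically) zero.
rho_indep = hgr_maximal_correlation(np.outer([0.3, 0.7], [0.2, 0.5, 0.3]))
```

For this binary symmetric example the normalized matrix is just the channel matrix, so `rho` should come out as $|1 - 2p|$.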

The fact in the Erkip-Cover paper is this one:

$\sup_{ P(z|y) : Z \to Y \to X } \frac{I(Z ; X)}{I(Z ; Y)} = \rho_m(X,Y)^2$.

That is, the square of the HGR maximal correlation is the best (or worst, depending on your perspective) ratio of the two sides in the Data Processing Inequality:

$I(Z ; Y) \ge I(Z ; X)$.
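As a sanity check on one concrete case, here is a small numerical sketch for a binary symmetric example: $Y$ is uniform on $\{0,1\}$, $X$ is $Y$ flipped with probability $p$ (so $\rho_m(X,Y) = 1 - 2p$), and $P(z|y)$ is itself a binary symmetric channel with crossover probability $a$. Restricting $P(z|y)$ to this family is an assumption made for illustration, and so are the names `bsc`, `dpi_ratio`, and `mutual_information`. The code only probes this one example and is not a proof of the general statement.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits of a finite joint pmf joint[a, b]."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def bsc(eps):
    """Transition matrix of a binary symmetric channel with crossover eps."""
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

p = 0.1                      # crossover of the given channel Y -> X
rho_sq = (1 - 2 * p) ** 2    # squared maximal correlation for this example

def dpi_ratio(a):
    """I(Z;X) / I(Z;Y) when P(z|y) is a BSC with crossover a."""
    py = np.array([0.5, 0.5])
    joint_zy = bsc(a) * py[None, :]   # joint[z, y] = P(z|y) P(y)
    joint_zx = joint_zy @ bsc(p)      # joint[z, x] = sum_y P(z, y) P(x|y)
    return mutual_information(joint_zx) / mutual_information(joint_zy)

ratios = {a: dpi_ratio(a) for a in (0.1, 0.3, 0.45, 0.49)}
for a, r in ratios.items():
    print(f"a = {a:.2f}: ratio = {r:.5f}  (rho^2 = {rho_sq:.5f})")
```

In this example the ratio stays below $\rho_m^2$ and creeps up toward it as $a \to 1/2$, that is, as $Z$ becomes a very noisy function of $Y$.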

It’s a bit surprising to me that this fact is not better known. Perhaps it’s because the “data processing” here happens at the front end (by choosing $P(z|y)$) rather than in the actual data processing step $Y \to X$, which is given to you.

6 thoughts on “HGR maximal correlation and the ratio of mutual informations”

1. This is a really cute result!

As for the direction of the data processing: if Z -> Y -> X is a Markov chain, then so is X -> Y -> Z, so you can think about X as the object of inference, Y as the observation, and P(z|y) as a randomized processor.

• Having thought about this more carefully, I see your point: you would like to bound the ratio of I(X; Z) to I(X; Y) instead …

• Yeah, it’s just not quite what you want. But still interesting!

2. lvarsh says:

You might have some interest in a result by Evans and Schulman: http://dx.doi.org/10.1109/18.796377. This was developed in the setting of noisy computation, but may be useful more broadly.
