HGR maximal correlation and the ratio of mutual informations

From one of the presentations by Zhao and Chia at Allerton this year, I learned of a paper by Elza Erkip and Tom Cover, “The efficiency of investment information,” that uses one of my favorite quantities, the Hirschfeld–Gebelein–Rényi maximal correlation; I first discovered it in this gem of a paper by Witsenhausen.

The Hirschfeld–Gebelein–Rényi maximal correlation $\rho_m(X,Y)$ between two random variables $X$ and $Y$ is

$\sup_{f \in \mathcal{F}_X, g \in \mathcal{G}_Y} \mathbb{E}[ f(X) g(Y) ]$

where $\mathcal{F}_X$ is the set of all real-valued functions $f$ such that $\mathbb{E}[ f(X) ] = 0$ and $\mathbb{E}[ f(X)^2 ] = 1$, and $\mathcal{G}_Y$ is the set of all real-valued functions $g$ such that $\mathbb{E}[ g(Y) ] = 0$ and $\mathbb{E}[ g(Y)^2 ] = 1$. It’s a cool measure of dependence that covers discrete and continuous variables alike, since they all get passed through these “normalizing” $f$ and $g$ functions.
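For finite alphabets the supremum can be computed directly. A standard fact is that $\rho_m(X,Y)$ equals the second-largest singular value of the matrix with entries $P(x,y)/\sqrt{P(x)P(y)}$ (the largest is always 1). Here is a minimal sketch of that computation, assuming both marginals have full support; the function name `maximal_correlation` and the two example distributions are mine, just for illustration.

```python
import numpy as np

def maximal_correlation(p_xy):
    """HGR maximal correlation of a finite joint pmf p_xy[x, y].

    Assumes every row and column of p_xy has positive total mass.
    """
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1)  # marginal of X
    p_y = p_xy.sum(axis=0)  # marginal of Y
    # Normalized joint matrix: entries P(x,y) / sqrt(P(x) P(y)).
    q = p_xy / np.sqrt(np.outer(p_x, p_y))
    # Singular values come back sorted in decreasing order; the first is 1.
    singular_values = np.linalg.svd(q, compute_uv=False)
    return float(singular_values[1])

# Illustrative example: X uniform on {0,1}, Y = X flipped with probability 0.1.
dsbs = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
rho_dsbs = maximal_correlation(dsbs)

# Illustrative example: independent X and Y should give zero.
independent = np.outer([0.3, 0.7], [0.6, 0.4])
rho_independent = maximal_correlation(independent)
```

For the binary example with flip probability $p$, the value should come out as $|1 - 2p|$, and for independent variables it should be zero up to floating-point error.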

The fact in the Erkip-Cover paper is this one:

$\sup_{ P(z|y) : Z \to Y \to X } \frac{I(Z ; X)}{I(Z ; Y)} = \rho_m(X,Y)^2$ .

That is, the square of the HGR maximal correlation is the best (or worst, depending on your perspective) ratio of the two sides in the Data Processing Inequality:

$I(Z ; Y) \ge I(Z ; X)$ .
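To get a feel for the ratio, here is a rough numerical sanity check on one small example: $X$ uniform on $\{0,1\}$ and $Y$ equal to $X$ flipped with probability 0.1, so that $\rho_m(X,Y)^2 = 0.64$. The sketch draws random binary channels $P(z|y)$ and tracks the largest ratio $I(Z;X)/I(Z;Y)$ it sees. Everything here (the helper names, the flip probability, the random search) is my own illustrative choice; a random search only probes the supremum from below, and this one symmetric example says nothing about general distributions.

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information (in nats) of a finite joint pmf p_ab[a, b]."""
    p_ab = np.asarray(p_ab, dtype=float)
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    product = p_a @ p_b          # product of the marginals
    mask = p_ab > 0              # skip zero-probability cells
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / product[mask])))

def information_ratio(p_xy, p_z_given_y):
    """I(Z;X) / I(Z;Y) for the Markov chain Z -> Y -> X.

    p_xy[x, y] is the joint pmf; p_z_given_y[y, z] is the chosen channel.
    """
    p_y = p_xy.sum(axis=0)
    p_yz = p_y[:, None] * p_z_given_y   # joint pmf of (Y, Z)
    p_xz = p_xy @ p_z_given_y           # joint pmf of (X, Z), summing out Y
    return mutual_information(p_xz) / mutual_information(p_yz)

flip = 0.1  # assumed flip probability for this illustration
p_xy = 0.5 * np.array([[1 - flip, flip],
                       [flip, 1 - flip]])

# Squared maximal correlation via the second singular value of
# P(x,y) / sqrt(P(x) P(y)).
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
rho_sq = np.linalg.svd(p_xy / np.sqrt(np.outer(p_x, p_y)), compute_uv=False)[1] ** 2

rng = np.random.default_rng(0)
best_ratio = 0.0
for _ in range(2000):
    p_z_given_y = rng.dirichlet(np.ones(2), size=2)  # each row is a pmf over z
    p_yz = p_xy.sum(axis=0)[:, None] * p_z_given_y
    if mutual_information(p_yz) < 1e-6:
        continue  # skip nearly useless channels to avoid dividing by ~0
    best_ratio = max(best_ratio, information_ratio(p_xy, p_z_given_y))

print(f"rho_m^2 = {rho_sq:.4f}, largest ratio found = {best_ratio:.4f}")
```

In this example the largest ratios tend to come from very noisy channels $P(z|y)$, where both mutual informations are small.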

It’s a bit surprising to me that this fact is not better known. Perhaps it’s because the “data processing” is happening at the front end here (by choosing $P(z|y)$ ) rather than in the actual data processing $Y \to X$, which is given to you.

6 thoughts on “HGR maximal correlation and the ratio of mutual informations”

mraginsky says:

on November 2, 2011 at 11:27 am

This is a really cute result!

As for the direction of the data processing: if Z -> Y -> X is a Markov chain, then so is X -> Y -> Z, so you can think about X as the object of inference, Y as the observation, and P(z|y) as a randomized processor.

- mraginsky says:
  
  on November 2, 2011 at 12:11 pm
  
  Having thought about this more carefully, I see your point: you would like to bound the ratio of I(X; Z) to I(X; Y) instead …
  
  - Anand Sarwate says:
    
    on November 2, 2011 at 12:51 pm
    
    Yeah, it’s just not quite what you want. But still interesting!
lvarsh says:

on November 14, 2011 at 8:50 am

You might have some interest in a result by Evans and Schulman: http://dx.doi.org/10.1109/18.796377. This was developed in the setting of noisy computation, but may be useful more broadly.

Pingback: ITA Workshop 2013 : post the first « An Ergodic Walk
Pingback: HGR maximal correlation revisited : a corrected reverse inequality | An Ergodic Walk


An Ergodic Walk

a process whose average over time converges to the true average
