From one of the presentations by Zhao and Chia at Allerton this year, I was made aware of a paper by Elza Erkip and Tom Cover on “The efficiency of investment information” that uses one of my favorite quantities, the Hirschfeld–Gebelein–Rényi maximal correlation; I first discovered it in this gem of a paper by Witsenhausen.
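The Hirschfeld–Gebelein–Rényi maximal correlation $\rho_m(X,Y)$ between two random variables $X$ and $Y$ is
$\rho_m(X,Y) = \sup_{f \in \mathcal{F}_X,\ g \in \mathcal{G}_Y} \mathbb{E}[ f(X) g(Y) ]$
where $\mathcal{F}_X$ is all real-valued functions such that $\mathbb{E}[ f(X) ] = 0$ and $\mathbb{E}[ f(X)^2 ] = 1$, and $\mathcal{G}_Y$ is all real-valued functions such that $\mathbb{E}[ g(Y) ] = 0$ and $\mathbb{E}[ g(Y)^2 ] = 1$. It’s a cool measure of dependence that covers discrete and continuous variables, since they all get passed through these “normalizing” $f$ and $g$ functions.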
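For finite alphabets the supremum can be computed directly: the maximal correlation is the second-largest singular value of the matrix with entries $P(x,y)/\sqrt{P(x)P(y)}$ (the largest singular value is always 1 and corresponds to constant functions). Here is a minimal sketch of that computation, assuming the joint distribution is handed over as a 2-D array; the function name `maximal_correlation` is just something I made up for illustration.

```python
import numpy as np

def maximal_correlation(joint):
    """HGR maximal correlation of a finite joint pmf (rows: x, columns: y)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1)
    py = joint.sum(axis=0)
    # Drop zero-probability symbols so the normalization is well defined.
    joint = joint[px > 0][:, py > 0]
    px, py = px[px > 0], py[py > 0]
    # Normalized matrix with entries P(x, y) / sqrt(P(x) P(y)).
    q = joint / np.sqrt(np.outer(px, py))
    # Singular values come back sorted in decreasing order.
    s = np.linalg.svd(q, compute_uv=False)
    # s[0] is 1 (constant functions); the next one is the maximal correlation.
    return float(s[1]) if len(s) > 1 else 0.0

# Uniform bit X observed through a binary symmetric channel with crossover p.
p = 0.1
dsbs = 0.5 * np.array([[1 - p, p], [p, 1 - p]])
rho = maximal_correlation(dsbs)

# Independent variables: the joint pmf is a product of the marginals.
rho_indep = maximal_correlation(np.outer([0.3, 0.7], [0.2, 0.5, 0.3]))
```

For the binary symmetric example the answer should be $|1 - 2p|$, and for independent variables it should be zero up to floating-point error.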
The fact in the Erkip-Cover paper is this one:
$\sup_{P(z|y)} \frac{I(Z ; X)}{I(Z ; Y)} = \rho_m(X,Y)^2$.
That is, the square of the HGR maximal correlation is the best (or worst, depending on your perspective) ratio of the two sides in the Data Processing Inequality:
$I(Z ; Y) \ge I(Z ; X)$.
It’s a bit surprising to me that this fact is not as well known. Perhaps it’s because the “data processing” is happening at the front end here (by choosing $P(z|y)$) and not the actual data processing $Y \to X$ which is given to you.
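To get a feel for the ratio, here is a small numerical sketch on one binary example. Everything in it is my own assumption for illustration: $Y$ is a uniform bit, $X$ is $Y$ through a binary symmetric channel with crossover `p`, and the "front end" $P(z|y)$ is another binary symmetric channel with crossover `q`. For this setup the maximal correlation is $1 - 2p$. The code only evaluates the ratio for a single choice of $P(z|y)$; it does not search for the supremum.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information in bits of a finite joint pmf given as a 2-D array."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0  # skip zero entries, which contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def bsc(eps):
    """Transition matrix of a binary symmetric channel (rows: input)."""
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

p, q = 0.1, 0.4                   # assumed crossover probabilities
p_y = np.array([0.5, 0.5])        # uniform Y

p_yx = p_y[:, None] * bsc(p)      # joint of (Y, X), the given channel
p_yz = p_y[:, None] * bsc(q)      # joint of (Y, Z), the chosen front end
# Markov chain Z - Y - X: P(z, x) = sum_y P(z|y) P(y) P(x|y).
p_zx = bsc(q).T @ p_yx

ratio = mutual_information(p_zx) / mutual_information(p_yz)
rho_sq = (1 - 2 * p) ** 2         # squared maximal correlation for this example

print(f"I(Z;X)/I(Z;Y) = {ratio:.4f}, rho_m^2 = {rho_sq:.4f}")
```

In this binary symmetric example the ratio stays below $\rho_m(X,Y)^2$ and creeps up toward it as `q` approaches 1/2, that is, as $Z$ becomes a noisier and noisier look at $Y$.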
This is a really cute result!
As for the direction of the data processing: if Z -> Y -> X is a Markov chain, then so is X -> Y -> Z, so you can think about X as the object of inference, Y as the observation, and P(z|y) as a randomized processor.
Having thought about this more carefully, I see your point: you would like to bound the ratio of I(X; Z) to I(X; Y) instead …
Yeah, it’s just not quite what you want. But still interesting!
You might have some interest in a result by Evans and Schulman: http://dx.doi.org/10.1109/18.796377. This was developed in the setting of noisy computation, but may be useful more broadly.