paper a day : marton’s broadcast region

This is pretty heavy on the information theory I guess…

A Coding Theorem for the Discrete Memoryless Broadcast Channel
Katalin Marton
IEEE Trans. Info. Theory, 25(3), 1979

This is a short paper that had a reputation (in my mind at least) for being hard to figure out. I think after my 3rd pass through it I kind of get it. The ingredients are: strongly typical sequences and auxiliary random variables that mimic the broadcast.

Suppose you want to send two different messages to two different people. Unfortunately, you can only broadcast, so that if you send Y then one person gets X via a communication channel F(X|Y) and the other gets Z via G(Z|Y). Given F and G, how should you design a Y so that the messages are decodable by their respective recipients, and so that the number of bits in the message is maximized? If we consider data rates, what are the maximum reliable rates of communication to the two users?

It turns out that the converse to this problem is not tight in many cases. That is, the set of data rates which are achievable is smaller than the set which may be achievable. This paper expands the region of achievable rates by a little bit more than had been known before 1979.

If Rx and Rz are the rates to the two users, then Marton shows that for random variables (U, V, W) which form Markov chains (U, V, W) – Y – X and (U, V, W) – Y – Z :

Rx ≤ I(W,U ; X)
Rx ≤ I(W,V ; Z)
Rx + Rz ≤ min{ I(W ; X), I(W ; Z) } + I(U ; X | W) + I(V ; X | W) – I(U ; V | W)

The main benefit of this theorem is that before U, V, and W had to be independent, and now they can be dependent.

The interesting part of this paper is of course the code construction (plus some applications at the end to the Blackwell channel that I won’t bother with here). The sketch is as follows — generate codewords wi iid with respect to W, then U codewords iid with P(u | wi), and V codewords iid with P(v | wi). Then you look at triples of (u, v, w) that are strongly typical, and for each one choose a random Y that is strongly typical conditioned on (u, v, w). Then the decoding is via strong typicality.

The details of the proof are interesting, but what I think it’s a great example of is how the pre-encoding can mimic broadcasting (W sends to U and V) and that this mimickry can be brought over to the channels at hand and is provably achievable. Marton writes that she thinks the novelty lies in the case when W is deterministic. In any case, it was an interesting read, and has given me a few more ideas to think about with respect to my own research.