paper a day : the Gale-Ryser theorem

By request, today’s paper is kind of about the mathematics of paint-by-numbers puzzles. Merry Christmas, Erin.

A Simple Proof of the Gale-Ryser Theorem
Manfred Krause
The American Mathematical Monthly, Vol. 103, No. 4. (Apr., 1996), pp. 335-337.

The Gale-Ryser Theorem is one of those beautiful little gems in combinatorics. The problem is rather simple to state. Suppose you have a k by n matrix A of zeros and ones. Let r be the vector of sums along each row and c be the vector of sums along each column. The question is this : given an r and a c, when does there exist a matrix A with those row and column sums?

First off, note that r and c satisfy

\| r \|_1 = \|c\|_1

since the total number of ones has to be constant. We need a few more concepts to understand the result. A partition of N is a set of positive integers r1 ≥ r2 ≥ … ≥ rk whose sum is N. It turns out that it’s sufficient to only consider r and c that are partitions (hint : start swapping rows and columns).

Given two partitions r and c of N, we say r is dominated by c if

\sum_{i=1}^{m} r_i \le \sum_{i=1}^{m} c_i \qquad \forall m

where the summands become 0 if we exhaust the partition. As it turns out, dominance provides a partial order on the set of partitions that is particularly useful in combinatorics. The whole theory of majorization is based around the dominance order.

The last concept we need is a Ferrers board and the conjugate partition. Given a partition r of N, imagine a matrix of 0’s and 1’s that has rj 1’s in row j. This is the Ferrers matrix of the partition. The conjugate partition of r, denoted by r*, is the partition whose Ferrers matrix is the transpose of the Ferrers matrix for r.

The Gale-Ryser theorem says that a matrix A exists with row and column sums r and c if and only if r is dominated by c*.

To prove that we need the dominance relation, suppose that r is not dominated by c*. Then look for gaps in A — that is, rows in which you get a 0 and then a 1. Induction on the number of gaps proves it. If there are no gaps we have a Ferrers matrix, so we’re done. If there is a gap, you can swap two entries in a column to reduce the number of gaps.

The novelty in the paper is a new proof of the “only if” half of the statement. It’s basically an algorithm that provably constructs a matrix satisfying the row and column sums given. Clearly this matrix is not unique in general. You start with the Ferrers matrix A0 for c and do a series of swaps until the row sums equal r. Let Aj be the matrix after j swaps. The algorithm guarantees the following:

\| r(A_j) - q \|_2 < \| r(A_{j-1}) - q \|_2

So it will eventually converge to a valid matrix.

The application to paint-by-numbers should be pretty clear. A PBN is just a 0-1 matrix of pixels. If the image in the PBN is convex, then the puzzle is kind of trivial, but you can use the theorem to decide if it's solvable. An interesting extension to work out would be a theorem or fast algorithm for deciding if a general PBN is solvable or not. If I had the time to think about this, I would. Maybe it's something to do on a rainy afternoon after my deadlines have passed.

am I a peer?

I’ve actually been invited to review a paper, which is a first for me. I guess all that stuff on peer review I’ve been reading during my procrastinating moments will actually come in use now. They’ve asked me to turn in my review in two months, which is good for me, since I have too many looming deadlines. I guess this is why publishing takes so long — reviewers take 2 months, editorial checking and decision, revisions, another round of reviewing, more revisions, and submission. It’s no wonder that in computer science they put all their energy into conferences — it keps the pace of research higher.

paper a day : marton’s broadcast region

This is pretty heavy on the information theory I guess…

A Coding Theorem for the Discrete Memoryless Broadcast Channel
Katalin Marton
IEEE Trans. Info. Theory, 25(3), 1979

This is a short paper that had a reputation (in my mind at least) for being hard to figure out. I think after my 3rd pass through it I kind of get it. The ingredients are: strongly typical sequences and auxiliary random variables that mimic the broadcast.

Suppose you want to send two different messages to two different people. Unfortunately, you can only broadcast, so that if you send Y then one person gets X via a communication channel F(X|Y) and the other gets Z via G(Z|Y). Given F and G, how should you design a Y so that the messages are decodable by their respective recipients, and so that the number of bits in the message is maximized? If we consider data rates, what are the maximum reliable rates of communication to the two users?

It turns out that the converse to this problem is not tight in many cases. That is, the set of data rates which are achievable is smaller than the set which may be achievable. This paper expands the region of achievable rates by a little bit more than had been known before 1979.

If Rx and Rz are the rates to the two users, then Marton shows that for random variables (U, V, W) which form Markov chains (U, V, W) – Y – X and (U, V, W) – Y – Z :

Rx ≤ I(W,U ; X)
Rx ≤ I(W,V ; Z)
Rx + Rz ≤ min{ I(W ; X), I(W ; Z) } + I(U ; X | W) + I(V ; X | W) – I(U ; V | W)

The main benefit of this theorem is that before U, V, and W had to be independent, and now they can be dependent.

The interesting part of this paper is of course the code construction (plus some applications at the end to the Blackwell channel that I won’t bother with here). The sketch is as follows — generate codewords wi iid with respect to W, then U codewords iid with P(u | wi), and V codewords iid with P(v | wi). Then you look at triples of (u, v, w) that are strongly typical, and for each one choose a random Y that is strongly typical conditioned on (u, v, w). Then the decoding is via strong typicality.

The details of the proof are interesting, but what I think it’s a great example of is how the pre-encoding can mimic broadcasting (W sends to U and V) and that this mimickry can be brought over to the channels at hand and is provably achievable. Marton writes that she thinks the novelty lies in the case when W is deterministic. In any case, it was an interesting read, and has given me a few more ideas to think about with respect to my own research.

shuffle and shake (finals edition)

Apparently I need to go back to the classics. Maybe the iPod is the hipster ouija board…

1. Cotton Tail (Duke Ellington)
2. Think (Aretha Franklin)
3. Circular Flexing (Squarepusher)
4. Black Bottom Stomp (Jelly Roll Morton)
5. Plenty More (Squirrel Nut Zippers)
6. You’ll see… when it comes (Samrat Chakrabarti/Soundtrack to Harvest)
7. Autumn in New York (Frank Sinatra)
8. Ich will meine Seele tauchen (Schumann/Wunderlich)
9. Dear Old Southland (Sidney Bechet)
10. Kanonen-Song [instrumental version] (Kurt Weill)
11. Légende (Wieniawski/Perlman)
12. Beatbox (Roni Size)
13. Temptation (Tom Waits)
14. Mon Cadavre est doux comme un gant (Poulenc/Stützmann)
15. Once In A Lifetime (Talking Heads)

A paper a day will return after I finish my MS thesis and catch up on my combinatorics.

flags

Does anyone know why a maximal chain in the lattice of subspaces of a vector space is called a flag? A chain of increasing subspaces doesn’t look anything like a flag!

something to not do while working on your thesis

Do not open two copies of the same chapter of your thesis, make 2 hours worth or revisions on it, and then save the old version over the changes before closing the file. Furthermore, do not fail to notice this problem until 6 hours after said changes were made. It will make you think that you have actually gone crazy and that the beautiful new draft with eloquent rephrasings was entirely in your head.