Manuscript Central is annoying

The IEEE Transactions on Information Theory recently transitioned to using Manuscript Central from the old Pareja system, so now all of the IEEE journals for which I review seem to be managed by the same external management system. As a reviewer/author, I have a lot of complaints (small and large) about Manuscript Central:

  • Why oh why do I need to disable my popup blocker for your site to work?
  • Why can login information not be shared across different IEEE publications? I have a separate account for each journal, with a separate password. Thank goodness I have LastPass, but even that program gets confused sometimes.
  • What is the deal with the mandatory subject classifications for papers? One of the “topics” I could pick was “IEEE Transactions on Information Theory.” Really? That’s a topic?
  • Why must papers for review be emblazoned with that stupid pale blue “For Peer Review Only” running diagonally across each page? This causes PDF annotations such as highlighting to barf, making paperless reviewing of papers significantly more annoying than it needs to be.

The worst part is that I am sure IEEE could implement a significantly cheaper and just-as-effective system itself, but now each Society is forking over money to Manuscript Central, which, as far as I can tell, is a shoddy product that offers significantly more annoyances for authors and reviewers. Perhaps as an editor it’s significantly better (I imagine it is), but it seems like a bad deal overall.

Of course, now I sound curmudgeonly. Get off my lawn!

Do other people like MC? Or do you have other pet peeves?

In The Family

On Saturday evening I saw In The Family at the Asian American Showcase. It’s a film by Patrick Wang, whom I may have last seen in a production of Grand Hotel at MIT when I was just starting college. The film is definitely worth seeing — an affecting and truthful story, it may make you tear up at times. It will also make you believe that a deposition can be the most important moment in a person’s life.

The trailer for the movie is here:

The synopsis says

In the town of Martin, Tennessee, Chip Hines, a precocious six year old, has only known life with his two dads, Cody and Joey. And a good life it is. When Cody dies suddenly in a car accident, Joey and Chip struggle to find their footing again. Just as they begin to, Cody’s will reveals that he named his sister as Chip’s guardian. The years of Joey’s acceptance into the family unravel as Chip is taken away from him. In his now solitary home life, Joey searches for a solution. The law is not on his side, but friends are. Armed with their comfort and inspired by memories of Cody, Joey finds a path to peace with the family and closer to his son.

The trailer starts almost towards the end of the film, and I think it doesn’t really show the things which are most beautiful about it. There is a scene after Cody’s funeral when Joey and Chip return to the house, shocked. Joey sits at the kitchen table, and Chip (where do they get these child actors — the kid is amazing!) has a long silent scene in which he gets the mail, climbs on the step stool, gets a glass, gets the Coke from the fridge, pours himself some, gets his dad a beer, opens the beer with some effort, then clinks the bottle and glass for cheers, and that is what snaps Joey out of it and he starts sorting the mail. This is what I mean by a truthful scene — in the face of trauma and loss, at some point we go on, as Beckett might say. Watching those moments is important.

So the film is almost three hours long. But it’s worth it, because it shows you that kind of truth, moment by moment. You get to understand what is at stake in this story, and why Cody and Chip mean so much to Joey. It’s a beautiful debut film. It was rejected from a number of festivals, but they are self-distributing it, and it will hopefully appear soon in a venue near you. Do try to see it — it will move you.

Cover’s test for the irrationality of a coin

Someone hands you a coin which has a probability p of coming up heads. You can flip the coin as many times as you like (or more precisely, you can flip the coin an infinite number of times). Let S = \{r_i : i = 1, 2, \ldots\} be the set of rational numbers in [0,1]. After each flip, you have to guess one of the following hypotheses: that p = r_i for a particular i, or p is irrational. Furthermore, you can only make a finite number of errors for any p \in [0,1] - N_0, where N_0 is a set of irrationals of Lebesgue measure 0. Can you do it? If so, how?

This is the topic addressed by a pair of papers that Avrim Blum mentioned in Yoav Freund’s remembrances of Tom Cover:

Cover, T. M. (1973). On determining the irrationality of the mean of a random variable. The Annals of Statistics, 1, 862–871.
Cover, T. M. and Hirschler, P. (1975). A finite memory test of the irrationality of the parameter of a coin. The Annals of Statistics, 3, 939–946.

I’ll talk about the first paper in this post.

The algorithm is not too complicated — you basically go in stages. For each time j = 1, 2, \ldots you have a function n(j). Think of n(j) as piecewise constant. There are two sequences: a threshold k_{n(j)}, and an interval width \delta_{n(j)}.

  1. Take the sample mean \hat{x}_{n(j)} and look at an interval of width 2 \delta_{n(j)} centered on it. Note that this makes the same decision for each j until n(j) changes.
  2. Given an enumeration of the set S, find the smallest i such that r_i \in [\hat{x}_{n(j)} - \delta_{n(j)}, \hat{x}_{n(j)} + \delta_{n(j)}].
  3. If such an i exists and i < k_{n(j)}, declare p = r_i; otherwise declare p \notin S.

The last thing to do is pick all of these scalings. This is done in the paper (I won’t reproduce it here), but the key tool is the law of the iterated logarithm (LIL), which I never really had a proper appreciation for before this. For \epsilon > 0,

| \hat{x}_n - p | \le (1 + \epsilon) \sqrt{ \frac{2 p (1 - p) \log \log n}{n} }

for all but finitely many values of n. This gets used to set the interval width \delta_{n(j)}.

The cool thing to me about this paper is that it’s an example of “letting the hypothesis class grow with the data.” We’re trying to guess if the coin parameter p is rational and if so, which rational. But we can only apprehend a set of hypotheses commensurate with the data we have, so the threshold k_{n(j)} limits the “complexity” of the hypotheses we are willing to consider at time j. The LIL sets the threshold for us so that we don’t make too many errors.

There are lots of little extensions and discussions about the rationality of physical constants, testing for rationality by revealing digits one by one, and other fun ideas. It’s worth a skim for some of the readers of this blog, I’m sure. A miscellaneous last point : Blackwell suggested a Bayesian method for doing this (mentioned in the paper) using martingale arguments.

Linkage

Via Brandy, Kenji breaks down perfect hard boiled eggs. See also sauceome.

Bret Victor talks about Inventing on Principle — the first half are a lot of demos of some pretty amazing applications of his major driving principle, which is that creators should be able to see what they are creating in real time. He sometimes waxes a little TED-like, but overall, quite inspiring.

My high school history teacher, Chris Butler, has turned his award-winning lecture notes and flowcharts into an iPad app which is available on the App Store.

Queen, live at Wembley. (via MeFi)

Some pretty cool visualizations of sorting. (via logistic aggression)

Upper bounds for causal adversaries

Bikash Dey, Mike Langberg, Sid Jaggi, and I submitted an extended version of our (now accepted) ISIT paper on new upper bounds for binary channels with causal adversaries to the IT Transactions. The model is pretty straightforward : Alice (the encoder) transmits a message to Bob (the receiver) encoded over n uses of a binary input, binary output channel. The channel is controlled by Calvin (an adversary) who sequentially looks at each bit and can decide whether or not to flip it, up to pn total bit flips. That is, Calvin is causal : the decision to flip bit i is based on the knowledge of bits 1, 2, \ldots, i. What we show is a new upper bound on the capacity of this channel. Let \alpha(p,\bar{p}) = 1-4(p-\bar{p}). Then

C \le \min_{\bar{p} \in [0,p]} \left[ \alpha(p,\bar{p})\left(1-H\left(\frac{\bar{p}}{\alpha(p,\bar{p})}\right)\right) \right]

This is what it looks like:

(Figure: plot of the new upper bound for causal adversaries)


So clearly causal adversaries are worse than i.i.d. noise (the 1 - H(p) bound).
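If you want to reproduce the curve, the bound is easy to evaluate numerically. Here is a sketch in Python; the grid search over \bar{p}, its resolution, and the clamping of the entropy argument at edge cases are my choices, not the paper’s.

```python
import math

def H(x):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def causal_upper_bound(p, grid=10000):
    """min over pbar in [0, p] of alpha * (1 - H(pbar / alpha)),
    where alpha = 1 - 4 (p - pbar)."""
    best = float("inf")
    for t in range(grid + 1):
        pbar = p * t / grid
        alpha = 1 - 4 * (p - pbar)
        if alpha <= 0:
            term = 0.0  # treat a degenerate alpha as a zero-rate point
        else:
            term = alpha * (1 - H(min(pbar / alpha, 1.0)))  # clamp for safety
        best = min(best, term)
    return best

print(causal_upper_bound(0.0))   # 1.0: no adversary, full capacity
print(causal_upper_bound(0.25))  # 0.0: the bound hits zero at p = 1/4
```

Note that taking \bar{p} = p recovers the term 1 - H(p), so the minimum is always at or below the i.i.d. bound.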

To show such a bound we have to propose a new attack for the adversary. We call our attack “babble and push.” It operates in two phases. The first phase is of length \ell channel uses and the second of length n - \ell. Let \mathbf{x}(m) be the codeword for message m.

  1. (Babble) Calvin chooses a random subset \Gamma of \bar{p} n indices uniformly from all (\bar{p} n)-subsets of \{1, 2, \ldots, \ell\} and flips bit i for each i \in \Gamma.
  2. (Push) Calvin finds all possible codewords which are consistent with what Bob has received in the first phase:

    B_{\mathbf{y}^{\ell}} = \{ u : d_H(\mathbf{y}^{\ell}, \mathbf{x}^{\ell}(u))=\bar{p}n \},

    and selects an element \hat{u} \in B_{\mathbf{y}^{\ell}} uniformly at random. For the second phase, Calvin selectively pushes the received codeword towards \mathbf{x}(\hat{u}) — where the transmitted codeword and the selected codeword match, he does nothing, and where they do not match he flips the bit with probability 1/2.
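As a toy illustration of the two phases, here is a sketch in Python. This is my own construction (a tiny two-codeword codebook, and no accounting of Calvin’s overall budget of pn flips), not code from the paper.

```python
import random

def babble_and_push(codebook, m, ell, pbar_n, rng):
    """Toy sketch of the two-phase attack on codeword m.
    Phase 1 (babble): flip a uniformly random pbar_n-subset of the first ell bits.
    Phase 2 (push): pick a codeword consistent with phase 1 and push toward it."""
    x = codebook[m]
    y = list(x)
    for i in rng.sample(range(ell), pbar_n):  # babble
        y[i] ^= 1
    # Codewords whose first ell bits sit at Hamming distance pbar_n from y:
    consistent = [u for u, c in enumerate(codebook)
                  if sum(a != b for a, b in zip(c[:ell], y[:ell])) == pbar_n]
    target = codebook[rng.choice(consistent)]
    for i in range(ell, len(x)):              # push
        if y[i] != target[i] and rng.random() < 0.5:
            y[i] ^= 1  # flip with probability 1/2 where they disagree
    return y

rng = random.Random(1)
codebook = [[0] * 8, [1] * 8]
y = babble_and_push(codebook, m=0, ell=4, pbar_n=1, rng=rng)
```

In this toy run the consistent set contains only the transmitted codeword, so the push phase does nothing; with a denser codebook the set B_{\mathbf{y}^{\ell}} can contain several codewords, which is where the ambiguity comes from.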

Analyzing this scheme amounts to showing that Calvin can render the channel “symmetric.” This is a common condition in arbitrarily varying channels (AVCs), a topic near and dear to my heart. Basically, Bob can’t tell the difference between the transmitted codeword and the one Calvin selected, because under Calvin’s attack, the chance that Alice chose u and Calvin chose \hat{u} is the same as the chance that Alice chose \hat{u} and Calvin chose u. To establish this symmetry condition requires some technical excursions which are less fun to blog about, but were fun to figure out.

It’s relatively clear that this approach would extend to more general AVCs, which we could work on in the future. What is neat to me is that this shows how much value Calvin can derive from knowing the current input bit — by forcing additional uncertainty on Bob during the babble phase, Calvin buys time to use his bit-flipping budget more efficiently in the second phase.

If at first you don’t succeed, normalize, normalize again

My ex-groupmate and fellow Uni High graduate Galen Reeves told me about a paper a few weeks ago when I visited him at Stanford:

Successive Normalization of Rectangular Arrays
Richard A. Olshen and Bala Rajaratnam
The Annals of Statistics 38(3), pp. 1638–1664, 2010

Apparently, however, the arguments in the paper are not quite correct [1], and they recently uploaded a correction to arXiv.

This paper looks at the effect of a very common preprocessing step used to transform an n \times k data matrix \mathbf{X} into a form acceptable for statistical or machine learning algorithms that assume things like zero-mean or bounded vectors. Here n may represent the number of individuals and k the number of features, for example; or the data may come from a DNA microarray (their motivating example). This preprocessing is often done without much theoretical justification — the mathematical equivalent of tossing spilled salt over your shoulder. The paper studies the limiting process of standardizing rows, then columns, then rows, then columns, again and again. They also require n, k \ge 3. “Readers will see that the process and perhaps especially the mathematics that underlies it are not as simple as we had hoped they would be.”

So what exactly is the preprocessing? I am going to describe things in pseudocode (too lazy to do real code, sorry). Given a data matrix X[i,j] they look at

for i = 1:n { X[i,1:k] = X[i,1:k] - mean(X[i,1:k]) }
for j = 1:k { X[1:n,j] = X[1:n,j] - mean(X[1:n,j]) }

They call the first a “row mean polish” and the second a “column mean polish.” They show this converges in one step.
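The one-step claim is easy to check numerically; here is a quick pure-Python sketch with a toy matrix of my choosing:

```python
def polish_rows(X):
    """Subtract each row's mean from that row."""
    return [[v - sum(row) / len(row) for v in row] for row in X]

def polish_cols(X):
    """Subtract each column's mean from that column."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - means[j] for j, v in enumerate(row)] for row in X]

X = [[1.0, 2.0, 4.0],
     [3.0, 5.0, 8.0],
     [2.0, 7.0, 6.0]]
Y = polish_cols(polish_rows(X))

# One row polish followed by one column polish already leaves every
# row mean and column mean at zero: no further iteration is needed,
# because column centering preserves the zero row means.
row_means = [sum(r) / len(r) for r in Y]
col_means = [sum(c) / len(c) for c in zip(*Y)]
```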

But what about standardizing? The more complicated polishing procedure looks like this:

for i = 1:n {
  mu = mean(X[i,1:k])
  sigma = sqrt( sum( (X[i,1:k] - mu)^2 ) )
  X[i,1:k] = (X[i,1:k] - mu)/sigma
}
for j = 1:k {
  mu = mean(X[1:n,j])
  sigma = sqrt( sum( (X[1:n,j] - mu)^2 ) )
  X[1:n,j] = (X[1:n,j] - mu)/sigma
}

This standardizes rows first, and then columns (or “lather” and “rinse,” since we are going to “repeat”). They call this Efron’s algorithm because he told them about it. So what happens if we repeat these two steps over and over again on a matrix with i.i.d. entries from some continuous distribution?

Theorem 4.1 Efron’s algorithm converges almost surely for every X in a Borel set of matrices whose complement has Lebesgue measure 0.

So what does it look like in practice? How fast is this convergence? Empirically, it looks exponential, and they have some theoretical guarantees in the paper, kind of hidden in the discussion. The proofs are not terribly ornate but are tricky, and I don’t quite get all the details myself, but I figured readers of this blog would certainly be interested in this cute result.
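Here is a quick empirical check of that convergence in pure Python. The 5 × 4 matrix is my own toy data, and I scale by the root mean square about the mean; the paper’s precise normalization constant may differ, but the qualitative behavior is the same.

```python
import math

def standardize_rows(X):
    """Center each row and scale it to unit variance."""
    out = []
    for row in X:
        k = len(row)
        mu = sum(row) / k
        sigma = math.sqrt(sum((v - mu) ** 2 for v in row) / k)
        out.append([(v - mu) / sigma for v in row])
    return out

def transpose(X):
    return [list(col) for col in zip(*X)]

def sweep(X):
    """One 'lather, rinse' pass: standardize rows, then columns."""
    return transpose(standardize_rows(transpose(standardize_rows(X))))

X = [[0.5, -1.2, 2.0, 0.1],
     [1.4, 0.3, -0.7, 2.2],
     [-0.9, 1.1, 0.6, -1.5],
     [2.3, -0.4, 1.8, 0.9],
     [0.2, 1.7, -1.1, 0.4]]

change = None
for _ in range(100):
    X_new = sweep(X)
    change = max(abs(a - b) for r1, r2 in zip(X_new, X) for a, b in zip(r1, r2))
    X = X_new
# `change` shrinks roughly geometrically from sweep to sweep,
# matching the exponential-looking convergence noted above.
```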

[1] A fun quote from the correction: “Theorem 4.1 of [2] is false. A claimed backwards martingale is NOT. Fortunately, all that seems damaged by the mistake is pride. Much is true.” I really liked this.

Linkage

More attacks on anonymity in DNA databases.

A letter from Kurt Vonnegut to the head of a school board in North Dakota who burned copies of Slaughterhouse Five. (via MeFi)

An interview with Olympic bronze medalist John Carlos on giving a black power salute at the 1968 Olympics.

Tomorrow is National Grilled Cheese Day.

I really enjoyed this exhibit of Tagore’s paintings at The Art Institute of Chicago; although my favorite drawing is not online, this bird was pretty cool.

The history of the martingale

The other day I found myself wondering “so what does the word martingale come from?” A short time on Google later, I came across this paper from Journal Electronique d’Histoire des Probabilités et de la Statistique, which had a special issue on The Splendors and Miseries of Martingales (Splendeurs et misères des martingales):

The Origins of the Word “Martingale”
Roger Mansuy
(earlier version : “Histoire de martingales” in Mathématiques & Sciences Humaines/Mathematical Social Sciences, 43rd year, no. 169, 2005(1), pp. 105–113.)

It’s 10 pages and worth a read just for fun. Some of the fun facts:

  • Doob is the one who really made the name popular (in addition to proving many fundamental results). He got the name from a thesis by Ville.
  • A martingale is the name for a Y-shaped strap used in a harness — it runs along the horse’s chest and then splits up the middle to join the saddle.
  • A martingale is a name for a betting strategy (usually we think of doubling bets) but it’s not clear which one from the historical record.
  • “To play the martingale is to always bet all that was lost” (dictionary of the Académie Française, 4th ed.) — there are earlier dictionary definitions too, going back to 1750.
  • “A very slim trail seems to indicate a derivation of the word from the Provençal expression jouga a la martegalo, which means ‘to play in an absurd and incomprehensible way’.” Apparently Provençal is also the origin of Baccarat.
  • So what is martegalo? It might refer to a place called Martigues, whose residents are supposedly a bit naïve.
  • “Martingale pants” are from Martigues, and have, according to Rabelais, “a drawbridge on the ass that makes excretion easier.”
  • There’s a woman in the 17th century who called herself La Martingale and who made a number of prophetic predictions.
  • There were sailors called martégaux who gave their name to a rope called a martegalo used on sailboats. Perhaps this is where the horse connection comes in?
  • Apparently “martingale” is also vernacular for “prostitute,” but the etymology for that usage is not well-documented.

All in all, perhaps this essay ends up raising more questions than it answers, but I certainly had no idea that there was this much to be unearthed behind a simple word.

ICITS Workshop Deadline Extension

(Via Adam Smith) The deadline for submitting workshop papers to the 6th International Conference on Information Theoretic Security (ICITS) has been extended from today to Monday, April 23 (at 3pm Eastern) due to holidays. It’s in Montreal in August, so if you have some recent results that you would like to present in a workshop setting, please send them in. “Papers which broaden the applicability of information-theoretic techniques (say, to areas such as data privacy or anonymity) would also be very welcome.”

ICITS will have two tracks this year : a more CS-style conference with published proceedings, and a workshop (think ITA) track without proceedings where you can present older stuff.