# McGill’s policy on “harmful consequences”

McGill University is contemplating ending “a requirement that any professor receiving research support from the military indicate whether the research could have ‘direct harmful consequences.'” The proponents of striking the measure say that all research should be scrutinized for harmful consequences, whereas the opponents say that it opens the gates for the US defense industry to shift the Canadian (Canadien?) research agenda.

I’m surprised they even had such a provision in the first place, given the existing injunctions against secret/classified research.

This reminds me a discussion last night at dinner, where my friend told us about a book by UCSD professor Chandra Mukerji called A Fragile Power: Scientists and the State, which talks a bit about science is dependent on state (and military funding) and how the state views scientists as a kind of “reserve force” of experts whose knowledge may become crucial later.

# From Polya urns to the OK Corral

A new postdoc here, Punyaslok Purkayastha, pointed out to me a Probability Surveys paper by Robin Pemantle on random processes with reinforcement, and I’ve been reading through it in spare moments (usually on the bus). I have had a problem knocking about my head that is related (Ram Rajagopal and I thought we could do in our spare time but we never managed to find enough time to work on it). All this stuff is pretty well-known already, but I thought it was a nice story.

A Polya urn process starts by taking an urn with $R_0 = a$ red balls and $B_0 = b$ black balls at time 0. For each time $t \ge 1$, draw a ball uniformly from the balls in the urn and then replace it together with another ball of the same color. Let $X_n$ be the sequence of colors that you draw, where red is 0 and black is 1. So with probability $R_n / (R_n + B_n)$ we have $X_n = 0$. The urn is updated according to $R_{n+1} = R_n + 1(X_n = 0)$ and $B_{n+1} = B_n + 1(X_n = 1)$.`

To figure out the asymptotic behavior of this process, just note that the sequence of colors is exchangeable. If $x_1^n$ has $k$ 1’s in it, then $\mathbb{P}(X_0^{n} = x_0^n) = \frac{ \prod_{i = 0}^{k-1} (a + i) \prod_{i=0}^{n-k-1} (b + i)}{ \prod_{i=0}^{n-1} (a + b + i) }$

From this it is clear that the probability of seeing $x_1^n$ only depends on its type (empirical distribution, for non-information theorists). Since the sequence is exchangeable, de Finetti’s Theorem shows that the fraction of red balls $W_n = R_n/(R_n + B_n)$ converges almost surely to a random variable $W$, and the limit of the formula above shows that the limit is a beta distribution with parameters $(a,b)$: $f_W(w) = \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} w^{a-1} (1 - w)^{b-1}$

A more general process is the Friedman urn process, in which you add $\alpha$ balls of the same color you drew and $\beta$ balls of the opposite color. If $\alpha > \beta > 0$ then Freedman (different spelling, different guy) showed that the proportion of red balls tends almost surely to 1/2. This happens because the evolution ends up looking like a symmetric two-state Markov chain, which has (1/2, 1/2) as its stationary distribution. Note here that the limiting fraction is almost surely constant, as opposed to there being a limiting distribution, as in the Polya process.

Let’s look at a special case where instead of adding a ball of the same color that you drew, you only add a ball of the opposite color (which is a Friedman process with parameters (0,1)). Freedman’s result doesn’t apply here, but Kingman has a nice paper (which I am reading now) relating this process to a third process: the OK Corral process.

In the OK Corral process, say we start with $(c,d)$ good and bad cowboys. We choose one cowboy uniformly at random and then he kills a cowboy on the opposite side. This process continues until one side has all been shot. This process is precisely the time reversal of the special Friedman urn process above. If we let $S_N$ denote the number of surviving cowboys after the process terminates when we start with $(N,N)$, Kingman proves that $V_N = \frac{3S_N^4}{2N^3}$

converges to a gamma distribution with parameter 1/2: $f_V(v) = v^{-1/2} e^{-v}/\Gamma(1/2)$

The weird thing about this is that the scaling is $N^{3/4}$ to get an asymptotic distribution.

Kingman and Volkov extend the analysis by embedding the urns into birth processes, which is another technique for balls-into-bins with feedback of urns with reinforcement. But that’s another story for another time (but maybe not another blog post, this one took a bit longer than I expected).

# ITA Workshop Aftermath

The 2010 ITA Workshop is over, and now that I have almost caught up on sleep, I can give a short report. I think this year was the largest workshop so far, with over 300 speakers.

One of the additions this year was a poster session to give those students who didn’t get a chance to speak at the “Graduation Day” talks on Wednesday an opportunity to present their research. The posters went up on Monday and many stayed up most of the week. I am usually dubious of poster sessions for more theoretical topics; from past experience I have had a hard time conveying anything useful in the poster and they are poorly attended. However, this poster session seemed to be a rousing success, and I hope they continue to do it in future years.

The theme this year was “networks,” broadly construed, and although I didn’t end up going to a lot of the “pure” networking sessions, the variety of topics was nice to see, from learning in networks to network models. Jon Kleinberg gave a great plenary lecture on cascading phenomena. I particularly enjoyed Mor Harchol-Balter’s tutorial on job scheduling in server farms. I learned to re-adjust my intuition for what good scheduling policies might be. The tutorials should be online sometime and I’ll post links when that happens.

The “Senseless Decompression” session organized by Rudi Urbanke and Ubli Mitra should also be online sometime. Apparently 20+ schools contributed short videos on the theme of $\frac{1}{2} \log(1 + \mathrm{SNR})$. Maybe we can make a YouTube channel for them.

Perhaps later I’ll touch on specific talks that I went to, but this time around I didn’t take too many notes, probably because I was a little exhausted from the organizing. I may post a more technical thing on the talk I gave about my recent work on privacy-preserving machine learning, but that will have to wait a bit in the queue. Mor’s tutorial suggests I should use Shortest Remaining Processing Time (SRPT) to make sure things don’t wait too long, and I have some low-hanging tasks that I can dispatch first.