Chernoff bounds and Bernstein’s trick

I came across some references to “Bernstein’s trick” in some papers I was reading, but had to do a little digging to find out what it really meant. Suppose you have independent and identically distributed random variables Xi for i = 1, 2, … n, taking values in [-1, 1], and with E[Xi] = 0. Our goal is to bound

Pr[ (1/n) ∑ Xi > a ] .

The Bernstein trick is to multiply both sides by a dummy variable t and exponentiate both sides:

Pr[ (1/n) ∑ Xi > a ] = Pr[ exp(t ∑ Xi) > exp(a n t) ] ,

Now we use the Markov inequality on the random variable exp(t ∑ X_i):

Pr[ exp(t ∑ Xi) > exp(a n t) ] ≤ exp(- a n t) E[ exp(t ∑ Xi) ],

Now we can leverage the fact that the X_i‘s are independent:

exp(- a n t) E[ exp(t ∑ Xi) ] = exp(- a n t) E[ exp(tX) ]n,

where X has the same distribution as Xi. Then we can expand exp(tX) as a Taylor series:

exp(- a n t) E[ exp(tX) ]n = exp(- a n t) E[ 1 + t X + (1/2!) (t X)2 + … ] .

We now leverage the fact that X is in the interval [-1,1] and that E[X] = 0 to bound the sum of the higher-order terms in the Taylor series:

exp(- a n t) E[ 1 + t X + (1/2!) (t X)2 + … ] ≤ exp(- a n t) exp(b k t2) .

For some constant b, which you can pick based on how much you know about the distribution of X.

Now the more probability-savvy may say “wait a minute, E[ exp(tX) ] is just the moment generating function mX(t), so isn’t this just a fancy Chernoff bound? The answer is basically yes, but the term “Chernoff bound” is used for a lot of related bounds. The one I’m most familiar with requires you to know the moment generating function E[ exp(tX) ]:

P( exp(t X) > exp(a t) ) ≤ exp(-a t) mX(t) .

Since this holds for all t, you differentiate with respect to t and find the minimum to get the tightest bound.

The main difference in these two Chernoff-style bounds is the lack of knowledge needed to apply the former to sums of iid random variables. Of course, you can get all of these bounds from large deviations theory or concentration inequalities, but sometimes a rough trick is all you need to get a sufficiently fast exponential decay so that you can move on to the next step.

Note : updated with nicer characters, thanks to Erin.

[1] Ahlswede, R. “Elimination of Correlation in Random Codes for Arbitrarily Varying Channels,” Zeitschr. Wahr. und Verw. Geb. V.44, 1978, pp. 159–175.
[2] Wigderson, A. and Xiao, D. “A Randomness-Efficient Sampler for Matrix-valued Functions and Applications.” FOCS 2005.
[3] Gallager, R. Discrete Stochastic Processes, Boston : Kluwer, 1996.



In going through Google hits for my name I came across this reference to the ExploraVision contest that I did in high school. It seems so long ago, those trips to the Viscount Den for huge Pepsi’s, pulling all-nighters to get AfterEffects to correctly composite our video, having play performances and going immediately afterwards to NCSA to work… Looking back on it, we were completely insane. But I think that experience helped break previous attitudes I had about how days and work should be structured, and how much you can really do in one day if you put your mind to it.

It is pretty cool that we’re in the Congressional record though…

congratulations, Mr. Pinter

Harold Pinter won the Nobel Prize for Literature this year. It coincides nicely with a BBC radio programme of his more recent plays, including the harrowing Mountain Language.

Pinter presents a wonderful challenge to actors and directors alike. How can we make this play fresh and surprising? The biggest mistake is to think that because the text is so spare that it is somehow more malleable than other plays. What’s great about directing or acting in a Pinter scene is that when you’re doing it wrong, you can tell. It becomes boring, confusing, or too obvious very quickly. It makes you pay attention to details.

It’s interesting to see how Pinter dramatizes meanace and state oppression in an overt yet disguised way, versus Caryl Churchill’s more oblique approach in her (somewhat) recent play Far Away.


Today I ended up looking up the etymology of the word “galley” to refer to page proofs in publishing. I’m reminded of why I love the OED:

5. Printing. a. [F. galée.] An oblong tray of brass, wood, or zinc, to which the type is transferred from the composing-stick.

1652 URQUHART Jewel Wks. (1834) 182 His [the setter’s] plenishing of the gally, and imposing of the form. 1683 MOXON Mech. Exerc. II. 25 Our Master Printer is also to provide Galleys of different sizes. 1777 HOOLE Comenius’ Vis. World (ed. 12) 118 He putteth these in a gally till a page be made. 1864 Daily Tel. 28 June, Three or four compositors..bring up their various contribution of type to the long ‘galley’ in which the article is put together.

b. A galley-proof; = SLIP n.2 10d.

1890 in WEBSTER. 1934 T. R. COWARD in G. Gross Publishers on Publishing (1961) 149 When the corrections are made, the galleys go back to the printer and are made into page proofs. 1951 S. JENNETT Making of Books I. vi. 88 The page proofs come to the reader, and must be checked against the corrected galleys, to see that all the corrections have been carried out. 1971 Times Lit. Suppl. 20 Aug. 999/1, I have had galleys from Penguin Books, but more usually the finished product, fresh misprints and all.

Unfortunately, I don’t know where the word galée comes from, or if it has any relation to the maritime use.


I did the Bay Area Night Game 12 tonight with the Platonic Solids (I was the dodecahedron). It was fun up until I realized that I had lost my cellphone, bike headlight, and only functional pencil. So that’s $100+ I’m out now, and we didn’t even win…

But I suppose we should get around to writing the BANG we were supposed to have written earlier. Maybe if I get the time I’ll write a critique of the puzzles.