yet another test

I can only hope that this Gaussian will appear:

\mathbb{P}(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2 \pi}} \exp(- y^2/2) dy

How to do it:

  1. Install the WP-Cache plugin. Note that you have to muck with file permissions and it’s somewhat non-intuitive.
  2. Install the MimeTeX plugin.
  3. Stress out about the fact that for some equations the terminating quote ” in the alt tag for the image has turned into a fancy quote. Come up with a hack involving placing an empty img tag right after every piece of LaTeX. Hope that you figure out a way around it.
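For what it’s worth, one could also imagine scripting around the fancy-quote problem directly. Below is a hypothetical post-processing sketch in Python (not the empty-img hack above, and not something I have actually wired into WordPress) that straightens out a curly closing quote inside an alt attribute:

    import re

    def fix_alt_quotes(html: str) -> str:
        # If the closing quote of an alt="..." attribute has been "smartened"
        # into a curly quote (U+201D), put a plain ASCII double quote back.
        return re.sub(r'(alt="[^"\u201d]*)\u201d', r'\1"', html)

    print(fix_alt_quotes('<img src="eq.png" alt="E = mc^2”>'))
    # prints: <img src="eq.png" alt="E = mc^2">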

not again!

Since I only recently started reading ArXiV through my RSS aggregator, I was unaware of the astonishing regularity with which “proofs” of polynomial-time algorithms for NP-complete problems are proposed. Most recent is this one, but one can find a more comprehensive list here. The latter page is a bit too credulous about the claims, since they say “so and so proved P=NP in this paper.” It’s not a proof if it’s wrong, and pretty much all of these proofs have been shown to be wrong. But it might be an interesting exercise one week for some reading group or topics class to formally prove some of these guys wrong. Of course, for every person claiming a proof that P=NP there is another person ready to knock them down and claim that P!=NP. Maybe it’s just a little self-correcting mechanism in the ArXiV.

paper a day : an LP inequality for concentration of measure

A Linear Programming Inequality with Applications to Concentration of Measure
Leonid Kontorovich
ArXiV: math.FA/0610712

The name “linear programming” is a bit of a misnomer — it’s not that Kontorovich comes up with a linear program whose solution is a bound, but rather that the inequality relates two different norms, and computing one of them amounts to maximizing an inner product over a convex polytope, which can be solved as a linear program. This norm inequality is the central fact in the paper. There are two weighted norms on functions at work — one is defined via a recursion that looks a lot like a sum of martingale differences; the other is the maximum of the inner product of the given function with functions in a polytope defined by the weights. All functions in the polytope have bounded Lipschitz coefficient. By upper bounding the latter norm by the former, he can plug the former into the Azuma/Hoeffding/McDiarmid-style arguments that show measure concentration for functions with bounded Lipschitz coefficient.

On a more basic level, consider a function of many random variables; the empirical mean is one example. A bounded Lipschitz coefficient means the function is “insensitive” to fluctuations in any one of its arguments. Intuitively, this means that as we add more variables the function will not deviate from its mean value by very much. To show this, you need to bound a probability in terms of the Lipschitz coefficient, which is exactly what this inequality lets you do. Kontorovich thereby obtains a generalization of the standard inequalities.
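For orientation, the flavor of “standard inequality” being generalized is McDiarmid’s bounded-differences inequality (stated here from memory, not quoted from the paper): if X_1, …, X_n are independent and changing the i-th argument of f changes its value by at most c_i, then

\mathbb{P}\left( \left| f(X_1, \ldots, X_n) - \mathbb{E} f(X_1, \ldots, X_n) \right| \ge t \right) \le 2 \exp\left( \frac{-2 t^2}{\sum_{i=1}^{n} c_i^2} \right)

The norm inequality in the paper is the tool for pushing bounds of this flavor beyond the standard setting.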

What I’m not sure about is how I would ever use this result, but that’s my problem, really. The nice thing about the paper is that it’s short, sweet, and to the point without being terse. That’s always a plus when you’re trying to learn something.

paper a day : periodic bloodsucking rates for vampires

Alex forwarded me this reference:

CYCLES OF FEAR: PERIODIC BLOODSUCKING RATES FOR VAMPIRES
R. F. Hartl, A. Mehlmann, and A. Novak
Journal of Optimization Theory and Applications, Vol. 75, No. 3, 1992

Abstract:
In this paper, we present a new approach for modeling the dynamic intertemporal confrontation between vampires and humans. It is assumed that the change of the vampiristic consumption rate induces costs and that the vampire community also derives some utility from possessing humans and not only from consuming them. Using the Hopf bifurcation theorem it can be shown that cyclical bloodsucking strategies are optimal. These results are in accordance with empirical evidence.

Keywords: maximum principle; limit cycles; economics of human resources; vampire myth

To the feather-fool and lobcock, the pseudo-scientist and materialist, these deeper and obscurer things must, of course, appear a grandam’s tale.
Montague Summers. The Vampire in Europe

This paper analyzes a mathematical model of bloodsucking rates for vampires using control theory. However, as they note in the introduction,

To a traditional vampirologist, the use of optimal control theory against vampires, as exercised in Ref. 6, seems highly questionable. This is due to the fact that the application of Pontryagin’s principle requires the derivation of a shadow price for vampires. Such a price is, however, nonexistent since vampires do not have a shadow.

As a predator-prey scenario, we can model the population dynamics with some differential equations. The problem for the vampires is to set a bloodsucking rate (humans per vampire) so as to maximize a utility function subject to these dynamics. However, the model has to be made more sophisticated to account for the cyclical bloodsucking patterns found in real vampires. The modifications are twofold — first, vampires also derive some utility from possessing humans rather than just sucking blood from them, and second, changing the consumption rate incurs a cost. In this push-and-pull framework they can derive cycles, appropriately named “cycles of fear,” in which the bloodsucking rate is modulated over time to achieve a stable tradeoff and net utility.
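To make that concrete, here is one way such a problem might be written down (a made-up caricature for illustration, not the authors’ actual model). Write H(t) for the human stock, V(t) for the vampire population, and c(t) for the bloodsucking rate the vampires choose; they then solve something like

\max_{c(\cdot)} \int_0^{\infty} e^{-rt} \left[ U(c(t)) + W(H(t)) - K(\dot{c}(t)) \right] dt

subject to

\dot{H} = a H - c V, \qquad \dot{V} = b c V - d V

where U is the utility of consumption, W the utility of merely possessing humans, K a penalty for changing the consumption rate, and r a discount rate; the reported cycles come out of the tension among these three terms.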

The full version, which is not to be missed, can be found via SpringerLink.
For some earlier comments on the optimal destruction of vampires and macroeconomic policy (which involves the shadow price), see this related JSTOR article.

My research is waaaaaay too boring.

paper a day : importance sampling in rare-event simulations

Introduction to importance sampling in rare-event simulations
Mark Denny
European Journal of Physics, 22 (2001): 403–411

This paper is about importance sampling (IS), which is a method for improving the error behavior of Monte Carlo (MC) methods. In engineering systems, getting good simulation results for rare events (such as a decoding error) with probability on the order of 10⁻¹⁰ would require an obscene amount of computation if you just did things the naive way. For example, the quality of a numerical estimate of the tail probability of a random variable gets worse and worse as you look farther and farther out. Importance sampling is a method of reweighting the distribution to get a smaller error in the regime of interest, to uniformize the estimation error, or both. This paper gives some motivation, a simple IS algorithm, analysis, and some simulations. It’s pretty readable, and I went from knowing nothing about importance sampling to getting a decent idea of how to use it in practice, along with its potential problems and benefits.
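To make the idea concrete, here is a toy sketch of my own (not Denny’s algorithm): estimate P(X > 5) for a standard normal X by sampling from a proposal distribution shifted out to the rare region and reweighting by the likelihood ratio.

    import numpy as np

    rng = np.random.default_rng(0)
    a = 5.0          # we want p = P(X > a) for X ~ N(0, 1); true value ~ 2.9e-7
    n = 100_000

    # Naive Monte Carlo: essentially every sample misses the event {X > a}.
    x = rng.standard_normal(n)
    p_naive = np.mean(x > a)

    # Importance sampling: draw from the shifted proposal q = N(a, 1) and
    # reweight by the likelihood ratio p(y)/q(y) = exp(-a*y + a^2/2).
    y = rng.normal(loc=a, scale=1.0, size=n)
    w = np.exp(-a * y + 0.5 * a**2)
    p_is = np.mean((y > a) * w)

    print(p_naive, p_is)

With the same number of samples, the naive estimate is almost always exactly zero, while the reweighted one lands near the true value of roughly 2.9 × 10⁻⁷; the catch is that a poorly chosen proposal can make the weights wildly variable and the estimate worse than useless.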

a quote that gives some comfort

From Van H. Vu’s “Concentration of Non-Lipschitz Functions and Applications,” Random Structures and Algorithms 20: 262–316, 2002:

Finding this hypothesis was actually the hardest part of our work, requiring some insight and a numerous number of attempts. As the reader will see after reading Section 5, a proper induction hypothesis not only allows us to derive a powerful result, but also reduces the proof to a rather routine verification of a few conditions.

I have spent a long time reading through difficult proofs with numerous lemmata, wondering why it had to be so complicated. Some things have to be proved by brute force. For others, just phrasing the problem in the right way can make the proof seem trivial. Some might say “well, I could have done that,” but the more accurate response is “I wish I had thought to do it that way.”

paper a day (month?) : isomap

A Global Geometric Framework for Nonlinear Dimensionality Reduction
J. B. Tenenbaum, V. de Silva, J. C. Langford
Science, vol. 290, pp. 2319–2323, 22 December 2000.

I have to present this paper for a computational neuroscience reading group that I can (finally) attend. The basic problem is something called manifold learning. Imagine you have a very large data set in a huge number of dimensions — for example, 64 x 64 pixel images of faces, which live in a 4096-dimensional space. Furthermore, suppose that all of the pictures are of the same person, with only two parameters changing — the angle of rotation of the face, and the illumination. The data has only two degrees of freedom, so you would think it would live on a 2-dimensional subspace.

Unfortunately, the data you have doesn’t occupy a linear subspace in the observed variables. Instead, it lives on a manifold, which is like a surface in your high-dimensional space that may be strangely curved or twisted, and so may be very poorly approximated by a linear subspace. However, the manifold does have its own coordinate system, and you can calculate distances between points on the manifold. The shortest path between two points is called a geodesic.

Another way to visualize this is a ribbon that is curled into a spiral. A point A on the ribbon might look close to a point B on an outer ring, but if you unwrapped the ribbon they would be far apart. So one thing you might think to do is figure out the real distance (along the surface of the ribbon) between A and B. You might be able to do that by hopping from data point to data point in short hops that follow the contour of the ribbon.

That is exactly what the Isomap algorithm, described in this paper, does to perform its dimensionality reduction. Given a set of points {x_k : k = 1, 2, …, K} in the n-dimensional space X, they first make a graph G with vertex set {x_k} by putting an edge between two points if the distance between them in X is less than some threshold; the edge is weighted by that distance. Then they form a K x K matrix D whose (i,j)-th entry is the length of the minimum-weight (shortest-distance) path in the graph G between x_i and x_j. Assuming the threshold is not too large and there are a lot of data points, these lengths should closely approximate the true (geodesic) distances on the manifold.
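A rough sketch of those two steps (my own illustration using numpy/scipy, assuming the ε-neighborhood rule; the paper also allows a k-nearest-neighbor graph):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import shortest_path

    def geodesic_distances(X, eps):
        # X is a (K, n) array of K points in the n-dimensional ambient space.
        D = squareform(pdist(X))          # Euclidean distances in ambient space
        W = np.where(D <= eps, D, 0.0)    # keep only short edges (0 means no edge)
        # Shortest-path lengths in the neighborhood graph approximate geodesics.
        return shortest_path(W, method="D", directed=False)

The threshold eps matters: too small and the graph falls apart into disconnected pieces, too large and edges cut across folds of the manifold.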

Armed with this matrix of distances between points, we can try to embed the manifold into a d-dimensional Euclidean space without distorting the distances too much. This is an easy problem to visualize — just imagine taking a globe, fixing a few cities, and trying to make a flat map in which the distances between those cities are preserved. There is an algorithm called multidimensional scaling (MDS) that does this, and you can trade off the embedding distortion against the number of dimensions.
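A sketch of that last step as classical MDS (written from memory rather than from the paper’s footnotes): double-center the squared distances and read the embedding off the top eigenvectors.

    import numpy as np

    def classical_mds(D, d=2):
        # D is the K x K matrix of (approximate geodesic) distances.
        K = D.shape[0]
        J = np.eye(K) - np.ones((K, K)) / K       # centering matrix
        B = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances
        evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
        top = np.argsort(evals)[::-1][:d]         # indices of the d largest
        return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

Each row of the result gives the low-dimensional coordinates of one data point.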

This paper comes with its own homepage, which has some data sets and MATLAB code. If only all practitioners were so generous — too often the algorithm implementation is kept under wraps, which makes me wonder if there are some dirty secrets hiding behind the pretty plots.

One thing that annoyed me about reading this paper is that all of the technical details (which I care about) are hidden in tiny-print footnotes. Furthermore, the citations do not include paper titles, so you can’t tell what the cited papers are actually about. I know that page space is precious, but it’s just plain stupid. Shame on you, Science. I expected better.

As an amusing postscript, the commentary on this paper and the locally linear embedding paper (Roweis and Saul), written by Seung and Lee, has pictures of Bush and Gore in the print edition, but due to copyright issues the online version had to be changed.

three quotes from James Munkres

I found these on a post-it note from his book Topology. I took the class in spring 2000, which feels like ages ago. He ended up writing me a recommendation letter for grad school, so I suppose I have him to thank for where I am now.

  • [2/22/00] (on the topological definition of continuity) “All this exists in the never-never land of abstract set theory.”
  • [3/17/00] “History’s dead.”
  • [4/7/00] “If somebody said I was completely normal, I’d hit them.”

I have more quotes somewhere in my notes for the class, but I have no idea where those are. I have tons of juicy quotes in my probability notes, but those have been AWOL for years, more’s the pity.

Simpsons and math

I was completely bogged down with work so I missed some friends’ Simpsons Math Party in honor of last night’s episode, in which Principal Skinner is removed from his position after a Larry Summers-like incident in which he said girls are bad at math. His replacement divides the school along gender lines — the girls’ half is beautifully landscaped with famous women artists’ works on the walls, and so on. Unfortunately for Lisa, math class is now about how numbers “make you feel.” “Is the number 7 odd, or just different?” asks the teacher, before leading the class in a self-affirmation conga line.

Lisa decides to sneak over to the boys’ side, which, in typical Simpsons fashion, has been turned into a post-apocalyptic warzone. The boys are savages, running around playing “guns” and drawing robots made of guns blowing up things with guns that shoot guns. The one saving grace for Lisa is that they study real math there. With Marge’s help she dresses as a boy and goes to math class, where she (gasp!) actually learns that +5 and -5 are both solutions to x² = 25. When Lisa gets into a playground fight, Bart discovers her secret and helps her become more like a boy. Lisa gains acceptance by beating up Ralph and “becoming all that she hates.”

In the end, Lisa wins the prize for best math student and unmasks herself, saying that it proves girls are good at math. Bart responds with the point that it is because she “became a boy” that she could learn the math. As Lisa tries to wend her way through these opposing points the auditorium descends into chaos. As the credits roll we’re treated to Jethro Tull’s masterpiece, Thick As A Brick.

For me, the credits were probably the best part of this episode. While I have to give the show credit for taking a difficult issue and trying its best to satirize it, it misses the point, I think. To the writers, the public discussion of the Summers case focused too much on him and not enough on the underlying issue, which is mathematics education. Correspondingly, Skinner is disposed of in a matter of minutes (his pandering nature could be another subject of discussion). The new principal, a hard-talking caricature of a feminist, decries inequity but is uninterested in real reform. From there the episode moves along in the logic of The Simpsons world. In a way, it all justifies Skinner/Summers, since the most vocal critics are uninterested in the real issue at hand (math).

The issue of gender and learning, and whether there are “gendered” school subjects, is only brought up at the end of the episode and never addressed! Perhaps to the writers it lampoons itself by its absurdity and we’re supposed to laugh Bart out of the debate, but Lisa takes him seriously and so do a huge number of people. Indeed, the gender-labeling of academic disciplines is probably one of the more harmful effects of our current public education system. When Bart claims that you have to be boy-like to learn math, is this a pointer to the debate we should be having, or just a garnish for the episode? I would argue that the chaos of the “chair fight” at the end suggests the latter — the last thing the show wants is to seriously moralize. But to leave it so ambiguous is a bit dangerous, I think.

What the episode does do is bring up a whole barrel of ideas to play around with and fodder for discussion, so it wasn’t a total wash. Plus, how often do you hear Thick As A Brick on TV?

the value of algebra

One of the things that unsettles me the most is when people revel in their own ignorance. They go around proudly proclaiming that they never bothered to learn A and they turned out ok and happy, so clearly A is not important to know. It’s a roundabout way of arguing that it’s ok to be bad at A because secretly A isn’t worth it. Of course, since most of the public discussion about this happens in the media, it’s invariably mathematics and science that bear the brunt of it. The latest “contribution” to the fire is Richard Cohen’s article on algebra.

Cohen isn’t good at math. He can handle “basic arithmetic all right (although not percentages),” but barely made it through algebra and geometry and then turned his back on mathematics forever. He boldly asserts that math should be on a need-to-know basis because “most of math can now be done by a computer or a calculator.” The modern computer age has made many mechanical jobs obsolete, and with them, many skills, it seems. Being able to do percentages seems pretty important to me, given our tax system, but I suppose we have computers to do our taxes for us as well — why bother knowing how to check whether it’s correct?

It’s not really the uselessness of math that Cohen is interested in — he wants to establish a pecking order among disciplines, at the top of which is writing. Because to him computers are math, and computers cannot “write a column or even a thank-you note — or reason even a little bit” (leave that aside for a moment, you AI-fiends), math is inferior to writing. Someone should send him back to a rhetoric class and make him reread his Aristotle.

Cohen’s final surge off the tracks of reason is a lovely piece of anecdotal evidence:

all the people in my high school who were whizzes at math but did not know a thing about history and could not write a readable English sentence. I can cite Shelly, whose last name will not be mentioned, who aced algebra but when called to the board in geography class, located the Sahara Desert right where the Gobi usually is. She was off by a whole continent.

Perhaps a remedial logic class is in order as well, although I suppose philosophy is equally wasted on Cohen. The lovely canard of the hapless math geek who can’t manage to understand any other subject is cheap and tawdry. Is this the best that he can do to muster an argument?

Cohen privileges his fear of mathematics, implicitly claiming that this fear is unique to the subject:

There are those of us who know the sweat, the panic, the trembling, cold fear that comes from the teacher casting an eye in your direction and calling you to the blackboard. It is like being summoned to your own execution.

Oh poor poor Richard Cohen. I shed a tear for you. Mathematics emasculated you and now you will have your revenge. You’ll get those nerds back. Gobi Desert! Ha ha ha!

It’s true that some people are not good at math. This doesn’t mean that they couldn’t be good at math, but for whatever reason, either through their own lack of interest or a poor background from grade school on up, it doesn’t click for them. And maybe there should be a debate as to whether algebra should be required for graduation. This column is not that debate — it’s a shoddily assembled collection of logical fallacies and cheap shots. It’s true that a computer would never be able to write this column. But who would want it to?

A middle-school student I know told me recently that the two subjects that you really don’t need to know are science and history. I asked her why and she said “because you don’t need them for your life.” I tried to argue with her that yes, you don’t need them to live, but imagine how much richer a life you will live because you know them. History and science give you the why of things; they let you understand how the world works, how to tell when someone is feeding you a line about politics or the big bang. If a little knowledge is a dangerous thing, then no knowledge is safety! Big Brother would be proud of you, Richard Cohen. Have you read any Orwell?

Update: via Kevin Drum.