paper a day : last encounter routing with EASE

Locating Mobile Nodes with EASE : Learning Efficient Routes from Encounter Histories Alone
M. Grossglauser and M. Vetterli
IEEE/ACM Transactions on Networking, 14(3), pp. 457–469

This paper deals with last encounter routing (LER), a kind of protocol in which location information for nodes is essentially computed “on the fly,” so there is no need to disseminate and update a location table. They consider a grid topology (a torus, actually) on which nodes do a simple random walk. Node mobility is much slower than packet transmission, so at any moment the topology is effectively frozen with respect to a single packet transmission. Every node i maintains a table of pairs (Pij, Aij) for each other node j, where Pij is j’s last known position and Aij is the age of that information. In the Exponential Age SEarch protocol (EASE) and GReedy EASE (GREASE), a packet keeps in its header an estimate of the position of its destination, and rewrites that information when it meets a node with a closer and more recent estimate. Because the location information and the mobility process are both local, these schemes perform order-optimally (though with worse constants) compared to routing with exact location information, and with no overhead in the cost.
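
Just to fix ideas, here is roughly what that bookkeeping looks like in code. This is my own paraphrase rather than the paper’s pseudocode: the class names are made up, and I’ve reduced the update rule to “keep the fresher entry,” whereas the paper’s actual criterion is a bit more careful than that.

```python
class Node:
    """A mobile node that remembers where and when it last met every other node."""
    def __init__(self, node_id, position):
        self.id = node_id
        self.position = position          # current (x, y) on the torus
        self.last_encounter = {}          # other_id -> (position, time_of_encounter)

    def record_encounter(self, other, now):
        # Both nodes log each other's current position and the current time.
        self.last_encounter[other.id] = (other.position, now)
        other.last_encounter[self.id] = (self.position, now)


class Packet:
    """A packet carrying its own (possibly stale) estimate of the destination."""
    def __init__(self, dest_id, est_position, est_age):
        self.dest_id = dest_id
        self.est_position = est_position
        self.est_age = est_age

    def refresh_at(self, node, now):
        # GREASE-style rule (simplified): if this relay has fresher information
        # about the destination, overwrite the header estimate.
        entry = node.last_encounter.get(self.dest_id)
        if entry is not None:
            position, t_met = entry
            age = now - t_met
            if age < self.est_age:
                self.est_position, self.est_age = position, age
```

The packet is then forwarded greedily toward est_position; EASE adds the anchor search described next.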

In particular, EASE computes a sequence of anchors for the packet, each of which has an exponentially closer estimate of the destination in both time and space. For example, each anchor could halve the distance to the destination as well as the age of that location estimate. This is similar in spirit to Kleinberg’s small-world graphs paper, in which greedy routing in a small-world graph keeps halving the distance, leading to a polylogarithmic routing time, which comes from long hops counting the same as local hops. Here long hops cost more, so you still need on the order of n^{1/2} hops. The paper comes with extensive simulation results to back up the analysis, which is nice.
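
A quick sanity check on why halving at each anchor only costs a constant factor (my back-of-the-envelope, not the paper’s proof):

```latex
% If the k-th anchor's estimate is within d/2^k of the destination, the total
% distance traveled is bounded by a geometric series:
\[
  \sum_{k=0}^{\infty} \frac{d}{2^{k}} \;=\; 2d ,
\]
% i.e. at most twice the straight-line distance d = \Theta(\sqrt{n}), which is
% order-optimal, just with a worse constant.
```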

What is nicer is the intuition given at the end:

Intuitively, mobility diffusion exploits three salient features of the node mobility processes: locality, mixing, and homogeneity…
Locality ensure[s] that aged information is still useful… Mixing of node trajectories ensures that position information diffuses around this destination node… Homogeneity ensure[s] that the location information spreads at least as fast as the destination moves.

The upshot is that location information is essentially free in an order sense, since the mobility process has enough memory to guarantee that suboptimal greedy decisions are not too suboptimal.

paper a day : power laws for white/gray matter ratios

I swear I’ll post more often starting soon — I just need to get back into the swing of things. In the meantime, here’s a short but fun paper.

A universal scaling law between gray matter and white matter of cerebral cortex
K. Zhang and T. Sejnowski
PNAS v.97 no. 10 (May 9, 2000)

This paper looks at the brain structure of mammals, and in particular the volumes of gray matter (cell bodies, dendrites, local connections) and white matter (longer-range inter-area fibers). A plot of white matter vs. gray matter volume across different mammals, from a pygmy shrew to an elephant, shows a remarkably close linear fit on a log-log scale, with the best-fit line having a slope of log(W)/log(G) = 1.23. This paper suggests that the exponent can be explained mathematically using two axioms. The first is that a piece of cortical area sends and receives the same cross-sectional area of long-range fibers. The second, more important axiom is that the geometry of the cortex is designed to minimize the average length of the long-distance fibers.

Using these axioms, they argue that an exponent of 4/3 is “optimal” with respect to the second criterion. The gap of 0.10 between 4/3 and the measured 1.23 can be explained by the fact that cortical thickness increases with the size of the animal: regressing cortical thickness against gray matter volume (on a log-log scale) gives a thickness scaling exponent of 0.10. It’s a pretty cute analysis, I thought, although it can’t really claim that minimum wiring is a principle in the brain so much as that the way brains are built is consistent with minimal wiring. Of course, I don’t even know how you would go about trying to prove the former statement — maybe this is why I feel more at home in mathematical engineering than I do in science…
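
If I’m reading the argument right, the exponent bookkeeping works out like this (my paraphrase):

```latex
% The wiring-minimization argument gives W \propto G^{4/3} / T, with T the
% cortical thickness.  The regression of thickness against gray matter volume
% gives T \propto G^{0.10}, so the predicted log-log slope is
\[
  W \;\propto\; \frac{G^{4/3}}{T} \;\propto\; G^{\,4/3 - 0.10} \;\approx\; G^{1.23},
\]
% which matches the measured slope of 1.23.
```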

paper a day : an LP inequality for concentration of measure

A Linear Programming Inequality with Applications to Concentration of Measure
Leonid Kontorovich
arXiv: math.FA/0610712

The name “linear programming” is a bit of a misnomer: it’s not that Kontorovich comes up with a linear program whose solution is a bound, but rather that the inequality relates two different norms, and the calculation of one of them can be thought of as maximizing an inner product over a convex polytope, which can be solved as a linear program. This norm inequality is the central fact in the paper. There are two weighted norms on functions at work — one is defined via a recursion that looks a lot like a sum of martingale differences. The other is the maximum inner product of the given function against functions in a polytope defined by the weights; all functions in the polytope have bounded Lipschitz coefficient. By upper bounding the latter norm by the former, he can feed the former into Azuma/Hoeffding/McDiarmid-style arguments that show measure concentration for functions with bounded Lipschitz coefficient.

On a more basic level, consider a function of many random variables. For example, the empirical mean is such a function. A bounded Lipschitz coefficient means the function is “insensitive” to fluctuations in any one of its arguments. Intuitively, this means that as we add more variables, the function will not deviate from its mean value by very much. To show this, you need to bound a probability in terms of the Lipschitz coefficient, which is exactly what this inequality does. Kontorovich thereby obtains a generalization of the standard inequalities.
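
For reference, the flavor of result being generalized is McDiarmid’s bounded-differences inequality:

```latex
% McDiarmid's bounded-differences inequality: if changing the i-th coordinate
% alone changes f by at most c_i, then for independent X_1, \dots, X_n,
\[
  \Pr\!\left( \bigl| f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \bigr| \ge t \right)
  \;\le\; 2 \exp\!\left( - \frac{2 t^{2}}{\sum_{i=1}^{n} c_i^{2}} \right).
\]
```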

What I’m not sure about is how I would ever use this result, but that’s my problem, really. The nice thing about the paper is that it’s short, sweet, and to the point without being terse. That’s always a plus when you’re trying to learn something.

paper a day : periodic bloodsucking rates for vampires

Alex forwarded me this reference:

CYCLES OF FEAR: PERIODIC BLOODSUCKING RATES FOR VAMPIRES
R. F. Hartl, A. Mehlmann, and A. Novak
Journal of Optimization Theory and Applications, Vol. 75, No. 3, 1992

Abstract:
In this paper, we present a new approach for modeling the dynamic intertemporal confrontation between vampires and humans. It is assumed that the change of the vampiristic consumption rate induces costs and that the vampire community also derives some utility from possessing humans and not only from consuming them. Using the Hopf bifurcation theorem it can be shown that cyclical bloodsucking strategies are optimal. These results are in accordance with empirical evidence.

Keywords: maximum principle; limit cycles; economics of human resources; vampire myth

To the feather-fool and lobcock, the pseudo-scientist and materialist, these deeper and obscurer things must, of course, appear a grandam’s tale.
Montague Summers. The Vampire in Europe

This paper analyzes a mathematical model of bloodsucking rates for vampires using control theory. However, as they note in the introduction,

To a traditional vampirologist, the use of optimal control theory against vampires, as exercised in Ref. 6, seems highly questionable. This is due to the fact that the application of Pontryagin’s principle requires the derivation of a shadow price for vampires. Such a price is, however, nonexistent since vampires do not have a shadow.

As a predator-prey scenario, we can model the population dynamics with some differential equations. The problem for the vampires is to set a bloodsucking rate (humans per vampire) so as to maximize a utility function subject to the dynamics. However, the model has to be made more sophisticated to account for the cyclical bloodsucking patterns found in real vampires. The modifications are twofold — firstly, vampires also derive some utility from possessing humans rather than just sucking blood from them, and secondly, changing the consumption rate is costly. So in this push-and-pull framework they can derive cycles, appropriately named “cycles of fear,” in which the bloodsucking rate is modulated over time to achieve a stable tradeoff and net utility.
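
For flavor, here is a toy simulation of the kind of dynamics being optimized over. To be clear, this is my own throwaway predator-prey sketch with a hand-picked periodic bloodsucking rate, not the paper’s model, its utility functional, or its optimal control solution:

```python
# Toy predator-prey sketch: humans h, vampires v, periodic consumption rate u(t).
# All coefficients here are invented for illustration only.
import math

def simulate(h0=1000.0, v0=10.0, T=200.0, dt=0.01):
    h, v = h0, v0
    trajectory = []
    for k in range(int(T / dt)):
        t = k * dt
        u = 0.02 * (1.0 + 0.5 * math.sin(0.3 * t))   # periodic bloodsucking rate
        dh = 0.03 * h - u * v                        # humans reproduce, get bitten
        dv = 0.8 * u * v - 0.01 * v                  # some victims turn, vampires decay
        h, v = max(h + dh * dt, 0.0), max(v + dv * dt, 0.0)
        trajectory.append((t, h, v))
    return trajectory

if __name__ == "__main__":
    for t, h, v in simulate()[::1000]:
        print(f"t={t:6.1f}  humans={h:12.1f}  vampires={v:8.1f}")
```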

The full version, which is not to be missed, can be found via SpringerLink.
For some earlier comments on the optimal destruction of vampires and macroeconomic policy (which involves the shadow price), see this related JSTOR article.

My research is waaaaaay too boring.

paper a day : importance sampling in rare-event simulations

Introduction to importance sampling in rare-event simulations
Mark Denny
European Journal of Physics, 22 (2001) : 403–411

This paper is about importance sampling (IS), which is a method for improving the error behavior of Monte Carlo (MC) methods. In engineering systems, getting good simulation results for rare events (such as decoding errors) with probabilities on the order of 10^{-10} would require an obscene amount of computation if you just did things the naive way. For example, the quality of a numerical estimate of the tail probability of a random variable gets worse and worse as you look farther and farther out into the tail. Importance sampling reweights the sampling distribution to get a smaller error in the regime of interest and/or to uniformize the estimation error. This paper gives some motivation, a simple IS algorithm, analysis, and some simulations. It’s pretty readable, and I went from knowing nothing about importance sampling to getting a decent idea of how to use it in practice, along with its potential problems and benefits.
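
Here is the standard toy example of the trick (Gaussian tail probabilities, not anything specific to the paper): sample from a distribution shifted into the tail, then reweight each sample by the likelihood ratio so the estimate stays unbiased.

```python
# Importance-sampling estimate of a Gaussian tail probability P(X > a).
import math
import random

def naive_mc(a, n=100_000):
    # Plain Monte Carlo: almost no samples land in the tail when a is large.
    hits = sum(1 for _ in range(n) if random.gauss(0.0, 1.0) > a)
    return hits / n

def importance_sampling(a, n=100_000):
    # Sample from N(a, 1) so the rare region is hit often, then reweight each
    # sample by the likelihood ratio  p(x)/q(x) = exp(a^2/2 - a*x).
    total = 0.0
    for _ in range(n):
        x = random.gauss(a, 1.0)
        if x > a:
            total += math.exp(a * a / 2.0 - a * x)
    return total / n

if __name__ == "__main__":
    a = 5.0
    print("naive MC        :", naive_mc(a))
    print("importance samp.:", importance_sampling(a))
```

At a = 5 the true tail probability is about 2.9 × 10^{-7}, so the naive estimator almost always returns exactly 0 at this sample size, while the reweighted one gets the right order of magnitude.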

paper a day : Fountain Capacity

This is one of the papers I saw at ISIT. At the time I thought I understood it, but reading the actual paper made it much clearer.

Fountain Capacity
S. Shamai, E. Telatar, and S. Verdú
Proc. Int’l Symp. on Information Theory

The background for this paper is fountain codes. The basic idea is pretty simple. Suppose you have k packets of data that you want to send. One way of sending them is to pick a random subset of the packets, XOR them, and send that, along with some header information about which packets are in the XOR. Each coded packet is just a linear combination of the source packets, so a receiver that collects enough coded packets can invert the resulting set of linear equations to recover the original data. The encoder acts like a fountain spewing out coded packets. Suppose the packets go over an erasure channel, in which packets either arrive intact or are erased. If there are many users with varying link qualities, the ones with better links will be able to decode early and disconnect. If a user connects in the middle, they don’t lose anything from missing the “earlier” coded packets. All that matters is that each user/decoder gets enough unerased packets to decode.
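
Here is a bare-bones random linear fountain over GF(2), just to make the XOR-and-invert picture concrete. Real fountain codes (LT/Raptor) pick the subsets from carefully designed degree distributions so that decoding is cheap; this sketch just uses uniformly random subsets and brute-force Gaussian elimination.

```python
# Random linear fountain over GF(2): each coded symbol is the XOR of a random
# subset of the k source packets, tagged with that subset (as a bitmask).
import random

def encode(packets, rng=random):
    """Emit one coded packet: (subset bitmask, XOR of the chosen packets)."""
    k = len(packets)
    mask = 0
    while mask == 0:                      # skip the useless empty combination
        mask = rng.getrandbits(k)
    value = 0
    for i in range(k):
        if (mask >> i) & 1:
            value ^= packets[i]
    return mask, value

def decode(coded, k):
    """Gaussian elimination over GF(2); returns the k packets, or None if rank < k."""
    basis = {}                            # pivot bit -> (mask, value)
    for mask, value in coded:
        while mask:                       # reduce the new row against the basis
            pivot = mask.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (mask, value)
                break
            pmask, pvalue = basis[pivot]
            mask ^= pmask
            value ^= pvalue
    if len(basis) < k:
        return None                       # not enough independent combinations yet
    recovered = [None] * k
    for pivot in sorted(basis):           # back-substitute, lowest pivot first
        mask, value = basis[pivot]
        for j in range(pivot):
            if (mask >> j) & 1:
                value ^= recovered[j]
        recovered[pivot] = value
    return recovered

if __name__ == "__main__":
    source = [0x41, 0x17, 0xFF, 0x03]                 # k = 4 toy "packets"
    stream = (encode(source) for _ in range(100))
    received = [next(stream) for _ in range(8)]       # pretend the rest were erased
    print(decode(received, k=len(source)))            # occasionally None; grab more packets
```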

The purpose of this paper is to see if and how this decoder-centric view of communication can be put on a more traditional information-theoretic (read : Shannon Theory) foundation. The idea is that the encoder will take M messages and encode them into infinitely long strings. The encoding is randomized, and the randomization is known to the decoder. The encoded symbols go through a channel, but the decoder is pay-per-view : it can only see n outputs of the channel, but it can’t control which n outputs. An adversary, without knowing which codebook is being used, picks the schedule of n outputs observed by the decoder. The probability of decoding error is then given by the worst case over subsets of n outputs. The capacity is the supremum of rates for which the decoding error can be driven to 0.

The main results of the paper are

  • Randomized coding is necessary, and an infinite amount of randomness is required.
  • Fountain capacity is always upper bounded by Shannon capacity.
  • For stationary memoryless channels the fountain capacity is equal to the Shannon capacity.
  • For channels with memory, the fountain capacity may be less than Shannon capacity.

What’s a little unsatisfying about this paper is precisely the same thing that makes it interesting — the channel model they’re using is a slippery one, so it’s difficult to see how “reasonable” it is. For example, I think the need for infinite randomization is a real drawback, but maybe the channel assumptions can be relaxed to remove it. I’m interested in this paper because it is closely related to arbitrarily varying channels (AVCs), which is what my thesis will likely revolve around. There are some comments at the end relating it to AVCs, but I should think a little more about the real underlying problem and how to connect it back to AVCs.

paper a day (month?) : isomap

A Global Geometric Framework for Nonlinear Dimensionality Reduction
J. B. Tenenbaum, V. de Silva, J. C. Langford
Science, vol. 290, pp. 2319–2323, 22 December 2000.

I have to present this paper for a computational neuroscience reading group that I can (finally) attend. The basic problem is something called manifold learning. Imagine you have a very large data set in a huge number of dimensions — for example, 64 x 64 pixel images of faces, which live in a 4096-dimensional space. Furthermore, suppose that all of the pictures are of the same person, with only two parameters changing — the angle of rotation of the face, and the illumination. The data has only two degrees of freedom, so you would think it would live on a 2-dimensional subspace.

Unfortunately, the data you have doesn’t occupy a linear subspace in the observed variables. Instead, it lives on a manifold, which is like a surface in your high-dimensional space that may be strangely curved or twisted, and so may be very poorly approximated by a linear subspace. However, the manifold does have its own coordinate system, and you can calculate distances between points on the manifold. The shortest path between two points is called a geodesic.

Another way to visualize this is a ribbon that is curled into a spiral. A point A on the ribbon might look close to a point B that is on an outer ring, but if you unwrapped the ribbon they would be far apart. So one thing you might think to do is somehow figure out what the real distance (on the surface of the ribbon) between A and B is. You might be able to do that by somehow hopping from data-point to data-point in short hops that would hopefully follow the contour of the ribbon.

That is exactly what the Isomap algorithm, described in this paper, does to perform its dimensionality reduction. Given a set of points {xk : k = 1, 2,…, K} in an n-dimensional space X, they first make a graph G with vertex set {xk} by putting an edge between two points if the distance between them in X is less than some threshold. The edge is weighted by that distance. Then they form a K x K matrix D whose (i,j)-th entry is the length of the minimum-weight (shortest) path in the graph G between xi and xj. Assuming the threshold is not too large and there are a lot of data points, these lengths should closely approximate the true (geodesic) distances on the manifold.

Armed with this matrix of distances between points, we can try to embed the manifold into a d-dimensional Euclidean space without distorting the distances too much. This is an easy problem to visualize — just imagine taking a globe, fixing a few cities, and trying to make a flat map in which the distances between those cities are preserved. There is an algorithm called multidimensional scaling (MDS) that can do this, and you can trade off the embedding distortion against the number of dimensions.
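
Putting the two steps together, a miniature Isomap might look like the sketch below. The epsilon-ball graph, Floyd-Warshall shortest paths, and the plain eigendecomposition for classical MDS are choices I made for brevity; the paper also allows a k-nearest-neighbor graph instead of an epsilon ball.

```python
# Isomap in miniature: neighborhood graph -> graph shortest paths -> classical MDS.
import numpy as np

def isomap(X, epsilon, d=2):
    K = X.shape[0]
    # 1. Neighborhood graph: edge (i, j) iff ||x_i - x_j|| < epsilon, weighted by distance.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D = np.where(dist < epsilon, dist, np.inf)
    np.fill_diagonal(D, 0.0)
    # 2. All-pairs shortest paths (Floyd-Warshall) approximate the geodesic distances.
    for k in range(K):
        D = np.minimum(D, D[:, k, None] + D[None, k, :])
    if np.isinf(D).any():
        raise ValueError("graph is disconnected; try a larger epsilon")
    # 3. Classical MDS: double-center the squared distances, take the top eigenvectors.
    J = np.eye(K) - np.ones((K, K)) / K
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:d]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

if __name__ == "__main__":
    # Toy "swiss roll": a 2-D sheet rolled up in 3-D, like the curled ribbon above.
    rng = np.random.default_rng(0)
    t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, 500)   # position along the roll
    h = rng.uniform(0.0, 10.0, 500)                  # position across the roll
    X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])
    Y = isomap(X, epsilon=2.5, d=2)
    print(Y.shape)   # (500, 2): coordinates in the unrolled sheet
```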

This paper comes with its own homepage, which has some data sets and MATLAB code. If only all practitioners were so generous — too often the algorithm implementation is kept under wraps, which makes me wonder if there are some dirty secrets hiding behind the pretty plots.

One thing that annoyed me about reading this paper is that all of the technical details (which I care about) are hidden in tiny-print footnotes. Furthermore, none of the citations include paper titles, so you can’t tell what the cited papers are actually about. I know that page space is precious, but it’s just plain stupid. Shame on you, Science. I expected better.

As an amusing postscript, the commentary on this paper and the locally linear embedding paper (Roweis and Saul), written by Seung and Lee, had pictures of Bush and Gore in the print edition, but due to copyright issues the online version had to be changed.