A Global Geometric Framework for Nonlinear Dimensionality Reduction

J. B. Tenenbaum, V. de Silva, J. C. Langford

Science, v290, p2319 – 2323, 22 December 2000.

I have to present this paper for a computational neuroscience reading group that I can (finally) attend. The basic problem is something called *manifold learning*. Imagine you have a very large data set in a huge number of dimensions — for example, 64 x 64 pixel images of faces, which live in a 4096-dimensional space. Furthermore, suppose that all of the pictures are of the same person, with only two parameters changing — the angle of rotation of the face, and the illumination. The data has only two degrees of freedom, so you would think it would live on a 2-dimensional subspace.

Unfortunately, the data you have doesn’t occupy a *linear subspace* in the observed variables. Instead, they live on a manifold, which is like a surface in your high dimensional space that may be strangely curved or twisted, and so may be very poorly approximated by a linear subspace. However, the manifold does have its own coordinate system, and you can

calculate distances between points on the manifold. The shortest path between two points is called a *geodesic*.

Another way to visualize this is a ribbon that is curled into a spiral. A point *A* on the ribbon might look close to a point *B* that is on an outer ring, but if you unwrapped the ribbon they would be far apart. So one thing you might think to do is somehow figure out what the real distance (on the surface of the ribbon) between *A* and *B* is. You might be able to do that by somehow hopping from data-point to data-point in short hops that would hopefully follow the contour of the ribbon.

That is exactly what the Isomap algorithm, described in this paper, does to perform its dimensionality reduction. Given a set of points *{x*_{k} : k = 1, 2,…, K} in *n*-dimensional space *X*, they first make a graph *G* with vertex set *{x*_{k}} by putting an edge between two points if the distance between them in *X* is less than some threshold. The edge has a weight given by the distance. Then they find a *K x K* matrix *D* whose *(i,j)*-th entry is the minimum-weight (distance) path in the graph *G* between *x*_{i} and *x*_{j}. Assuming the threshold is not too large and there are a lot of data points, these lengths should closely approximate the true (geodesic) distance on the manifold.

Armed with this matrix of distances between points, we can try to embed the manifold into a *d*-dimensional Euclinean space without distorting the distances too much. This is an easy problem to visualize — just imagine taking a globe, fixing a few cities, and trying to make a flat map in which the distances between those cities are preserved. There is an algorithm called multidimensional scaling (MDS) that can do this. You can trade off the embedding distortion with the number of dimensions.

This paper comes with its own homepage, which has some data sets and MATLAB code. If only all practitioners were so generous — too often the algorithm implementation is kept under wraps, which makes me wonder if there are some dirty secrets hiding behind the pretty plots.

One thing about reading this paper that annoyed me is that all of the technical details (which I care about) are hidden in tiny-print footnotes. Furthermore, all the citations **do not include the paper titles**, so you can’t tell cited papers are actually about. I know that page space is precious, but it’s just plain stupid. Shame on you, *Science*. I expected better.

As as amusing postscript, the commentary on this paper and the locally linear embedding paper (Roweis and Saul) written by Seung and Lee has pictures of Bush and Gore in the print edition but due to copyright issues the online version had to be changed.