Johnson-Lindenstrauss, Restricted Isometry (RIP), for-each vs for-all, and all that

The Johnson-Lindenstrauss lemma shows how to project points into a much lower dimension without significant distortion of their pairwise Euclidean distances.

Say you have n arbitrary vectors p_1, p_2, \ldots, p_n in \mathbb{R}^n.
Use a fat random (normalized) Gaussian matrix of dimensions m \times n where

m= \frac{c}{\epsilon^2} \log n,

to project the points into m dimensions.

Then, with probability at least 1/2, the projected points A p_1, A p_2, \ldots, A p_n have pairwise distances that satisfy:

(1-\epsilon) ||p_i -p_j ||_2 \leq ||A p_i - A p_j ||_2 \leq (1+\epsilon) ||p_i -p_j||_2.

This is quite remarkable, since we reduced the dimension from n down to \log n.

This is a very fat matrix.
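As a quick sanity check, here is a minimal numpy sketch (the dimension n, the distortion \epsilon and the constant in front of \log n are made-up illustrative values) that draws a normalized Gaussian projection and measures the worst pairwise distortion:

import numpy as np

rng = np.random.default_rng(0)
n, eps = 200, 0.5                                   # illustrative values, not from the lemma
m = int(np.ceil(4 / eps**2 * np.log(n)))            # m = (c/eps^2) log n with a guessed constant c = 4

P = rng.standard_normal((n, n))                     # columns are the points p_1, ..., p_n in R^n
A = rng.standard_normal((m, n)) / np.sqrt(m)        # normalized fat Gaussian matrix
Q = A @ P                                           # projected points A p_1, ..., A p_n

# worst-case ratios ||A p_i - A p_j||_2 / ||p_i - p_j||_2 over all pairs
ratios = [np.linalg.norm(Q[:, i] - Q[:, j]) / np.linalg.norm(P[:, i] - P[:, j])
          for i in range(n) for j in range(i + 1, n)]
print(min(ratios), max(ratios))                     # typically lands inside [1 - eps, 1 + eps]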

A number of other matrices beyond Gaussian have been shown to also work with similar guarantees.

The Johnson-Lindenstrauss lemma sounds similar to the Restricted Isometry Property (RIP).
A fat m \times n matrix A has the RIP property if

(1-\epsilon) ||x||_2 \leq ||A x ||_2 \leq (1+\epsilon) ||x||_2,

for all vectors x that are k-sparse, i.e. ||x||_0 \leq k.

This means that the matrix A does not distort the \ell_2 norm of sparse vectors.
As before, well-known results have shown that random Gaussian matrices satisfy RIP with high probability if we set
m = \Theta( k \log(n/k)).
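Verifying RIP for a given matrix is hard in general, since it is a statement about all k-sparse vectors at once, so the following numpy sketch only spot-checks random k-sparse unit vectors; the dimensions and the constant are made-up illustrative values:

import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 5                                    # illustrative values
m = int(np.ceil(4 * k * np.log(n / k)))          # m = Theta(k log(n/k)) with a guessed constant
A = rng.standard_normal((m, n)) / np.sqrt(m)     # normalized fat Gaussian matrix

# empirical spot-check on random k-sparse unit vectors (not a certificate of RIP)
worst = 0.0
for _ in range(1000):
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    x /= np.linalg.norm(x)
    worst = max(worst, abs(np.linalg.norm(A @ x) - 1.0))
print("largest observed distortion:", worst)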

To understand the similarities and differences between RIP and JL, we need to see how these results are proven. The central question is how fat random matrices distort the length of vectors and with what probability.

 

All known proofs of JL (for each matrix family) start with a lemma that deals with one fixed vector:
Lemma 1:

For each given vector x, the matrix A does not distort the length too much, with good probability:

Pr[ (1-\epsilon) ||x||_2 \leq ||Ax||_2 \leq (1+\epsilon) ||x||_2] \geq 1- P_{fail}(m,n) \quad (1)

The key issue is the distinction between the for-all and the for-each quantifier over vectors.
Fixing the vector first and then drawing the matrix at random is much easier than promising that a random matrix, once realized, works w.h.p. for all vectors simultaneously.

Clearly, if we first realize the matrix, there are certainly vectors for which the guarantee of Lemma 1 cannot hold: since the matrix is fat, it has a nontrivial nullspace, so there are nonzero vectors x with ||Ax||_2 = 0. Lemma 1 simply says that this is unlikely and gives a 'for each vector, a random matrix is good w.h.p.' guarantee.
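To see this concretely, here is a tiny numpy sketch (the dimensions are made up) that first realizes a fat Gaussian matrix and then exhibits a nonzero vector that it maps to zero:

import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 100                                     # made-up dimensions, m < n so A is fat
A = rng.standard_normal((m, n)) / np.sqrt(m)

# once A is fixed, any unit vector in its nullspace is maximally distorted: ||Ax||_2 = 0
_, _, Vt = np.linalg.svd(A)                        # the last n - m right singular vectors span the nullspace
x = Vt[-1]
print(np.linalg.norm(x), np.linalg.norm(A @ x))    # prints roughly 1 and 0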

How small we can make P_{fail} depends crucially on m. Since we want to achieve dimensionality reduction, we want to make the matrix as fat as possible, i.e. m as small as possible; but then P_{fail} becomes larger.

To prove JL for some family of matrices we need to make P_{fail}\leq 1/n^2.

If we have Lemma 1 with this guarantee, we just apply it to each difference vector p_i - p_j and take a union bound over all {n \choose 2} = \Theta(n^2) pairs, obtaining the pairwise distance distortion guarantee for all of p_1, p_2, \ldots, p_n. Hence JL is Lemma 1 plus a union bound over all pairs.
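Spelled out, the union bound reads:

Pr[\text{some pair } (i,j) \text{ is distorted}] \leq {n \choose 2} \, P_{fail} \leq \frac{n^2}{2} \cdot \frac{1}{n^2} = \frac{1}{2},

so all pairwise distances are preserved with probability at least 1/2, which is exactly the statement above.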

It turns out that for i.i.d. Gaussian matrices, if you set m = \frac{c}{\epsilon^2} \log n for a large enough constant c, the failure probability satisfies P_{fail} \leq 1/n^2. This follows from the wonderful symmetries of the Gaussian: for a unit-norm x, ||Ax||_2 has the same distribution as ||A e_1||_2, i.e. the \ell_2 norm of the first column of A. Therefore ||Ax||^2_2 has (up to the 1/m normalization) a chi-square distribution with m degrees of freedom, and standard concentration results suffice to bound P_{fail}.
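To sketch the concentration step (the constants here are not optimized): for a unit-norm x and i.i.d. N(0, 1/m) entries, m ||Ax||^2_2 is chi-square with m degrees of freedom, and standard tail bounds give, for some absolute constant c_0,

Pr\big[ \, \big| \, ||Ax||^2_2 - 1 \, \big| > \epsilon \big] \leq 2 e^{-c_0 m \epsilon^2},

so m = \frac{c}{\epsilon^2} \log n with c large enough pushes this below 1/n^2. (A 1 \pm \epsilon distortion of the squared norm implies a 1 \pm \epsilon distortion of the norm, so this is enough for Lemma 1.)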

To prove RIP, essentially the same recipe works: first prove Lemma 1 for one vector and then take a union bound over all {n \choose k} possible supports of k-sparse vectors (the infinitely many vectors within each fixed support are handled by a standard \epsilon-net argument). The problem is that we now have a much larger union bound, and therefore we need P_{fail} \leq \frac{1}{{n \choose k}}. This forces a much larger m compared to JL.
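To see why this forces a larger m (again with unoptimized constants): using {n \choose k} \leq (en/k)^k, the requirement

e^{-c_0 m \epsilon^2} \leq \frac{1}{{n \choose k}}

translates to

m \geq \frac{c}{\epsilon^2} \, k \log\frac{n}{k},

matching the m = \Theta(k \log(n/k)) scaling quoted above.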

[Figure: looking at a bipartite expander]

The general recipe emerges. Say you want to prove a good property for all hairy vectors.

First show that for each fixed vector a random matrix has the good property with probability at least 1- P_{fail}.

Then ensure P_{fail} \leq \frac{1}{\text{number of hairy vectors}}. This will limit how much dimensionality reduction you can achieve.

Finally, a union bound establishes the property for all hairy vectors.
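In one line, the recipe is the union bound

Pr[\text{some hairy vector fails}] \leq (\text{number of hairy vectors}) \cdot P_{fail} < 1,

and the last inequality dictates how small P_{fail} must be, which in turn limits how small m can be.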
