A few months ago I was home visiting my parents and we had a lunch with a few other Maharashtrians. The conversation turned towards food, and in particular ingredients that are important for making authentic garam masala. Garam masalas vary widely by region in India, and the two ingredients in question were dagadful and nag kesar. I had never really heard of these spices so I did a bit of research to learn more.

Dagadful (Parmelia perlata) is a lichen, not to be confused with the stone flower Didymocarpus pedicellatus, which is a plant that grows on rocks and is called charela in Hindi, I believe. The confusing thing is that both plants are used for herbal remedies, but the former is used for culinary purposes.

If you search for “nag kesar” you may find Mesua ferrea, a hardwood tree that grows in India and surrounds. That’s not where the spice comes from, however. This sparked the most debate at lunch, but I think I’ve figured out that the spice is the bud of a different tree, Mammea longifolia. Both Mesua and Mammea are in the family Calophyllaceae, which probably led to the name clash.

# Expected number of faces in Gaussian polytopes

Last week I was reading Active Learning via Perfect Selective Classification by El-Yaniv and Wiener, and came across a neat result due to Hug and Reitzner that they use in some of their bounds for active learning on Gaussian distributions.

The setup is the following : let $X_1, X_2, \ldots, X_n$ be $n$ jointly Gaussian vectors with distribution $\mathcal{N}(0,I_d)$ in $\mathbb{R}^d$. The convex hull $P_n$ of these points is called a Gaussian polytope. This is a random polytope of course, and we can ask various things about their shape : what is the distribution of the number of vertices, or the number of $k$-faces? Let $f_k(P_n)$ be the number of $k$-faces Distributions are hard, but for general $k$ the expected number of faces (as $n \to infty$) is given by

$\mathbb{E}[ f_k(P_n)] = \frac{2^d}{\sqrt{d}} \binom{d}{k+1} \beta_{k,d-1}(\pi \ln n)^{(d-1)/2} (1 + o(1))$,

where $\beta_{k,d-1}$ is the internal angle of a regular $(d-1)$-simplex at one of its $k$-dimensional faces. What Hug and Reitzner show is a bound on the variance (which then El-Yaniv and Plan use in a Chebyshev bound) : there exists a constant $c_d$ such that

$\mathrm{Var}( F_k(P_n) ) \le c_d (\ln n)^{(d-1)/2}$

So the variance of the number of $k$-faces can be upper bounded by something that does not depend at all on the actual value of $k$. In fact, they show that

$f_k(P_n) (\ln n)^{-(d-1)/2} \to \frac{2^d}{\sqrt{d}} \binom{d}{k+1} \beta_{k,d-1} \pi^{(d-1)/2}$

in probability as $n \to \infty$. That is, appropriately normalized, the number of faces converges to a constant.

To me this result was initially surprising, but after some more thought it makes a bit more sense. If you give me a cloud of Gaussian points, then I need $k+1$ points to define a $k$-face. The formula for the mean says that if I chose a random set of $k+1$ points, then approximately $\frac{2^d}{\sqrt{d}} \beta_{k,d-1}(\pi \ln n)^{(d-1)/2}$ fraction of them will form a real $k$-face of the polytope. This also explains why the simplex-related quantity appears — points that are on “opposite sides” of the sphere (the level sets of the density) are not going to form a face together. As $n \to \infty$ this fraction will change, but effectively the rate of growth in the number of faces with $n$ is $(\ln n)^{(d-1)/2}$, regardless of $k$.

I’m not sure where this result will be useful for me (yet!) but it seemed like something that the technically-minded readers of the blog would find interesting as well.

/