Last week I was reading Active Learning via Perfect Selective Classification by El-Yaniv and Wiener, and came across a neat result due to Hug and Reitzner that they use in some of their bounds for active learning on Gaussian distributions.

The setup is the following : let $X_1, X_2, \ldots, X_n$ be $n$ jointly Gaussian vectors with distribution $\mathcal{N}(0,I_d)$ in $\mathbb{R}^d$. The convex hull $P_n$ of these points is called a Gaussian polytope. This is a random polytope of course, and we can ask various things about their shape : what is the distribution of the number of vertices, or the number of $k$-faces? Let $f_k(P_n)$ be the number of $k$-faces Distributions are hard, but for general $k$ the expected number of faces (as $n \to infty$) is given by

$\mathbb{E}[ f_k(P_n)] = \frac{2^d}{\sqrt{d}} \binom{d}{k+1} \beta_{k,d-1}(\pi \ln n)^{(d-1)/2} (1 + o(1))$,

where $\beta_{k,d-1}$ is the internal angle of a regular $(d-1)$-simplex at one of its $k$-dimensional faces. What Hug and Reitzner show is a bound on the variance (which then El-Yaniv and Plan use in a Chebyshev bound) : there exists a constant $c_d$ such that

$\mathrm{Var}( F_k(P_n) ) \le c_d (\ln n)^{(d-1)/2}$

So the variance of the number of $k$-faces can be upper bounded by something that does not depend at all on the actual value of $k$. In fact, they show that

$f_k(P_n) (\ln n)^{-(d-1)/2} \to \frac{2^d}{\sqrt{d}} \binom{d}{k+1} \beta_{k,d-1} \pi^{(d-1)/2}$

in probability as $n \to \infty$. That is, appropriately normalized, the number of faces converges to a constant.

To me this result was initially surprising, but after some more thought it makes a bit more sense. If you give me a cloud of Gaussian points, then I need $k+1$ points to define a $k$-face. The formula for the mean says that if I chose a random set of $k+1$ points, then approximately $\frac{2^d}{\sqrt{d}} \beta_{k,d-1}(\pi \ln n)^{(d-1)/2}$ fraction of them will form a real $k$-face of the polytope. This also explains why the simplex-related quantity appears — points that are on “opposite sides” of the sphere (the level sets of the density) are not going to form a face together. As $n \to \infty$ this fraction will change, but effectively the rate of growth in the number of faces with $n$ is $(\ln n)^{(d-1)/2}$, regardless of $k$.

I’m not sure where this result will be useful for me (yet!) but it seemed like something that the technically-minded readers of the blog would find interesting as well.

I’ve been trying to get a camera-ready article for the Signal Processing Magazine and the instructions from IEEE include the following snippet:

*VERY IMPORTANT: All source files ( .tex, .doc, .eps, .ps, .bib, .db, .tif, .jpeg, …) may be uploaded as a single .rar archived file. Please do not attempt to upload files with extensions .shs, .exe, .com, .vbs, .zip as they are restricted file types.

While I have encountered .rar files before, I was not very familiar with the file format or its history. I didn’t know it’s a proprietary format — that seems like a weird choice for IEEE to make (although no weirder than PDF perhaps).

What’s confusing to me is that ArXiV manages to handle .zip files just fine. Is .tgz so passé now? My experience with RAR is that it is good for compressing (and splitting) large files into easier-to-manage segments. All of that efficiency seems wasted for a single paper with associated figures and bibliography files and whatnot.

I was trying to find the actual compression algorithm, but like most modern compression software, the innards are a fair bit more complex than the base algorithmic ideas. The Wikipedia article suggests it does a blend of Lempel-Ziv (a variant of LZ77) and prediction by partial matching, but I imagine there’s a fair bit of tweaking. What I couldn’t figure out is if there is a new algorithmic idea in there (like in the Burrows-Wheeler Transform (BWT)), or it’s more a blend of these previous techniques.

Anyway, this silliness means I have to find some extra software to help me compress. SimplyRAR for MacOS seems to work pretty well.

Learning from transcriptomes can be cheaper for organisms which have never been sequenced.

A fancy Nature article on mobility privacy, in case you weren’t convinced by other studies on mobility privacy.

Bad statistics in neuroscience. Color me unsurprised.

I bet faked results happen a lot in pharmaceutical trials, given the money involved. Perhaps we should jail people for faking data as a disincentive?

The Atheist shoe company did a study to see if the USPS was discriminating against them.

I’ve been on some flights lately and skived off of work to read a bit more.

The White Tiger (Aravind Adiga) — a farce told from the perspective of a murderer-turned entrepreneur in Bangalore writing letters to Wen Jiabao. I think there are definitely some interesting issues here especially with Adiga trying to write the voice of the subaltern. The point of the book seems to be to skewer the rich in India (and by implication the middle class which seeks to emulate the rich) but I’m not sure if the hits land where they are targeted. Definitely worth reading and discussing if you care about India. People who have never been there may find it less… familiar, and so their reading experience would be quite different.

Interworld (Neil Gaiman and Michael Reaves) — a Young Adult science fiction/fantasy novel. A bit of a thin premise, world-building-wise, but a breezy read. Can’t really recommend it but it was ok.

Rule 34 (Charles Stross) — a follow-up to Halting State. Set in future-Scotland and has all of the techno-econo-conspiracy together with some interesting takes on the effect of how ubiquitous internet and custom-3D printing and fabbing can affect life.

A Man of Misconceptions (John Glassie) — a fascinating biography of Athanasius Kircher, whose fascinatingly incorrect “scholarship” makes for some enjoyable reading. Glassie’s book is a really engaging read and brings a lot of the context of Kircher’s world to life. Highly recommended.

I’m happy to announce that in January 2014 I will be joining the Electrical and Computer Engineering Department at Rutgers, The State University of New Jersey, as an Assistant Professor.

Endless Things [John Crowley] — Book four of the Aegypt Cycle, and the one most grounded in the present. The book moves more swiftly than the others, as if Crowley was racing to the end. Many of the concerns of the previous books, such as magic, history, and memory, are muted as the protagonist Pierce Moffett wends his way through his emotional an intellectual turmoil and into what in the end amounts to a kind of peace. Obviously only worth reading if you read the first three books.

Understanding Privacy [Daniel Solove] — A law professor’s take on what constitutes privacy. Solove wants to conceptualize privacy in terms of clusters of related ideas rather than take a single definition, and he tries to put a headier philosophical spin on it by invoking Wittgenstein. I found the book a bit overwritten but it does parse out the things we call privacy, especially in the longest chapter on the taxonomy of privacy. It’s not a very long book, but it has a number of good examples and also case law to show how muddled our legal definitions have become. He also makes a strong case for increased protections and shows how the law is blind to the effects of information aggregation, for example.

The Fall of the Stone City [Ismail Kadare] — An allegorical novel by a Man Booker prize winner chronicling the Nazi occupation and the communist takeover of Gjirokaster, an old Albanian city. It’s a dark absurdist comedy, partly in the vein of Kafka but with a bit of… Calvino almost. The tone of the book (probably a testament to the translator) has this almost academic detachment, gently mocking as it describes the ways in which the victors try to rewrite history.

Invisible Men [Becky Pettit] — A sobering look at how mass incarceration interacts with official statistics. Because most surveys are household-based, they do not count the increasingly larger incarcerated population, thereby introducing a systematic racialized bias in the statistics used for public policy. In particular, Pettit shows how this bias leads to underestimation of racial inequity because the (mainly young black male) prisoners are “erased” in the official records.

The Rise of Ransom City [Felix Gilman] — A sequel to The Half-Made World, and a wondrously engrossing read it is too, filled with the clash of ideas, the corruption of corporations, the “borrowing” and evolution of ideas, and the ravages of industrialization. Also has a healthy dose of Mark Twain for good measure.

On Lalitha’s recommendation I read Frank Nielsen’s paper “Cramer-Rao Lower Bound and Information Geometry,” which is a survey how C.R. Rao’s work has impacted information geometry. I remember spending some time in grad school trying to learn information geometry (mostly for fun), but since it ended up not being particularly useful in my research, I’m afraid a lot of it has leaked out of my ears. This paper has a short introduction to the Cramer-Rao lower bound and an introduction to information geometry which might be a nice read for some of the readers of this blog. It’s certainly faster than trying to read Amari’s monograph! In particular, it goes over the “highlights” of geodesics and other geometric features on the manifold of probability distributions.

The paper mentions the sub-family of f-divergences known as $\alpha$-divergences, which are given by

$D_{\alpha}(p \| q) = \frac{4}{1 - \alpha^2} \left( 1 - \int p(x)^{(1 - \alpha)/2)} q(x)^{(1 + \alpha)/2} dx \right)$

The KL divergence is $D_{-1}(p \| q)$ — you have to take the limit as $\alpha \to -1$. Within this family of divergences we have the relation $D_{\alpha}(p \| q) = D_{-\alpha}(q \| p)$. Consider a pair of random variables $(X,Y)$ with joint distribution $P_{XY}$ and marginal distributions $P_X$ and $P_Y$. If we take $q = P_X P_Y$ and $p = P_{XY}$ then the mutual information is $D_{-1}( p \| q )$. But we can also take

$D_{-1}( P_{X} P_{Y} \| P_{XY}) = D_1( P_{XY} \| P_{X} P_{Y} )$

Thus it turns out that the “lautum information” defined by Palomar and Verdú is a special case of this: it’s the 1-divergence between the the joint distribution and the product of the marginals. While their paper mentions the lautum information is an f-divergence, it doesn’t discuss this connection to this family of divergences. Nielsen’s paper calls this the “reverse Kullback-Leibler divergence,” but some googling doesn’t seem to indicate that this is a common term, or indeed if it has some use in information geometry. Palomar and Verdú give several operational interpretations of the lautum information.