# The things we know we don’t know

As a theoretical engineer, I find myself getting lulled into the trap of what I now starting to call “lazy generalization.” It’s a form of bland motivation that you often find at the beginning of papers:

Sensor networks are large distributed collections of low-power nodes with wireless radios and limited battery power.

Really? All sensor networks are like this? I think not. Lots of sensor networks are wired (think of the power grid) but still communicate wirelessly. Others communicate through wires. This is the kind of ontological statement that metastasizes into the research equivalent of a meme — 3 years after Smart Dust appears, suddenly all papers are about dust-like networks, ignoring the vast range of other interesting problems that arise in other kinds of “sensor networks.”

Another good example is “it is well known that most [REAL WORLD THING] follows a power law,” which bugs Cosma to no end. We then get lots of papers papers which start with something about power laws and then proceed to analyze some algorithms which work well on graphs which have power law degree distributions. And the later we get statements like “all natural graphs follow power laws, so here’s a theory for those graphs, which tells us all about nature.”

Yet another example of this is sparsity. Sparsity is interesting! It lets you do a lot of cool stuff, like compressed sensing. And it’s true that some real world signals are approximately sparse in some basis. However, turn the crank and we get papers which make crazy statements approximately equal to “all interesting signals are sparse.” This is trivially true if you take the signal itself as a basis element, but in the way it’s mean (e.g. “in some standard basis”), it is patently false.

So why is are these lazy generalization? It’s a kind of fallacy which goes something like:

1. Topic A is really useful.
2. By assuming some Structure B about Topic A, we can do lots of cool/fun math.
3. All useful problems have Structure B

Pattern matching, we get A = [sensor networks, the web, signal acquisition], and B = [low power/wireless, power laws, sparsity].

This post may sound like I’m griping about these topics being “hot” — I’m not. Of course, when a topic gets hot, you get a lot of (probably incremental) papers all over the place. That’s the nature of “progress.” What I’m talking about is the third point. When we go back to our alabaster spire of theory on top of the ivory tower, we should not fall into the same trap of saying that “by characterizing the limits of Structure B I have fundamentally characterized Topic A.” Maybe that’s good marketing, but it’s not very good science, I think. Like I said, it’s a trap that I’m sure I’m guilty of stepping into on occasion, but it seems to be creeping into a number of things I’ve been reading lately.

## 7 thoughts on “The things we know we don’t know”

1. Kris says:

I feel like this is the result of one of two things. Either a) the person has been doing the theoretical work for so long, they forgot the assumptions they made in the beginning and treat the work as self-evident, or b) the person realized when they went to write the paper that they need specific real world examples to apply their theory to, and in a fit of grandeur they expand their work to cover as much as possible.

• Yes, I agree. I think there’s a push from the other side too — theorists are always told that their theory must be rooted in the application, which leads to this incentive to make it seem like a theorem which is mathematically interesting and beautiful must also be “useful” in a more direct sense. So there’s a rush to put some lipstick of practicality on the pig of theory. But pigs can be pretty on their own!

2. Kamalika says:

Your post reminded me of the famous Rumsfeld quote:

“There are things we know that we know. There are known unknowns. That is to say there are things that we now know we don’t know. But there are also unknown unknowns. There are things we do not know we don’t know.”