Predicting insurgent attacks with power laws, oh my

A while back I heard this NPR story about a method for predicting attacks that was published in Science. The news story left me a bit perplexed — where was the prediction? It turns out to be yet another version of attack of the power laws. After a bit of curve fitting, the meat of the thing seems to be this:

Our broad-brush theory does not require knowledge of specific adaptation or counteradaptation mechanisms, and hence bypasses issues such as changes in insurgent membership, technology, learning, or skill set, as well as a need to know the hearts and minds of local residents. We regard the escalation of hostilities as representing adaptation and counteradaptation in a way that is analogous to the Red Queen hypothesis of evolutionary biology. (Johnson et al. 2011)

Hearts and minds : pish-posh, what nonsense. Bring on the Red Queen!

This continual arms race of adaptation and counter-adaptation suggests similarities with Darwinian selection in nature. As noted by Rafe Sagarin, “a fundamental tenet of evolutionary biology is that organisms must constantly adapt just to stay in the same strategic position relative to their enemies—who are constantly changing as well. For example, to protect its DNA against viruses, a host organism must continually change the access code to its genetic material” (Sagarin, 2003, p. 69). Meeting the Red Queen in Alice in Wonderland, Alice finds that however fast she runs, she always stays in the same place. The “Red Queen” concept has become widely used in evolutionary biology to describe how competing individuals can become locked in an “arms race” of strategies, machinery, or weapons. However impressive one side becomes, it may never come any closer to defeating its opponent. (Johnson 2009)

Unfortunately, a perfect Red-Queen-like equilibrium is not possible. Since they’ve already thrown out modeling the specific adaptation mechanisms, they’re left with fitting the parameters of a stochastic model:

However, instantaneous and perfect counteradaptation is unrealistic; indeed, complex adaptation-counteradaptation dynamics generated by sporadic changes in circumstances imply that R’s temporal evolution is likely to be so complex that it can be modeled as a stochastic process. We do not need to know exactly why R changes at any specific moment, nor do the changes in R have to have the same value, because each change is the net result of a mix of factors [such as learning by experience or changes in personnel and technology] for each opponent. (Johnson et al. 2011)

They then go on to say that once you fit the parameters, you could estimate the momentum from a few fatal days and use it to predict the timing of the next fatal day. But they don’t actually do this (say, by holding out some data and doing a little validation), beyond a single example of a prediction being more accurate than one might expect.
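
To make the complaint concrete, here is a minimal sketch (in Python, on synthetic data) of the kind of holdout validation I mean: fit a power-law trend to the intervals between the first several fatal days and see how well it predicts the held-out one. The trend tau_n ≈ tau_1 * n^(-b) is just my stand-in for the sort of curve they fit, not their actual estimator.

```python
# A sketch of holdout validation for a power-law "escalation" trend.
# Synthetic data only; the trend tau_n ~ tau_1 * n**(-b) is a stand-in
# for the kind of curve fit in the paper, not their actual estimator.
import numpy as np

rng = np.random.default_rng(0)

# Fake intervals (in days) between successive fatal days, with noise.
true_tau1, true_b = 100.0, 0.5
n = np.arange(1, 31)
intervals = true_tau1 * n ** (-true_b) * rng.lognormal(0.0, 0.3, size=n.size)

# Hold out the last interval; fit log(tau_n) = log(tau_1) - b * log(n) on the rest.
slope, intercept = np.polyfit(np.log(n[:-1]), np.log(intervals[:-1]), 1)
tau1_hat, b_hat = np.exp(intercept), -slope

# Predict the held-out interval and compare.
predicted = tau1_hat * n[-1] ** (-b_hat)
print(f"fitted b = {b_hat:.2f}, predicted next interval = {predicted:.1f} days, "
      f"actual = {intervals[-1]:.1f} days")
```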

So in the end, what do we have? This is a paper about modeling and not about prediction, but prediction sounds a lot sexier, and so NPR ran with that aspect of it. In order to build a predictor, you would need to build a good model, which presumably means sitting around and waiting for fatal attacks to happen, because you can’t make an omelette (predictive model) without breaking some eggs (killing some people). Finally, “ZOMG P0W3R L4WZ” is so the aughts.

HealthSec 2011

I also attended HealthSec ’11 this week, and the program was a little different from what I had expected. There was a mix of technical talks and policy/framework proposals around a few themes:

  • security in medical devices
  • auditing in electronic medical records
  • medical record dissemination and privacy

In particular, a key challenge coming up in healthcare is how patient information is going to be handled in the health insurance exchanges (HIEs) that will be created as part of the Affordable Care Act. The real question is the threat model for health information : hackers who want to steal health records wholesale, or third parties (e.g. insurance companies) selling the data. Matthew Green from Dartmouth discussed implications of the PCAST report on Health Information Technology, which I will have to read.

The most interesting part of the workshop was the panel on de-identification and whether it is a relevant or useful framework moving forward. The panelists were Sean Nolan from Microsoft, Kelly Edwards from the University of Washington, Arvind Narayanan from Stanford, and Lee Tien from the EFF. Sean Nolan talked a bit about how HIPAA acts as an impediment to exploratory research, something I have worked on a little, but he also raised the thorny ethical issue of public good versus privacy, which is key to understanding the debate over health records in clinical research. Edwards is a bioethicist and made some very important points about how informed consent is an opportunity to educate patients about their (potential) role in medical research, and also to make them feel like informed participants in the process. The way we frame the tradeoff Nolan mentioned is really an ethical question about how we communicate it to patients. Narayanan (famous for his Netflix de-anonymization) argued that the relationship between technology and policy has to be rethought and turned into more of a dialogue, rather than a blame-shifting or challenge-posing exercise. Lee Tien made a crucial point: if we do not understand how patient data moves about in our existing system, then we have no hope of reform or regulation, and no stakeholder in the system now has that “bird’s eye view” of these data flows.

I hope that in the future I can contribute to this in some way, but in the meantime I’ve been left with a fair bit to chew on. Although the conference was perhaps a bit less technical than I would have liked, I think it was quite valuable as a starting point for future work.

EVT / WOTE 2011

This week I attended EVT/WOTE ’11, a workshop on voting, technology, and trustworthiness co-located with the USENIX Security conference. I phased in and out of the workshop, which had a number of different session themes:

  • “E2E”, or end-to-end voting
  • empirical studies of real elections: direct-recording electronic voting machines (DREs), forensics for them, and their impact on the reliability of election outcomes
  • studies of accessibility issues, either to polling places or for different voting technologies
  • new proposals for voting systems
  • auditing and metrics for existing election systems

I pretty much work on the last one, so while some of the panels were quite interesting, some technical talks were a little beyond me. Dana DeBeauvoir, the Travis County Clerk (where Austin is), gave a keynote about how she thinks technologists and elections officials can work together, and rather bravely put forward a proposal for an electronic voting system to be used at the county level. There were lots of comments about that, of course.

Theron Ji gave a nice talk about how write-in marks are (or are not) properly counted by optical scan machines. People often forget to fill in the bubble saying they are doing a write-in, which can have pretty disastrous effects, as San Diego politician Donna Frye found out. Gillian Piner reported on a survey she did of vision-impaired voters, asking them what they want for accessibility technologies. Imagine that, asking them what they want!

The two talks of most interest to me were by David Cary and Tom Magrino, both on the margin of victory for IRV elections. Cary presented a method for estimating the margin based only on the tabulation of first-choices in each round, whereas Magrino presented an exact calculation that involved solving many integer linear programs. The scaling is not so great with the number of candidates (exponential), but for the kind of IRV elections we see in the US it was definitely doable. Margin calculations are important for developing auditing algorithms (on which I will write more once my paper is done). Philip Stark gave a plenary lecture on auditing which I missed part of due to a conflict with a parallel workshop.
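
For the curious, here is a toy sketch of the object being studied: an IRV tabulation together with the naive last-round margin, which is only a crude proxy for the true margin of victory. This is emphatically not Cary’s estimator or the exact ILP computation from Magrino’s talk; the ballot profile is made up and ties are broken arbitrarily.

```python
# Toy IRV tabulation plus the naive last-round margin.
# Just a sketch of the quantity being studied -- not the estimation method
# or the exact ILP computation from the talks. Ties are broken arbitrarily.
from collections import Counter

def irv(ballots):
    """Run instant-runoff voting; each ballot is a tuple of candidates in preference order."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot for its highest-ranked remaining candidate.
        tallies = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for choice in ballot:
                if choice in remaining:
                    tallies[choice] += 1
                    break
        total = sum(tallies.values())
        leader, top = tallies.most_common(1)[0]
        if top > total / 2 or len(remaining) == 2:
            # Last-round margin: leader minus runner-up in the final round's tallies.
            runner_up = tallies.most_common()[1][1] if len(tallies) > 1 else 0
            return leader, top - runner_up
        # Eliminate the candidate with the fewest current votes.
        remaining.discard(min(tallies, key=tallies.get))

# A made-up election: ballots listed with multiplicities.
ballots = [("A", "B")] * 40 + [("B", "A")] * 35 + [("C", "B")] * 25
winner, last_round_margin = irv(ballots)
print(winner, last_round_margin)  # B wins after C is eliminated; last-round margin is 20
```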

There were also some interesting panels. The most contentious one was on internet voting; I missed much of it, but the discussion ran over by an hour, so I think I got the gist. Some people are afraid of voting over the internet, but the crypto people think it can be made safe. The panel on the Sarasota House race in 2006 tried to home in on the reason for the problems with undervotes in that contest. A lot can be explained by the design of the ballot, proving once again that user interface and graphic design really matter!

The rump session was, as always, a mixture of amusing and technical and dry. The real highlight was probably David Bismark, who seems to have antagonized someone who has a new voting system involving moon projections. Wow.

California Elections Code on auditing

From Section 15360:

(a) During the official canvass of every election in which a voting system is used, the official conducting the election shall conduct a public manual tally of the ballots tabulated by those devices, including vote by mail voters’ ballots, cast in 1 percent of the precincts chosen at random by the elections official. If 1 percent of the precincts is less than one whole precinct, the tally shall be conducted in one precinct chosen at random by the elections official.

In addition to the 1 percent manual tally, the elections official shall, for each race not included in the initial group of precincts, count one additional precinct. The manual tally shall apply only to the race not previously counted.

Additional precincts for the manual tally may be selected at the discretion of the elections official.

Clearly this was not written by a statistician. “Chosen at random” is hardly precise (uniformly? with probability proportional to size?), and counting 1 percent of precincts doesn’t tell you much about how many ballots you are actually going to count.
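
To see just how little this pins down, here is a small simulation with made-up precinct sizes (the numbers are hypothetical): drawing 1 percent of precincts uniformly at random gives audit sizes, in ballots, that vary quite a bit from draw to draw.

```python
# How many ballots does "1 percent of precincts chosen at random" actually touch?
# Precinct counts and sizes below are hypothetical, just to illustrate the spread.
import numpy as np

rng = np.random.default_rng(0)

num_precincts = 500
# Made-up precinct sizes: lognormal, roughly a few hundred to a couple thousand ballots each.
precinct_sizes = np.round(rng.lognormal(mean=6.5, sigma=0.6, size=num_precincts))

sample_size = max(1, int(0.01 * num_precincts))  # "1 percent," but at least one precinct
draws = [precinct_sizes[rng.choice(num_precincts, sample_size, replace=False)].sum()
         for _ in range(10000)]

print(f"auditing {sample_size} precincts touches between {min(draws):.0f} and "
      f"{max(draws):.0f} ballots (mean {np.mean(draws):.0f}) across simulated draws")
```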

Banach spaces, super-reflexivity, and martingale type

I started reading Convex Games in Banach Spaces, by Sridharan and Tewari and then got interested in the work of G. Pisier on martingales in Banach spaces, which studies how probability and geometry interact. Some of the definitions differ a bit between these papers (the former does things in terms of martingale difference sequences).

A Banach space {\mathcal{B}} has Martingale-type {p} (or M-type {p}) if there is a constant {C} such that for any {\mathcal{B}}-valued martingale {Y_1, Y_2, \ldots} (with {Y_0 = 0}), we have

\displaystyle  	\sup_{t} \mathop{\mathbb E}\left[ \left\| Y_t \right\| \right] \le C \left( \sum_{t=1}^{\infty} \mathop{\mathbb E}[ \left\| Y_t - Y_{t-1} \right\| ^p ] \right)^{1/p}.

A Banach space {\mathcal{B}} has Martingale-cotype {q} (or M-cotype {q}) if there is a constant {C} such that for any {\mathcal{B}}-valued martingale {Y_1, Y_2, \ldots} (with {Y_0 = 0}), we have

\displaystyle  	 \left( \sum_{t=1}^{\infty} \mathop{\mathbb E}[ \left\| Y_t - Y_{t-1} \right\|^q ] \right)^{1/q} 	 \le 	 C \sup_{t} \mathop{\mathbb E}\left[ \left\| Y_t \right\| \right].
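
As a quick sanity check (assuming, as is conventional, that the martingale starts at {Y_0 = 0}): in a Hilbert space the martingale differences are orthogonal in {L_2}, so

\displaystyle  	\mathop{\mathbb E}\left[ \left\| Y_T \right\|^2 \right] = \sum_{t=1}^{T} \mathop{\mathbb E}[ \left\| Y_t - Y_{t-1} \right\|^2 ],

and Jensen’s inequality then gives {\mathop{\mathbb E}[ \| Y_T \| ] \le \left( \sum_{t} \mathop{\mathbb E}[ \| Y_t - Y_{t-1} \|^2 ] \right)^{1/2}}, so a Hilbert space has M-type {2} with {C = 1}.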

So far these are statements about how martingales behave in these spaces, and you can see a little geometry in the {(\sum (\cdot)^p )^{1/p}} kind of stuff, but how does this relate to overall geometric properties of the space?

A Banach space {\mathcal{B}} is reflexive if the canonical embedding of {\mathcal{B}} into its double dual (the dual of its dual space) is surjective, so that {\mathcal{B}} is isomorphic to its double dual. The space {\mathcal{B}} is reflexive if and only if its dual {\mathcal{B}^{\ast}} is reflexive. A space {\mathcal{C}} is finitely representable in {\mathcal{B}} if for any finite-dimensional subspace {F \subseteq \mathcal{C}} and any {\lambda > 1} there exists an isomorphism {\phi : F \rightarrow \phi(F) \subseteq \mathcal{B}} such that for all {x \in F},

\displaystyle  	\lambda^{-1} \|x\| \le \|\phi(x)\| \le \lambda \|x\|.

A Banach space {\mathcal{B}} is super-reflexive if no non-reflexive Banach space is finitely representable in {\mathcal{B}}. A result of James shows that a space is super-reflexive if and only if its dual is super-reflexive. Reflexivity means a space is “nice” and super-reflexivity is a kind of “uniform niceness.”

The norm of a Banach space is uniformly convex if for every {\epsilon > 0} there exists a {\delta > 0} such that for any two unit vectors {x} and {y},

\displaystyle  	\|x + y\| > 2 - \delta

implies

\displaystyle  	\|x - y\| < \epsilon.

For example, for {1 < p < \infty} the spaces {L_p} and {\ell_p} are uniformly convex. Another way to picture it is that if the midpoint of the chord between {x} and {y} is close to the surface of the unit ball, then the chord has to be short. Per Enflo showed that if a Banach space {\mathcal{B}} is super-reflexive, it has an equivalent norm which is uniformly convex.

It turns out that these martingale notions are related to the degree of convexity of the space, namely how uniformly convex the norm is (which in turn is connected to super-reflexivity). A space is {q}-uniformly convex if there is a constant {C} such that for all {x} and {y},

\displaystyle  	\left\| \frac{x + y}{2} \right\|^q \le \frac{1}{2} \left( \|x\|^q + \|y\|^q \right) 		- C^{-q} \left\| \frac{x - y}{2} \right\|^q.
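
For a concrete example, any Hilbert space is {2}-uniformly convex with {C = 1}: the parallelogram law rearranges to

\displaystyle  	\left\| \frac{x + y}{2} \right\|^2 = \frac{1}{2} \left( \|x\|^2 + \|y\|^2 \right) - \left\| \frac{x - y}{2} \right\|^2,

so the defining inequality holds with equality.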

Gilles Pisier made the connection to M-types : a Banach space is super-reflexive if and only if it has M-type {p} for some {p > 1} (equivalently, M-cotype {q} for some {q < \infty}). Furthermore, a Banach space {\mathcal{B}} has M-cotype {q} if and only if it has an equivalent {q}-uniformly convex norm.

So the big picture is this: super-reflexivity is enough to guarantee an equivalent uniformly convex norm, and the M-cotype characterizes just how convex that norm can be made.

References:

Sridharan, Karthik and Tewari, Ambuj, Convex Games in Banach Spaces, Proceedings of COLT, 2010.

Enflo, Per, Banach spaces which can be given an equivalent uniformly convex norm, Israel Journal of Mathematics 13 (1972), 281–288.

James, Robert C., Super-reflexive Banach spaces, Canadian Journal of Mathematics 24 (1972), 896–904.

Pisier, Gilles, Probabilistic Methods in the Geometry of Banach Spaces, in G. Letta and M. Pratelli, eds., Probability and Analysis, v. 1206 of Lecture Notes in Mathematics, pp. 167–241, Springer, 1986.

Pisier, Gilles, Martingales with values in uniformly convex spaces, Israel Journal of Mathematics 20 (1975), no. 3–4, 326–350.

Pisier, Gilles, Martingales in Banach Spaces (in connection with Type and Cotype), IHP Winter Course online lecture notes, 2011.

Lovaglia, A. R., Locally uniformly convex Banach spaces, Transactions of the American Mathematical Society 78 (1955), 225–238.

Readings

Shing-Tung Yau and Steve Nadis, The Shape of Inner Space — This book was about the Calabi conjecture, Calabi-Yau manifolds, string theory, and all that jazz. It’s supposed to be for a general/lay audience, but I found it rather daunting and often confusing. Perhaps I know just enough math to get confused, whereas other readers might gloss over things. I definitely would not recommend it to those without some serious mathematical background (like a few college classes). That being said, I found it pretty interesting, and now I know (kind of) what a Calabi-Yau space is.

Donald G. Saari, Decisions and Elections : Explaining the Unexpected — This sums up a large chunk of the analysis of social choice problems and voting systems done by Donald Saari. It’s a bit overwritten for my taste and veers between some mathematical formalism and a chatty form of argumentation. I don’t think I fit in the right “audience” for this book, which is part of the problem. It discusses Arrow’s Theorem and Sen’s Theorem via a bunch of examples and spends a fair bit of time on the “paradoxes” and perversities of different choice systems. The chattiness makes it feel less than systematic. Towards the end Saari puts on more of an advocate hat and argues that symmetry (in a particular sense) is a desirable property of election systems and puts out a case for the Borda count. That in a sense is the least convincing part of the book. This might be a good present for a precocious high school student, since the math is not so complicated but there are a lot of ideas to chew on in there.

Hannu Nurmi, Voting Procedures under Uncertainty — This also fits into the “slightly math-y books for political scientists” genre, so I found it insufficiently math-y. It’s a survey of different models of uncertainty in voting procedures and a survey of the work in that area. As such, it discusses alternatives to “traditional” social choice theory including Euclidean models of preference and so on. There’s a little more “survey” and less “integrating the different perspectives” but that’s ok. I am not sure who should read it, but it did point out some new literatures of which I had previously been unaware.

Moez Draief and Laurent Massoulié, Epidemics and Rumors in Complex Networks — A nice and compact introduction to rumor-spreading processes, including branching processes, small world graphs, SIS/SIR type models, and concluding with some models for “viral marketing.” I really liked this book because it was concise and to the point, but others may find that it lacks some context and connections to literature with which they are familiar. It doesn’t feel like a tutorial in that respect, but it’s self-contained and great for someone who has seen some of the material before but not all of it.

John Mortimer, Rumpole and the Primrose Path — Reading Rumpole short stories is kind of like relaxing in a pair of old slippers. Enjoyable, but probably not his best work.

Linkage

Via Jay P., a pretty amazing dance video.

Via 530nm330Hz, a very interesting tidbit on the history of the one-time pad. A free tech report version is available too. The one-time pad XORs the bits of a message with an i.i.d. random bitstring of the same length, and is usually credited to Gilbert Vernam and Joseph Mauborgne. However, as Steven Bellovin’s paper shows,

In 1882, a California banker named Frank Miller published Telegraphic Code to Insure Privacy and Secrecy in the Transmission of Telegrams. In it, he describes the first one-time pad system, as a superencipherment mechanism for his telegraph code. If used properly, it would have had the same property of absolute security.

Although in theory Miller can claim priority, reality is more complex. As will be explained below, it is quite unlikely that either he or anyone else ever used his system for real messages; in fact, it is unclear if anyone other than he and his friends and family ever knew of its existence. That said, there are some possible links to Mauborgne. It thus remains unclear who should be credited with effectively inventing the one-time pad.

Another fun tidbit : apparently mother’s maiden name was used for security purposes way back in 1882!
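
Since we’re on the subject, here is a minimal sketch of the XOR construction itself, with nothing specific to Miller, Vernam, or Mauborgne; needless to say, reusing (or poorly generating) the pad destroys the security.

```python
# A minimal one-time pad sketch: XOR the message with a random pad of equal length.
# Perfect secrecy needs a truly random pad, as long as the message, never reused.
import secrets

def otp_encrypt(message: bytes) -> tuple[bytes, bytes]:
    pad = secrets.token_bytes(len(message))                # the one-time pad
    ciphertext = bytes(m ^ k for m, k in zip(message, pad))
    return ciphertext, pad

def otp_decrypt(ciphertext: bytes, pad: bytes) -> bytes:
    return bytes(c ^ k for c, k in zip(ciphertext, pad))   # XOR again to recover

ciphertext, pad = otp_encrypt(b"attack at dawn")
assert otp_decrypt(ciphertext, pad) == b"attack at dawn"
```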

I really like shiso leaves and their cousins. I had a shiso plant but it did not survive the California sun / my black thumb. One of my favorite meals at ISIT 2009 was with Bobak Nazer, where we found an out-of-the-way BBQ joint where they brought us a long box filled with 7 varieties of leaves, including perilla leaves. It makes me hungry just writing about it.

Kudos to Adrienne for the amazing photo.

There’s Only One Sun, a short sci-fi film by Wong Kar-Wai.

Quote of the day : squabbles

I am writing a paper at the moment on some of my work with Steve Checkoway and Hovav Shacham on voting, which has involved a pretty broad literature search in social choice theory. I came across this quote about approval voting (AV) as an alternative to plurality voting (PV) in the paper Going from theory to practice: the mixed success of approval voting by Steven J. Brams and Peter C. Fishburn (Soc Choice Welfare 25:457–474 (2005)):

The confrontation between theory and practice offers some interesting lessons on “selling” new ideas. The rhetoric of AV supporters has been opposed not only by those supporting extant systems like plurality voting (PV)—including incumbents elected under PV—but also by those with competing ideas, particularly proponents of other voting systems like the Borda count and the Hare system of single transferable vote.

We conclude that academics probably are not the best sales people for two reasons: (1) they lack the skills and resources, including time, to market their ideas, even when they are practicable; and (2) they squabble among themselves. Because few if any ideas in the social sciences are certifiably “right” under all circumstances, squabbles may well be grounded in serious intellectual differences. Sometimes, however, they are not.

I don’t think it’s particular to the social sciences…

On another note, the IEEE adopted AV at some point but then abandoned it. According to a report on the (very partisan) range voting website, there are shady reasons.

Readings

The Gangster We Are All Looking For (lê thi diem thúy) — This is a fragmented and short narrative of a young Vietnamese immigrant to the US and her time growing up in various neighborhoods in San Diego. It’s the KPBS One Book, One San Diego selection so there were 25 copies at the library. The little vignettes are fleeting but touching, but in a sense you don’t feel that the narrator is particularly introspective, at least not in a direct way. However, I think it was definitely worth reading, if for no other reason than to hear her unique perspective.

The Terrible Privacy of Maxwell Sim (Jonathan Coe) — A satirical novel which came recommended but which in the end I felt cheated by. Maxwell Sim embarks on a new job after a recent divorce and a few months off for depression, and ends up learning new things about himself and his family. He’s a bit of a loser, to be honest, but in the end you kind of feel for him as he muddles through emails, old letters, facebook, and the like. The big cheat is the ending, in which the author (!) appears. Blech.

Symmetry and Its Discontents (Sheridan Zabell) — A lovely collection of essays on the philosophy, history, and mathematics of symmetry assumptions in problems of induction. The last two chapters are especially good as they discuss a bit of the history and background of such things as Good-Turing estimators and exchangeable partition processes. I learned about this book a while ago from Susan Holmes at the AIM Workshop on estimating probability distributions.

Electronic Elections (R. Michael Alvarez and Thad E. Hall) — A short but dense book that makes the case for a “risk management” approach to assessing the value of electronic voting machines. Electronic voting machines have all sorts of benefits, including better accessibility for the disabled, no “hanging chads,” and so on. But they are also woefully insecure and hackable, as has been demonstrated time and again by computer security folks. Alvarez and Hall feel that the CS folks are being unfair and think (in a very nebulous way) that the benefits outweigh the risks. I found the data about voter confusion and error rates, etc. interesting, but I think the authors completely miss the point of the security community’s critique of electronic voting systems. Designing a shoddy electronic voting system is bad, regardless of the purported benefits.