I figured I would blog about this week’s workshop at Banff in a more timely fashion. Due to the scheduling of flights out of Calgary, I will have to miss the last day of talks. The topics of people’s presentations varied rather widely, and many were not about the sort of Good-Turing estimator setup. Sometimes it was a bit hard to see how to see how the problems or approaches were related (not that they had to be directly), but given that the crowd had widely varying backgrounds, presenters had a hard time because the audience had to check in a new set of notation or approach for every talk. The advantage is that there were lots of questions — the disadvantage is that people insisted on “finishing” their presentations. By mid-week my brain was over-full, and a Wednesday afternoon hike up Sulphur Mountain was the perfect solution.
Tag Archives: statistics
An observation
Bayesian nonparametrics is a bit like the Catholic church : there is a fair bit of dogma, mystery, and reliance on countably infinite populations from the developing world.
Banfffffffffffffff
I’ve just arrived in chilly but beautiful Banff for a workshop on Information theory and statistics for large alphabets. I’m looking forward to it, although I will have to miss the last day due to the timing of flights out of Calgary that get me to Chicago before midnight. My itineraries there and back seem especially perverse : ORD-SEA-YYC and YYC-SFO-ORD. However, thanks to the new gig I have a new laptop with a functional battery so I am doing a bit more busy-work and less New Yorker reading in the plane. I might try to write a bit more about the topics in the workshop — although the topic seems focused, there are a wide range of approaches and angles to take on the problem of estimating probabilities/prevalences in situations where you may not get to see each outcome once. Certainly I hope I can get the journal version of a paper from last year’s Allerton squared away.
News from UChicago Statistics
In a further procrastination about Allerton blogging, I want to share two items from the UChicago Statistics department.
The first is that there will be a conference in early December in memory of Partha Niyogi, who passed away last year. Registration is free, so if you are in the area, you may want to come.
The other is that Patrick Billingsley passed away in April. I had no idea that he “also became an accomplished actor of stage and screen” (ideas are now percolating in my head). He was in the The Untouchables! The next time I am over there, I will take a picture of a poster they have up advertising his tenor voice performing a “tragicomic rendition” of ergodic theory and coding. I gotta find a new shtick for myself, I think.
Linkage
I will post more about Allerton soon (I’m still on the road), but I wanted to clear out some old links before doing that. I’m starting my new gig at TTIC this week, and the last few weeks have been a whirlwind of travel and internetlessness, so blogging has been curtailed.
- The 12 coolest libraries in the world (via MeFi).
- Todd Coleman shows you how to peel garlic efficiently via “shaking the dickens out of it.” No, not that Todd Coleman!
- Some scraps from an exterminated language have been found.
- Florence Nightingale’s statistical diagrams (via MeFi), but also covered by the BBC (part 1, part 2).
And a (not-so-recent) tour around the ArXiV — I haven’t had a chance to read these yet, but maybe once I am settled…
- Active Ranking using Pairwise Comparisons by Kevin G. Jamieson and Robert D. Nowak — this is related to a talk given by Constantine Caramanis at Allerton. Instead of looking at how to learn from total orderings, we have to learn the total ordering from pairwise ordererings (I like chocolate more than vanilla).
- Distributed Algorithms for Consensus and Coordination in the Presence of Packet-Dropping Communication Links – Part I and Part II by Nitin H. Vaidya, Christoforos N. Hadjicostis, and Alejandro D. Dominguez-Garcia (in different orders). This paper looks at consensus in asymmetric communication settings with packet drops and modify the update rule to achieve almost sure convergence. The analysis seems to rely on the “coefficient of ergodicity” approach for inhomogeneous Markov chains. It’s doubly appropriate for the blog!
- Distributed Algorithms for Optimal Power Flow Problem by Albert Y.S. Lam, Baosen Zhang, and David Tse. Power networks are hot and this paper studies an interesting problem of cost minimization in power flow networks. I found it a bit weird that the abstract and introduction assume you already know what the problem is… but that’s what happens when you are an outsider.
- Optimal Sensor Placement for Intruder Detection by Waseem A. Malik, Nuno C. Martins, and Ananthram Swami
- The Projection Method for Reaching Consensus and the Regularized Power Limit of a Stochastic Matrix by R. P. Agaev, P. Yu. Chebotarev
- Tropical Algebraic approach to Consensus over Networks, by Joel George Manathara, Ambedkar Dukkipati, Dabasish Ghose
- Fundamentals of Stein’s method by Nathan Ross
- A Learning Theory Approach to Non-Interactive Database Privacy by Avrim Blum, Katrina Ligett, Aaron Roth
- Bandits with an Edge by Dotan Di Castro, Claudio Gentile, Shie Mannor
- State-of-the-Art in Sequential Change-Point Detection by Aleksey S. Polunchenko, Alexander G. Tartakovsky
- Wasserstein distances for discrete measures and convergence in nonparametric mixture models by XuanLong Nguyen
- High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity by Po-Ling Loh, Martin J. Wainwright
- Canonical Estimation in a Rare-Events Regime by Mesrob I. Ohannessian, Vincent Y. F. Tan, Munther A. Dahleh
Predicting insurgent attacks with power laws, oh my
A while back I heard this NPR story about a method for predicting attacks that was published in Science. The news story left me a bit perplexed — where was the prediction? It turns out to be yet another version of attack of the power laws. After a bit of curve fitting, the meat of the thing seems to be this:
Our broad-brush theory does not require knowledge of specific adaptation or counteradaptation mechanisms, and hence bypasses issues such as changes in insurgent membership, technology, learning, or skill set, as well as a need to know the hearts and minds of local residents. We regard the escalation of hostilities as representing adaptation and counteradaptation in a way that is analogous to the Red Queen hypothesis of evolutionary biology. (Johnson et al. 2011)
Hearts and minds : pish-posh, what nonsense. Bring on the Red Queen!
This continual arms race of adaptation and counter-adaptation suggests similarities with Darwinian selection in nature. As noted by Rafe Sagarin, “a fundamental tenet of evolutionary biology is that organisms must constantly adapt just to stay in the same strategic position relative to their enemies—who are constantly changing as well. For example, to protect its DNA against viruses, a host organism must continually change the access code to its genetic material” (Sagarin, 2003, p. 69). Meeting the Red Queen in Alice in Wonderland, Alice finds that however fast she runs, she always stays in the same place. The “Red Queen” concept has become widely used in evolutionary biology to describe how competing individuals can become locked in an “arms race” of strategies, machinery, or weapons. However impressive one side becomes, it may never come any closer to defeating its opponent. (Johnson 2009)
Unfortunately, Red-Queen like equilibrium is not possible. Since they’ve already thrown out the idea of modeling, they’re left with fitting parameters in a stochastic model:
However, instantaneous and perfect counteradaptation is unrealistic; indeed, complex adaptation-counteradaptation dynamics generated by sporadic changes in circumstances imply that R’s temporal evolution is likely to be so complex that it can be modeled as a stochastic process. We do not need to know exactly why changes at any specific moment, nor do the changes in have to have the same value, because each change is the net result of a mix of factors [such as learning by experience or changes in personnel and technology] for each opponent. (Johnson et al. 2011)
They they go on to say that once you fit the parameters you could estimate momentum from a few fatal days to estimate the timing of the next fatal day. But they don’t actually do this (say, by holding out some data and doing a little validation), except an example of a single prediction being more accurate than one might expect.
So in the end, what do we have? This is a paper about modeling and not about prediction, but prediction sounds a lot more sexy, and so NPR ran with that aspect of it. In order to build a predictor, you would need to build a good model, which presumably means sitting around and waiting for fatal attacks to happen because you can’t make an omelette predictive model without breaking some eggs killing some people. Finally, “ZOMG P0W3R L4WZ” is so the aughts.
EVT / WOTE 2011
This week I attended EVT/WOTE ’11, a workshop on voting, technology, and trustworthiness co-located with the USENIX Security conference. I phased in and out of the workshop, which had a number of different session themes:
- “E2E”, or end-to-end voting
- empirical studies of real elections direct-recording electronic voting machines (DREs), forensics for them, and their impact on the reliability of election outcomes
- studies of accessibility issues, either to polling places or for different voting technologies
- new proposals for voting systems
- auditing and metrics for existing election systems
I pretty much work on the last one, so while some of the panels were quite interesting, some technical talks were a little beyond me. Dana Debeauvoir, the Travis County Clerk (where Austin is) gave a keynote about how she thinks technologists and elections officials can work together, and rather bravely put forward a proposal for an electronic voting system to be used at the county level. There were lots of comments about that, of course.
Theron Ji gave a nice talk about how write-in marks are (or are not) properly counted by optical scan machines. People often forget to fill in the bubble saying they are doing a write-in, which can have pretty disastrous effects, as San Diego politician Donna Frye found out. Gillian Piner reported on a survey she did of vision-impaired voters, asking them what they want for accessibility technologies. Imagine that, asking them what they want!
The two talks of most interest to me were by David Cary and Tom Magrino, both on the margin of victory for IRV elections. Cary presented a method for estimating the margin based only on the tabulation of first-choices in each round, whereas Magrino presented an exact calculation that involved solving many integer linear programs. The scaling is not so great with the number of candidates (exponential), but for the kind of IRV elections we see in the US it was definitely doable. Margin calculations are important for developing auditing algorithms (on which I will write more once my paper is done). Philip Stark gave a plenary lecture on auditing which I missed part of due to a conflict with a parallel workshop.
There were also some interesting panels. The most contentious one was on internet voting, which I missed much of but the discussion went over by an hour so I think got the gist of it. Some people are afraid of voting over the internet, but the crypto people think it can be made safe. The panel on the the Sarasota House race in 2006 tried to hone in on the reason for the problems with undervotes in that contest. A lot can be explained by the design of the ballot, proving again that user interface and graphic design is really important!
The rump session was, as always, a mixture of amusing and technical and dry. The real highlight was probably David Bismark, who seems to have antagonized someone who has a new voting system involving moon projections. Wow.
California Elections Code on auditing
From Section 15360:
(a) During the official canvass of every election in which a
voting system is used, the official conducting the election shall
conduct a public manual tally of the ballots tabulated by those
devices, including vote by mail voters’ ballots, cast in 1 percent of
the precincts chosen at random by the elections official. If 1
percent of the precincts is less than one whole precinct, the tally
shall be conducted in one precinct chosen at random by the elections
official.In addition to the 1 percent manual tally, the elections official
shall, for each race not included in the initial group of precincts,
count one additional precinct. The manual tally shall apply only to
the race not previously counted.Additional precincts for the manual tally may be selected at the
discretion of the elections official.
Clearly this is not written by a statistician. Counting 1% of precincts “chosen at random” is hardly clear, and also doesn’t tell you too much about how many ballots you are going to count.
Linkage
Kenji explains the science of no-knead bread. I have to try making some this summer.
A nice post about Punjabi Mexicans in California, a little-known bit of South Asian immigration history in the US.
Detection of correlations is a snazzy title, but they mean something very particular by it. Also, why does everyone hate on the GLRT so much?
I want to see The Trip, especially after watching this clip. (h/t Adam G.)
P. Sainath on civil society in India. (h/t Soumya C.)
Linkage
I’m blogging from Chicago, where it is a balmy 42 degrees but sunny. Whither spring, I ask! Actually, I’m not blogging so much as linking to a bunch of stuff.
For San Diegans, the SD Asian Film Festival Spring Showcase is going on. It looks like I’ll miss a lot of it but I might try to catch something at the end of the week.
Less Pretentious & More Accurate Titles For Literary Masterworks — funny but possibly NSFW.
This home-scanning program seems creepy, regardless of the constitutionality issues.
Unfortunate headlines strike again.
I really like scallion pancakes. I’ll have to try this out when I get back to San Diego.
I agree that this video is awesome. Yo-Yo Ma and Lil Buck. I think that dude is made of rubber. And steel.
Tom Waits was induced into the Rock and Roll Hall of Fame. I just hope I get to see him live some day.
Some things to skim or read from ArXiV when I get the chance:
Sequential Analysis in High Dimensional Multiple Testing and Sparse Recovery (Matt Malloy, Robert Nowak)
Differential Privacy: on the trade-off between Utility and Information Leakage (Mário S. Alvim, Miguel E. Andrés, Konstantinos Chatzikokolakis, Pierpaolo Degano, Catuscia Palamidessi)
Capacity of Byzantine Consensus with Capacity-Limited Point-to-Point Links (Guanfeng Liang, Nitin Vaidya)
Settling the feasibility of interference alignment for the MIMO interference channel: the symmetric square case (Guy Bresler, Dustin Cartwright, David Tse)
Decentralized Online Learning Algorithms for Opportunistic Spectrum Access (Yi Gai, Bhaskar Krishnamachari)
Online and Batch Learning Algorithms for Data with Missing Features (Afshin Rostamizadeh, Alekh Agarwal, Peter Bartlett)
Nonuniform Coverage Control on the Line (Naomi Ehrich Leonard, Alex Olshevsky)
Degree Fluctuations and the Convergence Time of Consensus Algorithms (Alex Olshevsky, John Tsitsiklis)
