I am traveling all over India at the moment so I’m not really able to write contentful posts. Here are even more links instead, sigh. Maybe later I’ll talk about log-Sobolev inequalities so I can be cool like Max.

Speaking of Max, he posted this hilarious bad lip reading version of Game of Thrones. Probably NSFW. I don’t even like the series but it’s pretty funny.

For those who are fans of Rejected, Don Hertzfeldt’s new film is available on Vimeo.

Those who were at Berkeley may remember seeing Ed Reed perform at the Cheeseboard. His album (which I helped fund via indiegogo, was named a Downbeat Editors’ Pick. It’s a great album.

In light of the Snowden leaks, some doubt has been cast on NIST’s crypto standards.

I’m super late to this, but I endorse Andrew’s endorsement of Sergio‘s interview with Robert Fano in the IT Newsletter. Here’s just the article, if you want that.

David McAllester, my department chair at TTI, has a started a new blog.

I thought it was pretty well known that people are fairly unique by ZIP code, but Forbes has an article about it now (h/t Raj). Of course, stores can also ping a smartphone’s WiFi to get more accurate location information about your activity within the store — when you check out they can tag your the MAC address of your device to all the other information about you. Creeptastic!

Bradley Efron’s perspective on the impact of Bayes’ Theorem from Science (h/t Kevin).

Some discussion on what makes a popular philosophy book. I wonder what, if anything, transfers over to a popular mathematical book?

Some thoughts from Larry Laudan on the mathematization of the presumption of innocence.

I’m on the program committee for the Cyber-Security and Privacy symposium, so I figured I would post this here to make more work for myself.

GlobalSIP 2013 – Call for Papers
IEEE Global Conference on Signal and Information Processing
December 3-5, 2013 | Austin, Texas, U.S.A.

GlobalSIP: IEEE Global Conference on Signal and Information Processing is a new flagship IEEE Signal Processing Society conference. The focus of this conference is on signal and information processing and up-and-coming signal processing themes.

GlobalSIP is composed of symposia selected based on responses to the call-for-symposia proposals. GlobalSIP is composed of symposia on hot topics related to signal and information processing.

The selected symposia are:

Paper submission will be online only through the GlobalSIP 2013 website Papers should be in IEEE two-column format. The maximum length varies among the symposia; be sure to check each symposium’s information page for details. Authors of Signal Processing Letters papers will be given the opportunity to present their work at GlobalSIP 2013, subject to space availability and approval by the Technical Program Chairs of GlobalSIP 2013. The authors need to specify in which symposium they wish to present their paper. Please check conference webpage for details.

Important Dates:
*New* Paper Submission Deadline – June 15, 2013
Review Results Announce – July 30, 2013
Camera-Ready Papers Due – September 7, 2013
*New* SPL request for presentation – September 7, 2013

Learning from transcriptomes can be cheaper for organisms which have never been sequenced.

A fancy Nature article on mobility privacy, in case you weren’t convinced by other studies on mobility privacy.

Bad statistics in neuroscience. Color me unsurprised.

I bet faked results happen a lot in pharmaceutical trials, given the money involved. Perhaps we should jail people for faking data as a disincentive?

The Atheist shoe company did a study to see if the USPS was discriminating against them.

Endless Things [John Crowley] — Book four of the Aegypt Cycle, and the one most grounded in the present. The book moves more swiftly than the others, as if Crowley was racing to the end. Many of the concerns of the previous books, such as magic, history, and memory, are muted as the protagonist Pierce Moffett wends his way through his emotional an intellectual turmoil and into what in the end amounts to a kind of peace. Obviously only worth reading if you read the first three books.

Understanding Privacy [Daniel Solove] — A law professor’s take on what constitutes privacy. Solove wants to conceptualize privacy in terms of clusters of related ideas rather than take a single definition, and he tries to put a headier philosophical spin on it by invoking Wittgenstein. I found the book a bit overwritten but it does parse out the things we call privacy, especially in the longest chapter on the taxonomy of privacy. It’s not a very long book, but it has a number of good examples and also case law to show how muddled our legal definitions have become. He also makes a strong case for increased protections and shows how the law is blind to the effects of information aggregation, for example.

The Fall of the Stone City [Ismail Kadare] — An allegorical novel by a Man Booker prize winner chronicling the Nazi occupation and the communist takeover of Gjirokaster, an old Albanian city. It’s a dark absurdist comedy, partly in the vein of Kafka but with a bit of… Calvino almost. The tone of the book (probably a testament to the translator) has this almost academic detachment, gently mocking as it describes the ways in which the victors try to rewrite history.

Invisible Men [Becky Pettit] — A sobering look at how mass incarceration interacts with official statistics. Because most surveys are household-based, they do not count the increasingly larger incarcerated population, thereby introducing a systematic racialized bias in the statistics used for public policy. In particular, Pettit shows how this bias leads to underestimation of racial inequity because the (mainly young black male) prisoners are “erased” in the official records.

The Rise of Ransom City [Felix Gilman] — A sequel to The Half-Made World, and a wondrously engrossing read it is too, filled with the clash of ideas, the corruption of corporations, the “borrowing” and evolution of ideas, and the ravages of industrialization. Also has a healthy dose of Mark Twain for good measure.

This link is worth its own post. Please check out ProPublica’s article on data brokers — it’s very relevant to how much is already known (and sold) about you.

Maybe more like “paper whenever I feel like it.” This is a post on a now not-so-recent ArXiV preprint by Quan Geng and Pramod Viswanath on constructing the best mechanism (output distribution) for guaranteeing differential privacy under a utility constraint.


I just wanted to write a few words about the workshop at the Bellairs Research Institute. I just returned from sunny Barbados to frigid Chicago, so writing this will help me remember the sunshine and sand:

The beach at Bathsheba on the east coast of Barbados

The beach at Bathsheba on the east coast of Barbados

Mike Rabbat put on a great program this year, and there were lots of talks on a range of topics in machine learning, signal processing, and optimization. The format of the workshop was to have talks with lots of room for questions and discussion. Talks were given out on the balcony where we were staying, and we had to end at about 2:30 because the sunshine would creep into our conference area, baking those of us sitting too far west.


Almost a month later, I’m finishing up blogging about NIPS. Merry Christmas and all that (is anyone reading this thing?), and here’s to a productive 2013, research-wise. It’s a bit harder to blog these things because unlike a talk, it’s hard to take notes during a poster presentation.

Overall, I found NIPS to be a bit overwhelming — the single-track format makes it feel somehow more crowded than ISIT, but also it was hard for me to figure out how to strike the right balance of going to talks/posters and spending time talking to people and getting to know what they are working on. Now that I am fairly separated from my collaborators, conferences should be a good time to sit down and work on some problems, but somehow things are always a bit more frantic than I want them to be.

Anyway, from the rest of the conference, here are a few talks/posters that I went to and remembered something about.

T. Dietterich
Challenges for Machine Learning in Computational Sustainability
This was a plenary talk on machine learning problems that arise in natural resources management. There was a lot in this talk, and a lot of different problems ranging from prediction (for bird migrations, etc), imputation of missing data, and classification. These were real-world hands-on problems and one thing I got out of it is how much work you need to put into the making algorithms that work for the dat you have, rather than pulling some off-the-shelf works-great-in-theory method. He gave a version of this talk at TTI but I think the new version is better.

K. Muandet, K. Fukumizu, F. Dinuzzo, B. Schölkopf
Learning from Distributions via Support Measure Machines
This was on generalizing SVMs to take distributions as inputs instead of points — instead of getting individual points as training data, you get distributions (perhaps like clusters) and you have to do learning/classification on that kind of data. Part of the trick here is finding the right mathematical framework that remains computationally tractable.

J. Duchi, M. Jordan, M. Wainwright
Privacy Aware Learning
Since I work on privacy, this was of course interesting to me — John told me a bit about the work at Allerton. The model of privacy is different than the “standard” differential privacy model — data is stochastic and the algorithm itself (the learner) is not trusted, so noise has to be added to individual data points. A bird’s eye view of the idea is this : (1) stochastic gradient descent (SGD) is good for learning, and is robust to noise (e.g. noisy gradients), (2) noise is good at protecting privacy, so (3) SGD can be used to guarantee privacy by using noisy gradients. Privacy is measured here in terms of the mutual information between the data point and a noisy gradient using that data point. The result is a slowdown in the convergence rate that is a function of the mutual information bound, and it appears in the same place in the upper and lower bounds.

J. Wiens, J. Guttag, E. Horvitz
Patient Risk Stratification for Hospital-Associated C. Diff as a Time-Series Classification Task
This was a cool paper on predicting which patients would be infected with C. Diff (a common disease people get as a secondary infection from being the hospital). Since we have different data for each patient and lots of missing data, the classification problem is not easy — they try to assess a time-evolving risk of infection and then predict whether or not the patient will test positive for C. Diff.

P. Loh, M. Wainwright
No voodoo here! Learning discrete graphical models via inverse covariance estimation
This paper won a best paper award. The idea is that for Gaussian graphical models the inverse covariance matrix is graph-compatible — zeros correspond to missing edges. However, this is not true/easy to do for discrete graphical models. So instead they build the covariance matrix for all tuples of variables — \{X_1, X_2, X_3, X_4, X_1 X_2, X_1 X_3, \ldots \} (really what they want is a triangulation of the graph) and then show that indeed, the inverse covariance matrix does respect the graph structure in a sense. More carefully, they have to augment the variables with the power set of the maximal cliques in a triangulation of the original graphical model. The title refers to so-called “paranormal” methods which are also used for discrete graphical models.

V. Kanade, Z. Liu, B. Radunovic
Distributed Non-Stochastic Experts
This was a star-network with a centralized learner and a bunch of experts, except that the expert advice arrives at arbitrary times — there’s a tradeoff between how often the experts communicate with the learner and the achievable regret, and they try to quantify this tradeoff.

M. Streeter, B. McMahan
No-Regret Algorithms for Unconstrained Online Convex Optimization
There’s a problem with online convex optimization when the feasible set is unbounded. In particular, we would want to know that the optimal x^{\ast} is bounded so that we could calculate the rate of convergence. They look at methods which can get around this by proposing an algorithm called “reward doubling” which tries to maximize reward instead of minimize regret.

Y. Chen, S. Sanghavi, H. Xu
Clustering Sparse Graphs
Suppose you have a graph and want to partition it into clusters with high intra-cluster edge density and low inter-cluster density. They come up with nuclear-norm plus L_1 objective function to find the clusters. It seems to work pretty well, and they can analyze it in the planted partition / stochastic blockmodel setting.

P. Shenoy, A. Yu
Strategic Impatience in Go/NoGo versus Forced-Choice Decision-Making
This was a talk on cognitive science experimental design. They explain the difference between these two tasks in terms of a cost-asymmetry and use some decision analysis to explain a bias in the Go/NoGo task in terms of Bayes-risk minimization. The upshot is that the different in these two tasks may not represent a difference in cognitive processing, but in the cost structure used by the brain to make decisions. It’s kind of like changing the rules of the game, I suppose.

S. Kpotufe, A. Boularias
Gradient Weights help Nonparametric Regressors
This was a super-cute paper, which basically says that if the regressor is very sensitive in some coordinates and not so much in others, you can use information about the gradient/derivative of the regressor to rebalance things and come up with a much better estimator.

K. Jamieson, R. Nowak, B. Recht
Query Complexity of Derivative-Free Optimization
Sometimes taking derivatives is expensive or hard, but you can approximate them by taking two close points and computing an approximation. This requires the function evaluations to be good. Here they look at how to handle approximate gradients computed with noisy function evaluations and find the convergence rate for those procedures.

Here’s a roundup of some interesting posts/pages on technical things.

Over at Larry Wasserman’s blog, Rob Tibshirani suggests 9 Great Statistics papers published after 1970. You know, in case you were looking for some light reading over winter break.

Videos from the DIMACS Differential Privacy Workshop are up.

All of these ads for jobs this year want someone who works on Big Data. But… do you really have big data? Or, as I like to ask, “how big is big, anyway?”

Speaking of big data, this talk by Peter Bartlett looks cool. (h/t Andrew Gelman)

Max Raginsky and Igal Sason have a tutorial on measure concentration. Log Sobolev inequalities are a dish best served cold.

I’ll probably do an ArXiV roundup sometime soon — trying to catch up on a backlog of reading and thinking lately.


Get every new post delivered to your Inbox.

Join 159 other followers