More arXiv skims

Assumptionless consistency of the Lasso
Sourav Chatterjee
The title says it all. Given p-dimensional data points \{ \mathbf{x}_i : i \in [n] \}, the Lasso tries to fit the model \mathbb{E}( y_i | \mathbf{x}_i) = \boldsymbol{\beta}^T \mathbf{x}_i by minimizing the \ell^1-penalized squared error
\sum_{i=1}^{n} (y_i - \boldsymbol{\beta}^T \mathbf{x}_i)^2 + \lambda \| \boldsymbol{\beta} \|_1.
The paper analyzes the Lasso in the setting where the data are random: there are n i.i.d. copies of a pair of random variables (\mathbf{X},Y), so the data are \{(\mathbf{X}_i, Y_i) : i \in [n] \}. The assumptions are on the random variables (\mathbf{X},Y): (1) each coordinate of \mathbf{X} is bounded, |X_j| \le M; (2) Y = (\boldsymbol{\beta}^*)^T \mathbf{X} + \varepsilon; and (3) \varepsilon \sim \mathcal{N}(0,\sigma^2), where \boldsymbol{\beta}^* and \sigma are unknown constants. Basically that’s all that’s needed: given a bound on \|\boldsymbol{\beta}^*\|_1, he derives a bound on the mean-squared prediction error.
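
To make the objective concrete, here is a minimal pure-Python sketch that minimizes the penalized squared error above by cyclic coordinate descent with soft-thresholding. This is a textbook solver for the Lasso, not anything taken from the paper. The function names (`lasso_cd`, `simulate`, `prediction_mse`), the uniform design on [-1, 1] (so M = 1), the particular \boldsymbol{\beta}^*, and the in-sample stand-in for the prediction error are all assumptions made up for illustration.

```python
import random


def soft_threshold(a, t):
    """Shrink a toward zero by t; values with |a| <= t become exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_objective(X, y, beta, lam):
    """sum_i (y_i - beta^T x_i)^2 + lam * ||beta||_1, as written in the text."""
    sq_err = sum(
        (yi - sum(b * v for b, v in zip(beta, xi))) ** 2 for xi, yi in zip(X, y)
    )
    return sq_err + lam * sum(abs(b) for b in beta)


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the penalized objective (illustrative)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that ignores beta_j.
            rho = sum(X[i][j] * (resid[i] + beta[j] * X[i][j]) for i in range(n))
            # The threshold is lam / 2 because the squared error has no 1/2 factor.
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta


def simulate(n=200, p=10, sigma=0.5, seed=0):
    """Hypothetical data: bounded coordinates, linear signal, Gaussian noise."""
    rng = random.Random(seed)
    beta_star = [2.0, -1.0] + [0.0] * (p - 2)  # assumed sparse truth, needs p >= 2
    X = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [
        sum(b * v for b, v in zip(beta_star, xi)) + rng.gauss(0.0, sigma)
        for xi in X
    ]
    return X, y, beta_star


def prediction_mse(X, beta, beta_star):
    """In-sample average of ((beta - beta_star)^T x)^2, a stand-in for the MSPE."""
    diffs = [b - bs for b, bs in zip(beta, beta_star)]
    return sum(sum(d * v for d, v in zip(diffs, xi)) ** 2 for xi in X) / len(X)


if __name__ == "__main__":
    X, y, beta_star = simulate()
    beta_hat = lasso_cd(X, y, lam=1.0)
    print("beta_hat:", [round(b, 3) for b in beta_hat])
    print("prediction error:", prediction_mse(X, beta_hat, beta_star))
```

Coordinate descent is used here only because it fits in a few lines without any linear algebra library; any other Lasso solver would do for the same objective.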

On Learnability, Complexity and Stability
Silvia Villa, Lorenzo Rosasco, Tomaso Poggio
This is a handy survey on the three topics in the title. It’s only 10 pages long, so it’s a nice fast read.

Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression
Francis Bach
A central challenge in stochastic optimization is understanding when the convergence rate of the excess loss, which is usually O(1/\sqrt{n}), can be improved to O(1/n). Most often this involves additional assumptions on the loss functions (which can sometimes get a bit baroque and hard to check). This paper considers constant step-size algorithms, but instead of the final iterate it looks at the averaged iterate \bar{\theta}_n = \frac{1}{n} \sum_{k=0}^{n-1} \theta_k. I’m still trying to slot this in with other things I know about stochastic optimization, but it’s definitely worth a skim if you’re interested in the topic.
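
For a picture of what averaging means here, the following is a rough sketch of constant step-size SGD on the logistic loss that returns both the last iterate and the running average of \theta_0, ..., \theta_{n-1}. It only illustrates the averaging scheme and is not the paper's exact algorithm or step-size choice. The names `averaged_sgd_logistic` and `logistic_stream`, the labels in {0, 1}, and the synthetic data model are assumptions for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def averaged_sgd_logistic(stream, dim, step):
    """Constant step-size SGD on the logistic loss, with iterate averaging.

    stream yields (x, y) pairs with x a list of floats and y in {0, 1}.
    Returns (theta_n, theta_bar), where theta_bar averages theta_0..theta_{n-1}.
    """
    theta = [0.0] * dim
    theta_sum = [0.0] * dim
    n = 0
    for x, y in stream:
        # Accumulate the current iterate before updating it.
        for j in range(dim):
            theta_sum[j] += theta[j]
        n += 1
        # Gradient of the logistic loss at (x, y) is (sigmoid(theta^T x) - y) * x.
        g = sigmoid(sum(t * v for t, v in zip(theta, x))) - y
        for j in range(dim):
            theta[j] -= step * g * x[j]
    if n == 0:
        raise ValueError("stream is empty")
    theta_bar = [s / n for s in theta_sum]
    return theta, theta_bar


def logistic_stream(n, theta_star, seed=0):
    """Hypothetical data source: uniform features, Bernoulli labels."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in theta_star]
        prob = sigmoid(sum(t * v for t, v in zip(theta_star, x)))
        yield x, 1 if rng.random() < prob else 0


if __name__ == "__main__":
    theta_star = [1.0, -2.0, 0.5]  # assumed ground truth for the demo
    last, avg = averaged_sgd_logistic(
        logistic_stream(5000, theta_star), dim=3, step=0.5
    )
    print("last iterate:    ", [round(t, 3) for t in last])
    print("averaged iterate:", [round(t, 3) for t in avg])
```

With a constant step the last iterate keeps bouncing around, while the average smooths that noise out, which is the intuition for looking at \bar{\theta}_n rather than \theta_n.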

On Differentially Private Filtering for Event Streams
Jerome Le Ny
Jerome Le Ny has been putting differential privacy into signal processing and control contexts for the past year, and this is another paper in that line of work. This is important because we’re still trying to understand how time-series data can be handled in the differential privacy setting. This paper looks at “event streams” which are discrete-valued continuous-time signals (think of count processes), and the problem is to design a differentially private filtering system for such signals.

Gossips and Prejudices: Ergodic Randomized Dynamics in Social Networks
Paolo Frasca, Chiara Ravazzi, Roberto Tempo, Hideaki Ishii
This appears to be a gossip version of Acemoglu et al.’s work on “stubborn” agents in the consensus setting. They show similar qualitative behavior — opinions fluctuate but their average over time converges (the process is ergodic). This version of the paper has more of a tutorial feel to it, so the results are a bit easier to parse.
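
A toy simulation gives a feel for the "fluctuate but converge on average" behavior. This is only a sketch under my own assumptions, not the model from the paper: agents sit on a line graph, the two end agents are stubborn and hold opinions 0 and 1, and at each step a uniformly random edge is activated and its non-stubborn endpoints move toward each other. The function name `gossip_with_stubborn` and the parameters `mix`, `steps`, and `seed` are hypothetical.

```python
import random


def gossip_with_stubborn(n_agents=6, steps=20000, mix=0.5, seed=0):
    """Randomized pairwise gossip on a line graph with two stubborn end agents.

    Returns (final_opinions, time_averages). Illustrative toy model only.
    """
    if n_agents < 2 or steps < 1:
        raise ValueError("need at least 2 agents and 1 step")
    rng = random.Random(seed)
    x = [0.5] * n_agents
    x[0], x[-1] = 0.0, 1.0  # stubborn agents never change their opinion
    stubborn = {0, n_agents - 1}
    running_sum = [0.0] * n_agents
    for _ in range(steps):
        i = rng.randrange(n_agents - 1)  # activate the edge (i, i + 1)
        a, b = x[i], x[i + 1]
        if i not in stubborn:
            x[i] = (1.0 - mix) * a + mix * b
        if i + 1 not in stubborn:
            x[i + 1] = (1.0 - mix) * b + mix * a
        # Accumulate opinions after the update to form the time average.
        for k in range(n_agents):
            running_sum[k] += x[k]
    time_avg = [s / steps for s in running_sum]
    return x, time_avg


if __name__ == "__main__":
    for steps in (1000, 10000, 100000):
        final, avg = gossip_with_stubborn(steps=steps)
        print(steps, "final:", [round(v, 3) for v in final])
        print(steps, "avg:  ", [round(v, 3) for v in avg])
```

Running it for increasing horizons lets you compare how much the final opinions move between runs against how much the time averages do.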