# LabTV Profiles Are Up!

And now, a little pre-ITA self-promotion. As I wrote earlier, LabTV interviewed me and a subset of the students in the lab last semester (it was opt-in). This opportunity came out of my small part in the a large-scale collaboration organized by Mind Research Network (PI: Vince Calhoun) on trying to implement distributed and differentially private algorithms in a system to enable collaborative neuroscience research. Our lab profiles are now up! They interviewed me, graduate students Hafiz Imtiaz, Sijie Xiong, and Liyang Xie, and an undergraduate student, Kevin Sun. In watching I found that I learned a few new things about my students…

# Call for Papers: T-SIPN Special Issue on Inference and Learning Over Networks

IEEE Signal Processing Society
IEEE Transactions on Signal and Information Processing over Networks
Special Issue on Inference and Learning Over Networks

Networks are everywhere. They surround us at different levels and scales, whether we are dealing with communications networks, power grids, biological colonies, social networks, sensor networks, or distributed Big Data depositories. Therefore, it is not hard to appreciate the ongoing and steady progression of network science, a prolific research field spreading across many theoretical as well as applicative domains. Regardless of the particular context, the very essence of a network resides in the interaction among its individual constituents, and Nature itself offers beautiful paradigms thereof. Many biological networks and animal groups owe their sophistication to fairly structured patterns of cooperation, which are vital to their successful operation. While each individual agent is not capable of sophisticated behavior on its own, the combined interplay among simpler units and the distributed processing of dispersed pieces of information, enable the agents to solve complex tasks and enhance dramatically their performance. Self-organization, cooperation and adaptation emerge as the essential, combined attributes of a network tasked with distributed information processing, optimization, and inference. Such a network is conveniently described as an ensemble of spatially dispersed (possibly moving) agents, linked together through a (possibly time – varying) connection topology. The agents are allowed to interact locally and to perform in-network processing, in order to accomplish the assigned inferential task. Correspondingly, several problems such as, e.g., network intrusion, community detection, and disease outbreak inference, can be conveniently described by signals on graphs, where the graph typically accounts for the topology of the underlying space and we obtain multivariate observations associated with nodes/edges of the graph. The goal in these problems is to identify/infer/learn patterns of interest, including anomalies, outliers, and existence of latent communities. Unveiling the fundamental principles that govern distributed inference and learning over networks has been the common scope across a variety of disciplines, such as signal processing, machine learning, optimization, control, statistics, physics, economics, biology, computer, and social sciences. In the realm of signal processing, many new challenges have emerged, which stimulate research efforts toward delivering the theories and algorithms necessary to (a) designing networks with sophisticated inferential and learning abilities; (b) promoting truly distributed implementations, endowed with real-time adaptation abilities, needed to face the dynamical scenarios wherein real-world networks operate; and (c) discovering and disclosing significant relationships possibly hidden in the data collected from across networked systems and entities. This call for papers therefore encourages submissions from a broad range of experts that study such fundamental questions, including but not limited to:

• Adaptation and learning over networks.
• Consensus strategies; diffusion strategies.
• Distributed detection, estimation and filtering over networks.
• Distributed dictionary learning.
• Distributed game-theoretic learning.
• Distributed machine learning; online learning.
• Distributed optimization; stochastic approximation.
• Distributed proximal techniques, sub-gradient techniques.
• Learning over graphs; network tomography.
• Multi-agent coordination and processing over networks.
• Signal processing for biological, economic, and social networks.
• Signal processing over graphs.

Prospective authors should visit http://www.signalprocessingsociety.org/publications/periodicals/tsipn/ for information on paper submission. Manuscripts should be submitted via Manuscript Central at http://mc.manuscriptcentral.com/tsipn-ieee.

Important Dates:

• Manuscript submission: February 1, 2016
• First review completed: April 1, 2016
• Revised manuscript due: May 15, 2016
• Second review completed: July 15, 2016
• Final manuscript due: September 15, 2016
• Publication: December 1, 2016

Guest Editors:

# ISIT 2015 : statistics and learning

The advantage of flying to Hong Kong from the US is that the jet lag was such that I was actually more or less awake in the mornings. I didn’t take such great notes during the plenaries, but they were rather enjoyable, and I hope that the video will be uploaded to the ITSOC website soon.

There were several talks on entropy estimation in various settings that I did not take great notes on, to wit:

• OPTIMAL ENTROPY ESTIMATION ON LARGE ALPHABETS VIA BEST POLYNOMIAL APPROXIMATION (Yihong Wu, Pengkun Yang, University Of Illinois, United States)
• DOES DIRICHLET PRIOR SMOOTHING SOLVE THE SHANNON ENTROPY ESTIMATION PROBLEM? (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)
• ADAPTIVE ESTIMATION OF SHANNON ENTROPY (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)

I would highly recommend taking a look for those who are interested in this problem. In particular, it looks like we’re getting towards more efficient entropy estimators in difficult settings (online, large alphabet), which is pretty exciting.

QUICKEST LINEAR SEARCH OVER CORRELATED SEQUENCES
Javad Heydari, Ali Tajer, Rensselaer Polytechnic Institute, United States
This talk was about hypothesis testing where the observer can control the samples being taken by traversing a graph. We have an $n$-node graph (c.f. a graphical model) representing the joint distribution on $n$ variables. The data generated is i.i.d. across time according to either $F_0$ or $F_1$. At each time you get to observe the data from only one node of the graph. You can either observe the same node as before, explore by observing a different node, or make a decision about whether the data from from $F_0$ or $F_1$. By adopting some costs for different actions you can form a dynamic programming solution for the search strategy but it’s pretty heavy computationally. It turns out the optimal rule for switching has a two-threshold structure and can be quite a bit different than independent observations when the correlations are structured appropriately.

MISMATCHED ESTIMATION IN LARGE LINEAR SYSTEMS
Yanting Ma, Dror Baron, North Carolina State University, United States; Ahmad Beirami, Duke University, United States
The mismatch studied in this paper is a mismatch in the prior distribution for a sparse observation problem $y = Ax + \sigma_z z$, where $x \sim P$ (say a Bernoulli-Gaussian prior). The question is what happens when we do estimation assuming a different prior $Q$. The main result of the paper is an analysis of the excess MSE using a decoupling principle. Since I don’t really know anything about the replica method (except the name “replica method”), I had a little bit of a hard time following the talk as a non-expert, but thankfully there were a number of pictures and examples to help me follow along.

SEARCHING FOR MULTIPLE TARGETS WITH MEASUREMENT DEPENDENT NOISE
Yonatan Kaspi, University of California, San Diego, United States; Ofer Shayevitz, Tel-Aviv University, Israel; Tara Javidi, University of California, San Diego, United States
This was another search paper, but this time we have, say, $K$ targets $W_1, W_2, \ldots, W_K$ uniformly distributed in the unit interval, and what we can do is query at each time $n$ a set $S_n \subseteq [0,1]$ and get a response $Y_n = X_n \oplus Z_n$ where $X_n = \mathbf{1}( \exists W_k \in S_n )$ and $Z_n \sim \mathrm{Bern}( \mu(S_n) + b )$ where $\mu$ is the Lebesgue measure. So basically you can query a set and you get a noisy indicator of whether you hit any targets, where the noise depends on the size of the set you query. At some point $\tau$ you stop and guess the target locations. You are $(\epsilon,\delta)$ successful if the probability that you are within $\delta$ of each target is less than $\epsilon$. The targeting rate is the limit of $\log(1/\delta) / \mathbb{E}[\tau]$ as $\epsilon,\delta \to 0$ (I’m being fast and loose here). Clearly there are some connections to group testing and communication with feedback, etc. They show there is a significant gap between the adaptive and nonadaptive rate here, so you can find more targets if you can adapt your queries on the fly. However, since rate is defined for a fixed number of targets, we could ask how the gap varies with $K$. They show it shrinks.

ON MODEL MISSPECIFICATION AND KL SEPARATION FOR GAUSSIAN GRAPHICAL MODELS
Varun Jog, University of California, Berkeley, United States; Po-Ling Loh, University of Pennsylvania, United States
The graphical model for jointly Gaussian variables has no edge between nodes $i$ and $j$ if the corresponding entry $(\Sigma^{-1})_{ij} = 0$ in the inverse covariance matrix. They show a relationship between the KL divergence of two distributions and their corresponding graphs. The divergence is lower bounded by a constant if they differ in a single edge — this indicates that estimating the edge structure is important when estimating the distribution.

CONVERSES FOR DISTRIBUTED ESTIMATION VIA STRONG DATA PROCESSING INEQUALITIES
Aolin Xu, Maxim Raginsky, University of Illinois at Urbana–Champaign, United States
Max gave a nice talk on the problem of minimizing an expected loss $\mathbb{E}[ \ell(W, \hat{W}) ]$ of a $d$-dimensional parameter $W$ which is observed noisily by separate encoders. Think of a CEO-style problem where there is a conditional distribution $P_{X|W}$ such that the observation at each node is a $d \times n$ matrix whose columns are i.i.d. and where the $j$-th row is i.i.d. according to $P_{X|W_j}$. Each sensor gets independent observations from the same model and can compress its observations to $b$ bits and sends it over independent channels to an estimator (so no MAC here). The main result is a lower bound on the expected loss as s function of the number of bits latex $b$, the mutual information between $W$ and the final estimate $\hat{W}$. The key is to use the strong data processing inequality to handle the mutual information — the constants that make up the ratio between the mutual informations is important. I’m sure Max will blog more about the result so I’ll leave a full explanation to him (see what I did there?)

More on Shannon theory etc. later!

# AISTATS 2015: a few talks from one day

I attended AISTATS for about a day and change this year — unfortunately due to teaching I missed the poster I had there but Shuang Song presented a work on learning from data sources of different quality, which her work with Kamalika Chaudhuri and myself. This was my first AISTATS. It had single track of oral presentations and then poster sessions for the remaining papers. The difficulty with a single track for me is that my interest in the topics is relatively focused, and the format of a general audience with a specialist subject matter meant that I couldn’t get as much out of the talks as I would have wanted. Regardless, I did get exposed to a number of new problems. Maybe the ideas can percolate for a while and inform something in the future.

Computational Complexity of Linear Large Margin Classification With Ramp Loss
Søren Frejstrup Maibing, Christian Igel
The main result of this paper (I think) is that ERM under ramp loss is NP-hard. They gave the details of the reduction but since I’m not a complexity theorist I got a bit lost in the weeds here.

A la Carte — Learning Fast Kernels
Zichao Yang, Andrew Wilson, Alex Smola, Le Song
Ideas like “random kitchen sinks” and other kernel approximation methods require you to have a kernel you want to approximate, but in many problems you in fact need to learn the kernel from the data. If I give you a kernel function $k(x,x') = k( |x - x'| )$, then you can take the Fourier transform $K(\omega)$ of $k$. This turns out to be a probability distribution, so you can sample random $\{\omega_i\}$ i.i.d. and build a randomized Fourier approximation of $k$. If you don’t know the kernel function, or you have to learn it, then you could instead try to learn/estimate the transform directly. This paper was about trying to do that in a reasonably efficient way.

Learning Where to Sample in Structured Prediction
Tianlin Shi, Jacob Steinhardt, Percy Liang
This was about doing Gibbs sampling, not for MCMC sampling from the stationary distribution, but for “stochastic search” or optimization problems. The intuition was that some coordinates are “easier” than others, so we might want to focus resampling on the harder coordinates. But this might lead to inaccurate sampling. The aim here twas to build a heterogenous sampler that is cheap to compute and still does the right thing.

Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning
Mario Lucic, Mesrob Ohannessian, Amin Karbasi, Andreas Krause
This paper won the best student paper award. They looked at a k-means problem where they do “data summarization” to make the problem a bit more efficient — that is, by learning over an approximation/summary of the features, they can find different tradeoffs between the running time, risk, and sample size for learning problems. The idea is to use coresets — I’d recommend reading the paper to get a better sense of what is going on. It’s on my summer reading list.

Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions
Alexandre Defossez, Francis Bach
What if you want to do SGD but you don’t want to sample the points uniformly? You’ll get a bias-variance tradeoff. This is another one of those “you have to read the paper” presentations. A nice result if you know the background literature, but if you are not a stochastic gradient aficionado, you might be totally lost.

Sparsistency of $\ell_1$-Regularized M-Estimators
Yen-Huan Li, Jonathan Scarlett, Pradeep Ravikumar, Volkan Cevher
In this paper they find a new condition, which they call local structured smoothness, which is sufficient for certain M-estimators to be “sparsistent” — that is, they recover the support pattern of a sparse parameter asymptotically as the number of data points goes to infinity. Examples include the LASSO, regression in general linear models, and graphical model selection.

Some of the other talks which were interesting but for which my notes were insufficient:

• Two-stage sampled learning theory on distributions (Zoltan Szabo, Arthur Gretton, Barnabas Poczos, Bharath Sriperumbudur)
• Generalized Linear Models for Aggregated Data (Avradeep Bhowmik, Joydeep Ghosh, Oluwasanmi Koyejo)
• Efficient Estimation of Mutual Information for Strongly Dependent Variables (Shuyang Gao, Greg Ver Steeg, Aram Galstyan)
• Sparse Submodular Probabilistic PCA (Rajiv Khanna, Joydeep Ghosh, Russell Poldrack, Oluwasanmi Koyejo)

# Signal boost: Postdoc in Privacy at Penn State

Sofya Raskhodnikova and Adam Smith are looking to fill a postdoc position at Penn State for a multi-year project on privacy, streaming and learning.

Qualifications: Ph.D., with expertise in the theoretical foundations of at least one of the research areas (algorithms, machine learning and statistics, data privacy). Willingness to work on a cross-disciplinary project.

Duration and compensation: At least one year, renewable. Start date is
negotiable, though we slightly prefer candidates starting fall 2015. Salary is competitive.

Applicants should email a CV, short research statement and list of references directly to the project leaders ({asmith,sofya}@cse.psu.edu) with “postdoc” in the subject line.

Location: The university is located in the beautiful college town of
State College in the center of Pennsylvania. The State College area has 130,000 inhabitants and offers a wide variety of cultural and outdoor recreational activities. The university offers outstanding events from collegiate sporting events to fine arts productions. Many major population centers on the east coast (New York, Philadelphia, Pittsburgh, Washington D.C., Baltimore) are only a few hours’ drive away and convenient air services to several major hubs are operated by three major airlines out of State College.

Penn State is an equal opportunity employer. We encourage applications from underrepresented minorities.

# ITA 2015: quick takes

Better late than never, I suppose. A few weeks ago I escaped the cold of New Jersey to my old haunts of San Diego. Although La Jolla was always a bit fancy for my taste, it’s hard to beat a conference which boasts views like this:

A view from the sessions at ITA 2015

I’ll just recap a few of the talks that I remember from my notes — I didn’t really take notes during the plenaries so I don’t have much to say about them. Mostly this was due to laziness, but finding the time to blog has been challenging in this last year, so I think I have to pick my battles. Here’s a smattering consisting of

$\{ \mathrm{talks\ attended} \} \cap \{ \mathrm{talks\ with\ understandable\ notes} \}$

(Information theory)
Emina Soljanin talked about designing codes that are good for fast access to the data in distributed storage. Initial work focused on how to repair codes under disk failures. She looked at how easy it is to retrieve the information afterwords to guarantee some QoS for the storage system. Adam Kalai talked about designing compression schemes that work for an “audience” of decoders. The decoders have different priors on the set of elements/messages so the idea is to design an encoder that works for this ensemble of decoders. I kind of missed the first part of the talk so I wasn’t quite sure how this relates to classical work in mismatched decoding as done in the information theory world. Gireeja Ranade gave a great talk about defining notions of capacity/rate need to control a system which as multiplicative uncertainty. That is, $x[n+1] = x[n] + B[n] u[n]$ where $B[n]$ has the uncertainty. She gave a couple of different notions of capacity, relating to the ratio $| x[n]/x[0] |$ — either the expected value of the square or the log, appropriately normalized. She used a “deterministic model” to give an explanation of how control in this setting is kind of like controlling the number of significant bits in the state: uncertainty increases this and you need a certain “amount” of control to cancel that growth.

(Learning and statistics)
I learned about active regression approaches from Sivan Sabato that provably work better than passive learning. The idea there is do to use a partition of the X space and then do piecewise constant approximations to a weight function that they use in a rejection sampler. The rejection sampler (which I thought of as sort of doing importance sampling to make sure they cover the space) helps limit the number of labels requested by the algorithm. Somehow I had never met Raj Rao Nadakuditi until now, and I wish I had gotten a chance to talk to him further. He gave a nice talk on robust PCA, and in particular how outliers “break” regular PCA. He proposed a combination of shrinkage and truncation to help make PCA a bit more stable/robust. Laura Balzano talked about “estimating subspace projections from incomplete data.” She proposed an iterative algorithm for doing estimation on the Grassmann manifold that can do subspace tracking. Constantine Caramanis talked about a convex formulation for mixed regression that gives a guaranteed solution, along with minimax sample complexity bounds showing that it is basically optimal. Yingbin Liang talked about testing approaches for understanding if there is an “anomalous structure” in a sequence of data. Basically for a sequence $Y_1, Y_2, \ldots, Y_n$, the null hypothesis is that they are all i.i.d. $\sim p$ and the (composite) alternative is that there an interval of indices which are $\sim q$ instead. She proposed a RKHS-based discrepancy measure and a threshold test on this measure. Pradeep Ravikumar talked about a “simple” estimator that was a “fix” for ordinary least squares with some soft thresholding. He showed consistency for linear regression in several senses, competitive with LASSO in some settings. Pretty neat, all said, although he also claimed that least squares was “something you all know from high school” — I went to a pretty good high school, and I don’t think we did least squares! Sanmi Koyejo talked about a Bayesian devision theory approach to variable selection that involved minimizing some KL-divergence. Unfortunately, the resulting optimization ended up being NP-hard (for reasons I can’t remember) and so they use a greedy algorithm that seems to work pretty well.

(Privacy)
Cynthia Dwork gave a tutorial on differential privacy with an emphasis on the recent work involving false discovery rate. In addition to her plenary there were several talks on differential privacy and other privacy measures. Kunal Talwar talked about their improved analysis of the SuLQ method for differentially private PCA. Unfortunately there were two privacy sessions in parallel so I hopped over to see John Duchi talk about definitions of privacy and how definitions based on testing are equivalent to differential privacy. The testing framework makes it easier to prove minimax bounds, though, so it may be a more useful view at times. Nadia Fawaz talked about privacy for time-series data such as smart meter data. She defined different types of attacks in this setting and showed that they correspond to mutual information or directed mutual information, as well as empirical results on a real data set. Raef Bassily studied a estimation problem in the streaming setting where you want to get a histogram of the most frequent items in the stream. They reduce the problem to one of finding a “unique heavy hitter” and develop a protocol that looks sort of like a code for the MAC: they encode bits into a real vector, had noise, and then add those up over the reals. It’s accepted to STOC 2015 and he said the preprint will be up soon.

# Transactions on Signal and Information Processing Over Networks: now accepting papers

I’m an Associate Editor for the new IEEE Transactions on Signal and Information Processing Over Networks, and we are accepting submissions now. The Editor-In-Chief is Petar M. Djurić.

The new IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g. time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data.

To submit a paper, go to Manuscript Central.

Topics of interest include, but are not limited to the following:

Adaptation, Detection, Estimation, and Learning

• Distributed detection and estimation
• Distributed adaptation over networks
• Distributed learning over networks
• Distributed target tracking
• Bayesian learning; Bayesian signal processing
• Sequential learning over networks
• Decision making over networks
• Distributed dictionary learning
• Distributed game theoretic strategies
• Distributed information processing
• Graphical and kernel methods
• Consensus over network systems
• Optimization over network systems

Communications, Networking, and Sensing

• Distributed monitoring and sensing
• Signal processing for distributed communications and networking
• Signal processing for cooperative networking
• Signal processing for network security
• Optimal network signal processing and resource allocation

Modeling and Analysis

• Performance and bounds of methods
• Robustness and vulnerability
• Network modeling and identification
• Simulations of networked information processing systems
• Social learning
• Bio-inspired network signal processing
• Epidemics and diffusion in populations

Imaging and Media Applications

• Image and video processing over networks
• Media cloud computing and communication
• Multimedia streaming and transport
• Social media computing and networking
• Signal processing for cyber-physicalsystems
• Wireless/mobile multimedia

Data Analysis

• Processing, analysis, and visualization of big data
• Signal and information processing for crowd computing
• Signal and information processing for the Internet of Things
• Emergence of behavior

Emerging topics and applications

• Emerging topics
• Applications in life sciences, ecology, energy, social networks, economic networks, finance, social sciences, smart grids, wireless health, robotics, transportation, and other areas of science and engineering