# Bellairs Workshop 2013

I wanted to write a few words about the workshop at the Bellairs Research Institute. I just returned from sunny Barbados to frigid Chicago, so writing this will help me remember the sunshine and sand:

The beach at Bathsheba on the east coast of Barbados

Mike Rabbat put on a great program this year, and there were lots of talks on a range of topics in machine learning, signal processing, and optimization. The format of the workshop was to have talks with lots of room for questions and discussion. Talks were given out on the balcony where we were staying, and we had to end at about 2:30 because the sunshine would creep into our conference area, baking those of us sitting too far west.

On Monday, Jarvis Haupt talked about adaptive compressive sensing and using structured dictionaries to get better sampling bounds. I have not really been following the compressed sensing literature that closely, so this was an interesting talk with some practical applications; he showed a few from image processing. The basic idea is that you can get better bounds by exploiting structure in your dictionary/basis; in particular, he was looking at tree-structured bases. Carlo Fischione discussed “fast Lipschitz optimization,” which I had not really been aware of. The basic idea seems to be that certain optimization problems can be solved much more efficiently using a kind of constraint satisfaction framework rather than Lagrangian methods. For distributed programs this can save a lot of time, since you don’t have to propagate gradient estimates but can instead send constraints (I think?). In the afternoon, Petar Djurić talked about Bayesian learning in a network of cooperative agents that observe a common parameter vector through local noisy channels of the form $H_k \theta + z$. The main question is how to design estimators which approximate the centralized estimator, and he presented some preliminary work on filtering and estimation in this setting.
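For the linear-Gaussian version of Djurić's observation model, the centralized estimator that the distributed schemes try to approximate has a closed form. Here's a minimal sketch of that benchmark; the dimensions, noise level, and standard normal prior are my own illustrative assumptions, not numbers from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# A common parameter theta observed by 3 agents through local linear
# channels y_k = H_k @ theta + z_k, with z_k ~ N(0, sigma^2 I).
# (Dimensions and noise level are illustrative.)
d, m, sigma = 3, 4, 0.1
theta = rng.standard_normal(d)
Hs = [rng.standard_normal((m, d)) for _ in range(3)]
ys = [H @ theta + sigma * rng.standard_normal(m) for H in Hs]

# Centralized Gaussian MMSE estimate with a standard normal prior:
#   precision = I + (1/sigma^2) * sum_k H_k^T H_k
#   mean      = precision^{-1} (1/sigma^2) * sum_k H_k^T y_k
precision = np.eye(d) + sum(H.T @ H for H in Hs) / sigma**2
mean = np.linalg.solve(precision, sum(H.T @ y for H, y in zip(Hs, ys)) / sigma**2)

# With 12 low-noise observations of 3 parameters, `mean` lands close
# to the true theta.
print(np.linalg.norm(mean - theta))
```

The distributed question is then how agents can approach this fused estimate while exchanging only local messages instead of shipping all the $(H_k, y_k)$ pairs to a center.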

On Tuesday, Waheed Bajwa talked about a geometric perspective on feature selection (which he called model selection) using ideas from compressed sensing. The simple idea here is to compute the marginal correlations $|x_i^T y|$, where $x_i$ is a feature (across observations) and $y$ is the observed output of a regression model $X \beta = y$. Thresholding and picking the $K$ largest correlations seems like a reasonable thing to do, but is there a theoretical justification? The answer is yes, based on a coherence argument using finite frame theory. Chathuranga Weeraddana talked about how to optimize a weighted sum-rate in a cellular network (a non-convex problem) using an alternating optimization procedure: base stations do local optimizations subject to inter-cell interference constraints and then communicate via backhaul links to coordinate the inter-cell interference. The latter optimization uses a convex upper bound on the objective. Since all you can do is approximate the optimization, a real question is how to balance the two phases. In the afternoon we had talks by two students: Shohreh Shaghhaghian discussed how to do frequency reuse in the presence of cell sectorization, and Jun Ye Yu talked about how broadcast gossip (a subject near and dear to me) compares to building a spanning tree (for averaging) when using a realistic model/simulation of wireless networks.
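Marginal-correlation screening of the kind Bajwa described is easy to sketch. Here's a toy version with an orthonormal design, where screening recovers the support exactly; the coherence/finite-frame argument is what extends the guarantee to overcomplete, nearly-orthogonal dictionaries. The dimensions, support, and coefficients below are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression y = X @ beta with a K-sparse coefficient vector.
# An orthonormal design keeps the demo exact and deterministic.
n, p, K = 50, 20, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
beta = np.zeros(p)
support = np.array([2, 7, 11])                    # arbitrary sparse support
beta[support] = [5.0, -4.0, 3.0]
y = X @ beta

# Marginal screening: score each feature by |x_i^T y|, keep the K largest.
scores = np.abs(X.T @ y)
selected = np.sort(np.argsort(scores)[-K:])

print(selected)   # [ 2  7 11]
```

With orthonormal columns $X^T y = \beta$ exactly, so the scores are just $|\beta_i|$; the interesting regime from the talk is when the columns are merely incoherent and the cross-correlation terms have to be controlled.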

On the third day, Shreyas Sundaram talked about control approaches to average consensus with adversaries in the network and showed that the connectivity of the graph determines the number of adversaries that can be tolerated (e.g. the maximum number of vertex-disjoint paths that you can construct). This makes a connection to structured system theory, which is interesting in and of itself. Alejandro Ribeiro discussed the mathematics of hierarchical clustering: you are given a set of points $\{x_i\}$ and a measure of dissimilarity (or “distance”) between pairs, and you want to construct a hierarchical clustering, or dendrogram, for the data. This turns out to be the same as finding an ultrametric on the data; we have to learn the ultrametric from the given dissimilarities. They develop upper and lower bounds on any ultrametric consistent with the data. There’s lots of cool math in here to learn! In the afternoon, Sean Lawlor talked about how to detect convoys in vehicular traffic from data of the form (car ID, location, time). They have about 0.5 TB of data, so there’s a lot to sift through, and they are trying to find ways to tell whether two cars are following each other. Yunpeng Li gave a talk about how to model signal strength when tracking a person in a room using tomographic techniques (e.g. looking at how the disruption in the line-of-sight signal can help do tracking).
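The ultrametric connection can be made concrete with single linkage: the merge height at which two points first join the same cluster defines an ultrametric, i.e. a metric satisfying the strong triangle inequality $u(i,k) \le \max(u(i,j), u(j,k))$. A naive sketch (the toy dissimilarity matrix is mine, not from the talk):

```python
import numpy as np

# Pairwise dissimilarities for 4 toy points (symmetric, zero diagonal).
D = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

def single_linkage_ultrametric(D):
    """Naive agglomerative single linkage: repeatedly merge the two
    closest clusters and record the merge height as the ultrametric
    distance between every pair split by that merge."""
    n = len(D)
    clusters = [{i} for i in range(n)]
    U = np.zeros_like(D)
    while len(clusters) > 1:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        for i in clusters[a]:
            for j in clusters[b]:
                U[i, j] = U[j, i] = d
        clusters[a] |= clusters[b]
        del clusters[b]
    return U

U = single_linkage_ultrametric(D)
# Check the strong triangle inequality: u(i,k) <= max(u(i,j), u(j,k)).
n = len(U)
ok = all(U[i, k] <= max(U[i, j], U[j, k]) + 1e-12
         for i in range(n) for j in range(n) for k in range(n))
print(ok)   # True
```

The bounds from the talk say, roughly, where any admissible ultrametric $U$ has to sit relative to the raw dissimilarities $D$; single linkage gives one canonical choice.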

Finally, on Thursday I gave a talk on differential privacy and machine learning algorithms with some connections to signal processing (hopefully), and Mike Rabbat gave a talk on how abstract distributed optimization algorithms interact with the gritty realities of cluster-based computing. The upshot really is that the number of iterations you need to converge is not a great measure of the actual wall time of the computation in the cluster/data center. Communication can cost a lot, and you should really assess the tradeoff between communication and computation time when choosing the size of your network.
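The iterations-versus-wall-time point can be illustrated with a toy cost model (entirely my own numbers and scalings, not Rabbat's): suppose adding workers shrinks the iteration count like $1/\sqrt{n}$, but each iteration pays a communication cost that grows with $n$. Counting iterations alone always favors the biggest cluster; counting wall time gives an interior optimum:

```python
def iters(n, T0=1000.0):
    # Assume diminishing parallel speedup: iterations shrink like 1/sqrt(n).
    return T0 / n ** 0.5

def wall_time(n, t_comp=1e-3, t_comm=1e-4):
    # Each iteration pays local compute plus communication growing with n.
    return iters(n) * (t_comp + t_comm * n)

sizes = range(1, 65)
fewest_iters = min(sizes, key=iters)   # always the biggest cluster (64)
fastest = min(sizes, key=wall_time)    # interior optimum (10 in this model)
print(fewest_iters, fastest)   # 64 10
```

The particular $1/\sqrt{n}$ and linear-communication scalings are just one plausible regime; the point is only that the two criteria pick very different cluster sizes.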