Postdoc positions at UT Austin

The Simons Postdoc positions are open:

The ECE department at The University of Texas at Austin seeks highly qualified candidates for postdoctoral fellowship positions, lasting up to two years, in the information sciences, broadly defined. Applicants should have, or be close to completing, a PhD in ECE, CS, Math, Statistics or related fields.

RIP Aaron Swartz

Aaron Swartz, who most recently made headlines for expropriating a large trove of articles from JSTOR and making them available to the public, committed suicide. Cory Doctorow has a remembrance of Aaron that is also a reminder of how terrible depression can be. In making sense of what happened it’s tempting to say the threat of prosecution was the “cause,” but we shouldn’t lose sight of the person and the real struggles he was going through.

CRA Best Practices on Mentoring Postdocs

I just got the CRA newsletter, and it had a link to a document on best practices for mentoring postdocs:

… data from the Computing Research Association’s (CRA) annual Taulbee Survey indicate that the numbers of recent Ph.D.s pursuing postdocs following graduate school soared from 60 in 1998 to 249 in 2011 (three-year rolling averages), an increase of 315 percent during this period. Because research organizations are suddenly channeling many more young researchers into these positions, it is incumbent upon us as a community to have a clear understanding of the best practices associated with pursuing, hosting, and nurturing postdocs.

I think you’d find similar numbers in EE as well. The report relies a fair bit on the National Academies report, which is a little out of date and, I thought, very skewed towards the sciences. Engineering is a different beast (and perhaps computer science an even more different beast), so while there are some universal issues, the emphasis and importance of different aspects vary quite a bit across fields. For example, the NA report focuses quite a bit on fairness in recruiting, which is predicated on the postdoc being a “normal” thing to do. By contrast, in many engineering fields postdoc positions are relatively new, so there’s an opportunity to define what the position means and what it is for (i.e. not a person you can pay cheaply to supervise your graduate students for you).

Anyway, it’s worth reading!

PSA : ISIT submission formatting

If you, like me, tend to cart around old ISIT papers and just gut them to hold the new content for this year’s paper, don’t do it. Instead, download the new template, because the page size has changed from letter to A4.

Also, as a postscript to Sergio’s note that eqnarray is OVER, apparently Stefan recommends we use IEEEeqnarray instead of align.
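
For anyone who hasn’t used it, here’s a minimal sketch of what an IEEEeqnarray block looks like (with the IEEEtran class, as for ISIT papers, the environment is already available; in other document classes you would load IEEEtrantools):

```latex
% Minimal IEEEeqnarray example; the column spec {rCl} right-aligns the
% left-hand side, centers the relation, and left-aligns the right-hand side.
% With the IEEEtran class this works out of the box; otherwise add
% \usepackage{IEEEtrantools} to the preamble.
\begin{IEEEeqnarray}{rCl}
  I(X;Y) & = & H(Y) - H(Y \mid X) \nonumber \\
         & = & H(X) - H(X \mid Y)
\end{IEEEeqnarray}
```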

Job opening : Chair at the Hamilton Institute

Vijay Subramanian passed along this job opening in case readers know of someone who would be interested…

FULL PROFESSOR, HAMILTON INSTITUTE, NATIONAL UNIVERSITY of IRELAND MAYNOOTH

The Hamilton Institute at the National University of Ireland Maynooth invites applications for a Chair position starting in Summer 2013. Appointment will be at full professor level. Exceptional candidates in all areas will be considered, although we especially encourage candidates working in areas that complement existing activity in the mathematics of networks (distributed optimisation, feedback control, stochastic processes on graphs) as applied to smart transport, smart city data analytics and wireless networks.

The Hamilton Institute is a dynamic and vibrant centre of excellence for applied mathematics research. The successful candidate will be a leading international researcher with a demonstrated ability to lead and develop new research directions. A strong commitment to research excellence and a successful track record in building strategic partnerships and securing independent funding from public competitive sources and/or through private investment are essential.

Informal enquires can be directed to Prof. Doug Leith (doug.leith@nuim.ie), Director of the Hamilton Institute. Details on the Hamilton Institute can be found at www.hamilton.ie.

Further information on the post and the application procedure can be found here.

The deadline for applications is 11th Feb 2013.

Active learning survey

I’ve been starting work on a problem related to active learning, and I wanted to get caught up on the literature. Luckily for me, Sanjoy Dasgupta has a nice survey (non-paywall version here) from 2011 on the subject. It’s a good read, although I didn’t know “aggressive” and “mellow” were terms of art in active learning.

In active learning you have to query unlabeled points and ask for their labels — the goal is usually to learn something like a classifier, so you want to query a small number of points by being judicious about which ones to ask for. A mellow algorithm queries any informative point, whereas an aggressive algorithm queries the “most informative” point. The former are often easier to analyze, because the latter end up sampling a “nonrepresentative” set of labeled points — if the points come i.i.d. from some distribution, the set of points you would label in an aggressive strategy will not look like they came from that distribution. Future work may look at semi-aggressive strategies. Perhaps we could call this line of research “harshing the mellow” by developing “harsh functions” which score points according to informativeness…
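
To make the distinction concrete, here is a toy sketch (my own, not from the survey) of the two styles for learning a 1-D threshold classifier: the mellow rule queries any point in the current disagreement region, while the aggressive rule queries the point closest to the current threshold estimate. All names and parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 200))          # unlabeled pool
true_threshold = 0.37
labels = (X > true_threshold).astype(int)     # the oracle we pay to query

def fit_threshold(xs, ys):
    """Midpoint between the largest queried 0 and the smallest queried 1."""
    lo = max([x for x, y in zip(xs, ys) if y == 0], default=0.0)
    hi = min([x for x, y in zip(xs, ys) if y == 1], default=1.0)
    return 0.5 * (lo + hi), lo, hi

def run(aggressive, budget=10):
    qx, qy = [0.0, 1.0], [0, 1]               # assume the endpoints' labels are known
    for _ in range(budget):
        t, lo, hi = fit_threshold(qx, qy)
        informative = np.where((X > lo) & (X < hi))[0]   # current disagreement region
        if len(informative) == 0:
            break
        if aggressive:
            i = informative[np.argmin(np.abs(X[informative] - t))]  # "most informative"
        else:
            i = rng.choice(informative)                             # any informative point
        qx.append(X[i]); qy.append(labels[i])
    return fit_threshold(qx, qy)[0]

print("aggressive estimate:", run(aggressive=True))
print("mellow estimate:    ", run(aggressive=False))
```

The aggressive rule here is basically binary search; the mellow rule is the CAL-style “query anything you still disagree on.”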

Linkage (technical)

Having seen a talk recently by John Ioannidis on how medical research is (often) bunk, I found this finer-grained corrective by Larry Wasserman a nice read.

Computer science conferences are often not organized by the ACM; instead, there are different foundations for machine learning, vision, and so on that basically exist to organize the annual conference(s). At least, that is what I understand. There are a few which are run by the ACM, and there’s often debate about whether or not the ACM affiliation is worth it, given the overheads and so on. Boaz Barak had a post a little over a week ago making the case for sticking with the ACM. Given the hegemonic control of the IEEE over all things EE (more or less), this debate is new to me. As far as I can tell, ISIT exists to cover some of the cost of publishing the IT Transactions, and so it sort of has to be run by the IEEE.

As mentioned before, Tara Javidi has a nice post up on what it means for one random variable to be stochastically less variable than another.

Paul Mineiro has a bigger-picture view of NIPS — I saw there were lots of papers on “deep learning,” but it’s not really my area, so I missed many of those posters.

David Eppstein’s top 10 cs.DS papers from 2012.

B-log on IT

Via Tara Javidi I heard about a new blog on information theory: the Information Theory b-log, which has been going for a few months now, but I guess in more of a “stealth mode.” It’s mostly posts by Sergio Verdú, with some initial posts by Thomas Courtade, but the most recent post is by Tara on how to compare random variables from a decision point of view. However, as Max noted:

All researchers working on information theory are invited to participate by posting items to the blog. Both original material and pointers to the web are welcome.

NIPS 2012 : the rest of it

Almost a month later, I’m finishing up blogging about NIPS. Merry Christmas and all that (is anyone reading this thing?), and here’s to a productive 2013, research-wise. It’s a bit harder to blog these things because, unlike at a talk, it’s hard to take notes during a poster presentation.

Overall, I found NIPS to be a bit overwhelming — the single-track format makes it feel somehow more crowded than ISIT, but also it was hard for me to figure out how to strike the right balance of going to talks/posters and spending time talking to people and getting to know what they are working on. Now that I am fairly separated from my collaborators, conferences should be a good time to sit down and work on some problems, but somehow things are always a bit more frantic than I want them to be.

Anyway, from the rest of the conference, here are a few talks/posters that I went to and remembered something about.

T. Dietterich
Challenges for Machine Learning in Computational Sustainability
This was a plenary talk on machine learning problems that arise in natural resources management. There was a lot in this talk, with problems ranging from prediction (of bird migrations, etc.) to imputation of missing data to classification. These were real-world, hands-on problems, and one thing I got out of it is how much work you need to put into making algorithms that work for the data you have, rather than pulling some off-the-shelf works-great-in-theory method. He gave a version of this talk at TTI, but I think the new version is better.

K. Muandet, K. Fukumizu, F. Dinuzzo, B. Schölkopf
Learning from Distributions via Support Measure Machines
This was on generalizing SVMs to take distributions as inputs instead of points — instead of getting individual points as training data, you get distributions (perhaps like clusters) and you have to do learning/classification on that kind of data. Part of the trick here is finding the right mathematical framework that remains computationally tractable.
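
As I understand it, the flavor is kernel mean embeddings. Here is a rough sketch (my own toy version, not the authors’ code): each training example is a bag of samples, and the kernel between two bags is the average of an RBF base kernel over all cross pairs. The data, bandwidth, and labels are all illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def rbf(x, y, gamma=1.0):
    # Pairwise RBF kernel between two sample clouds x (m,d) and y (n,d).
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-gamma * np.sum(d ** 2, axis=-1))

def mean_embedding_kernel(bags_a, bags_b, gamma=1.0):
    # K[i, j] estimates the inner product of the kernel mean embeddings.
    K = np.zeros((len(bags_a), len(bags_b)))
    for i, A in enumerate(bags_a):
        for j, B in enumerate(bags_b):
            K[i, j] = rbf(A, B, gamma).mean()
    return K

rng = np.random.default_rng(0)
# Each "distribution" is a small cloud of points; the class is the cloud's center.
bags = [rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in ([0, 0], [2, 2]) * 20]
y = np.array([0, 1] * 20)

K = mean_embedding_kernel(bags, bags)
clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```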

J. Duchi, M. Jordan, M. Wainwright
Privacy Aware Learning
Since I work on privacy, this was of course interesting to me — John told me a bit about the work at Allerton. The model of privacy is different from the “standard” differential privacy model — the data is stochastic and the algorithm itself (the learner) is not trusted, so noise has to be added to individual data points. A bird’s eye view of the idea is this: (1) stochastic gradient descent (SGD) is good for learning and is robust to noise (e.g. noisy gradients), (2) noise is good at protecting privacy, so (3) SGD can be used to guarantee privacy by using noisy gradients. Privacy is measured here in terms of the mutual information between a data point and a noisy gradient computed from that data point. The result is a slowdown in the convergence rate that is a function of the mutual information bound, and it appears in the same place in the upper and lower bounds.
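
Here is a minimal sketch (mine, not the paper’s actual algorithm) of that high-level recipe: run SGD on a linear model, but perturb each per-example gradient with noise before the learner sees it. The noise scale sigma below is just a stand-in for whatever level the privacy constraint dictates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

def noisy_sgd(sigma, epochs=5, lr0=0.1):
    """SGD for squared loss where each per-example gradient is privatized with noise."""
    theta = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            grad = (X[i] @ theta - y[i]) * X[i]              # squared-loss gradient
            noisy_grad = grad + sigma * rng.normal(size=d)   # the learner only sees this
            theta -= (lr0 / np.sqrt(t)) * noisy_grad
    return theta

for sigma in [0.0, 1.0, 5.0]:
    err = np.linalg.norm(noisy_sgd(sigma) - theta_true)
    print(f"noise sigma={sigma}: parameter error {err:.3f}")
```

More noise means more privacy (lower mutual information) but slower convergence, which is the tradeoff the paper quantifies.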

J. Wiens, J. Guttag, E. Horvitz
Patient Risk Stratification for Hospital-Associated C. Diff as a Time-Series Classification Task
This was a cool paper on predicting which patients will become infected with C. Diff (a common secondary infection people pick up while in the hospital). Since different data are available for each patient and lots of data are missing, the classification problem is not easy — they try to assess a time-evolving risk of infection and then predict whether or not the patient will test positive for C. Diff.

P. Loh, M. Wainwright
No voodoo here! Learning discrete graphical models via inverse covariance estimation
This paper won a best paper award. The idea is that for Gaussian graphical models the inverse covariance matrix is graph-compatible — zeros correspond to missing edges. However, this is not true (or easy to arrange) for discrete graphical models. So instead they build the covariance matrix of an augmented set of variables — \{X_1, X_2, X_3, X_4, X_1 X_2, X_1 X_3, \ldots \} (really what they want is a triangulation of the graph) — and then show that the inverse of this generalized covariance matrix does respect the graph structure in a sense. More carefully, they have to augment the variables with the power set of the maximal cliques in a triangulation of the original graphical model. The title is a nod to the so-called “nonparanormal” methods used for non-Gaussian graphical models.
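
Here is a tiny numerical illustration (my construction, not the paper’s experiments) of the simplest case: for a binary Markov chain X_1 - X_2 - X_3 - X_4 (a tree, where as I understand it no augmentation is needed), the inverse covariance of the node variables alone already comes out numerically zero on the non-edges.

```python
import numpy as np
from itertools import product

def chain_prob(x, flip=0.2):
    """P(x) for a 4-node binary chain: X1 uniform, each neighbor flips with prob `flip`."""
    p = 0.5
    for a, b in zip(x, x[1:]):
        p *= (1 - flip) if a == b else flip
    return p

states = np.array(list(product([-1, 1], repeat=4)), dtype=float)
probs = np.array([chain_prob(tuple(s)) for s in states])

mean = probs @ states
centered = states - mean
cov = centered.T @ np.diag(probs) @ centered
precision = np.linalg.inv(cov)
# Non-edge entries (1,3), (1,4), (2,4) should be ~0; the chain edges should not.
print(np.round(precision, 3))
```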

V. Kanade, Z. Liu, B. Radunovic
Distributed Non-Stochastic Experts
This was about a star network with a centralized learner and a bunch of experts, except that the expert advice arrives at arbitrary times — there’s a tradeoff between how often the experts communicate with the learner and the achievable regret, and they try to quantify this tradeoff.

M. Streeter, B. McMahan
No-Regret Algorithms for Unconstrained Online Convex Optimization
There’s a problem with online convex optimization when the feasible set is unbounded. In particular, we would want to know that the optimum x^{\ast} is bounded so that we could calculate the rate of convergence. They get around this by proposing an algorithm called “reward doubling,” which tries to maximize reward instead of minimizing regret.

Y. Chen, S. Sanghavi, H. Xu
Clustering Sparse Graphs
Suppose you have a graph and want to partition it into clusters with high intra-cluster edge density and low inter-cluster edge density. They come up with a nuclear-norm-plus-L_1 objective function to find the clusters. It seems to work pretty well, and they can analyze it in the planted partition / stochastic blockmodel setting.
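
To give a sense of the flavor (this is a hedged sketch of mine, not the authors’ exact program), one natural convex formulation splits the adjacency matrix A into a low-rank cluster-indicator part Y and a sparse disagreement part S; the weight lam and the constraints below are illustrative choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, k = 30, 2
labels = np.repeat(np.arange(k), n // k)
same = labels[:, None] == labels[None, :]
# Planted partition: dense within clusters, sparse across.
A = (rng.uniform(size=(n, n)) < np.where(same, 0.8, 0.1)).astype(float)
A = np.triu(A, 1)
A = A + A.T

Y = cp.Variable((n, n), symmetric=True)   # ideal low-rank cluster matrix
S = cp.Variable((n, n), symmetric=True)   # sparse "disagreements"
lam = 1.0 / np.sqrt(n)                    # illustrative regularization weight
prob = cp.Problem(cp.Minimize(cp.normNuc(Y) + lam * cp.norm1(S)),
                  [Y + S == A, Y >= 0, Y <= 1])
prob.solve()
# Thresholding Y gives a candidate cluster matrix for the planted blocks.
print(np.round(Y.value[:5, :5], 1))
```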

P. Shenoy, A. Yu
Strategic Impatience in Go/NoGo versus Forced-Choice Decision-Making
This was a talk on cognitive science experimental design. They explain the difference between these two tasks in terms of a cost asymmetry and use some decision analysis to explain a bias in the Go/NoGo task in terms of Bayes-risk minimization. The upshot is that the difference between these two tasks may not reflect a difference in cognitive processing, but rather a difference in the cost structure used by the brain to make decisions. It’s kind of like changing the rules of the game, I suppose.

S. Kpotufe, A. Boularias
Gradient Weights help Nonparametric Regressors
This was a super-cute paper, which basically says that if the regressor is very sensitive in some coordinates and not so much in others, you can use information about the gradient/derivative of the regressor to rebalance things and come up with a much better estimator.
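
Here is a toy sketch of the idea (my paraphrase, not the authors’ estimator): get a crude estimate of how sensitive the regression function is to each coordinate, then rescale coordinates by those weights before running a plain k-NN regressor. The test function, bandwidths, and weight estimate are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.uniform(-1, 1, size=(n, d))
f = lambda x: 5.0 * x[..., 0] + 0.1 * x[..., 1]      # only two coordinates matter
y = f(X) + 0.1 * rng.normal(size=n)

def knn_predict(Xtr, ytr, Xte, k=10, w=None):
    """k-NN regression with per-coordinate weights w applied to the metric."""
    w = np.ones(Xtr.shape[1]) if w is None else w
    d2 = np.sum(((Xte[:, None, :] - Xtr[None, :, :]) * w) ** 2, axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]
    return ytr[idx].mean(axis=1)

# Crude per-coordinate sensitivity estimate: finite differences of a pilot k-NN fit.
eps = 0.1
weights = np.array([
    np.mean(np.abs(knn_predict(X, y, X + eps * np.eye(d)[j]) -
                   knn_predict(X, y, X - eps * np.eye(d)[j]))) / (2 * eps)
    for j in range(d)])

Xte = rng.uniform(-1, 1, size=(200, d))
yte = f(Xte)
for name, w in [("unweighted", None), ("gradient-weighted", weights)]:
    err = np.mean((knn_predict(X, y, Xte, w=w) - yte) ** 2)
    print(f"{name} k-NN test MSE: {err:.3f}")
```

Stretching the metric along the sensitive coordinates means the neighbors you average over are actually close in the directions that matter.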

K. Jamieson, R. Nowak, B. Recht
Query Complexity of Derivative-Free Optimization
Sometimes taking derivatives is expensive or hard, but you can approximate them by taking two close points and computing an approximation. This requires the function evaluations to be good. Here they look at how to handle approximate gradients computed with noisy function evaluations and find the convergence rate for those procedures.
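
A minimal sketch of the basic primitive (illustrative, not the paper’s algorithm): approximate each partial derivative from two noisy function evaluations and feed the result to gradient descent. The noise level, step sizes, and test function are all stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_noisy(x, noise=0.01):
    """Quadratic with minimum at the all-ones vector, plus evaluation noise."""
    return np.sum((x - 1.0) ** 2) + noise * rng.normal()

def two_point_gradient(x, h=0.05):
    """Coordinate-wise finite differences built from noisy evaluations."""
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f_noisy(x + e) - f_noisy(x - e)) / (2 * h)  # noisy difference quotient
    return g

x = np.zeros(5)
for t in range(1, 201):
    x -= (0.5 / np.sqrt(t)) * two_point_gradient(x)
print("distance to optimum:", np.linalg.norm(x - 1.0))
```

The tension the paper studies is visible even here: a smaller h reduces the bias of the difference quotient but amplifies the evaluation noise.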

Linkage (technical)

Here’s a roundup of some interesting posts/pages on technical things.

Over at Larry Wasserman’s blog, Rob Tibshirani suggests 9 Great Statistics papers published after 1970. You know, in case you were looking for some light reading over winter break.

Videos from the DIMACS Differential Privacy Workshop are up.

All of these ads for jobs this year want someone who works on Big Data. But… do you really have big data? Or, as I like to ask, “how big is big, anyway?”

Speaking of big data, this talk by Peter Bartlett looks cool. (h/t Andrew Gelman)

Max Raginsky and Igal Sason have a tutorial on measure concentration. Log Sobolev inequalities are a dish best served cold.

I’ll probably do an ArXiV roundup sometime soon — trying to catch up on a backlog of reading and thinking lately.