ah, the job market

So I (along with many many other people) am looking for academic jobs this year. I wish I could say it felt like only yesterday that I was applying to grad school, but that would be somewhere between wishful thinking and outright mendacity.

Claire Kenyon has a post up at the Complexity Blog that (predictably) has come under some fire, although the comments thread seems to be dominated by a debate over how companies should conduct interviews of PhDs. I think it’s remarkably difficult to come up with universal, specific recommendations for application documents because the requirements vary so much from university to university. Some schools ask for a combined research and teaching statement, and the page limits vary from place to place. Even the delivery medium varies: some places ask you to upload a PDF cover letter (with no signature, I assume), some ask you to email your materials, and some ask you to mail a physical letter.

The one great thing about this process is that I am getting a better sense of how to contextualize my research in a way that gets me excited about new problems. If it weren’t for the whole thesis-writing thing, I’d have enough time to dig into some of those problems…

Adjuncts in the academy

The tone of this NY Times article on the use of adjuncts struck me as odd. Perhaps because I am inside the ivory tower, the handwringing over the quality of education provided by adjuncts seems overblown. A bigger problem seems to be that the adjuncts themselves have no power and no real support, a point which is mentioned in the article but with a different spin. Some of this came out in the letters to the editor, but I think the article’s bias says something about who the NY Times’ audience is: parents worried about whether they’re getting a good deal for their money on their kid’s education.

perl script for merging .tex files for ArXiV

I thought a good use of part of my Sunday would be to (re?)learn a little Perl and make my life a bit easier. Suppose you have a paper that’s split up into many files. This script will parse your main LaTeX file and generate a single LaTeX file for the whole paper. It searches recursively for “\input{}” commands and replaces them with the relevant file. It seems to be functional enough to use, but everyone has their own LaTeX usage conventions, so YMMV.

This is particularly handy for submitting papers to ArXiV, which requires a single .tex file, and cutting and pasting in 20 little files can become a bit of a pain. With the perl script for stripping comments, you can generate the .tex for your ArXiV submission on the fly.

I spent a little time searching for a script like this online, since I’m sure many people have written such a thing, but I didn’t turn anything up. Hopefully this will be of use to others looking for a similar hack. Sorry for the lines getting chopped off in the display; they should show up if you cut and paste the text.

#!/usr/bin/perl

# mergetex.pl
#
# Script for merging tex files into a single monolithic file.  This
# script should make it easy to generate an ArXiV-friendly single
# .tex file from a paper that is broken into subfiles using LaTeX's
# \input{} command.
#
# USAGE:
#     ./mergetex.pl  [input]  [output]
#     ./mergetex.pl  mypaper.tex  mypaperMerged.tex
#
# mergetex takes two arguments, the [input] file and the [output]
# file into which the merged files should go.  It recursively
# searches [input] and adds any file given by uncommented \input{}
# commands.
#
# v0.1 by Anand Sarwate (asarwate@alum.mit.edu) with tips from Erin Rhode.

use IO::File;

if ( scalar(@ARGV) != 2) {
   die "Syntax Error : use 'mergetex.pl [input] [output]'"
}
$infile = shift;
$outfile = shift;
print "Generating tex file from $infile and storing in $outfile.\n";
my $outputfh = new IO::File($outfile, "w") or die "Could not open $outfile: $!\n";
my $inputfh = new IO::File($infile, "r") or die "Could not open $infile: $!\n";
parsetex($inputfh,$infile);
$outputfh->close();

sub parsetex {
  my $fh = shift;
  my $file = shift;
  print "Found $file.  Merging into $outfile.\n";
  while (<$fh>) {
    # strip out comments
    $decom = $_;
    $decom =~ s/^\%.*$/\%/;
    $decom =~ s/([^\\])\%.*$/$1\%/g;
    # search for lines with "\input{}"
    if ($decom =~ /\\input\{(.*?)\}/) {
      # get the new file name, appending ".tex" if no extension was given
      my $newfile = $1;
      $newfile .= ".tex" unless ($newfile =~ /\.tex$/);
      # recurse to input the new file
      my $newfh = new IO::File($newfile, "r") or die "Could not open $newfile: $!\n";
      parsetex($newfh,$newfile);
    }
    else {
      $outputfh->print($decom);
    }
  }
  $fh->close();
}
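
As a toy example (the file names here are made up), suppose mypaper.tex looks like

\documentclass{article}
\begin{document}
\input{intro}        % pulls in intro.tex
\input{results.tex}  % extension given explicitly
\end{document}

Then ./mergetex.pl mypaper.tex mypaperMerged.tex produces a single file in which each uncommented \input{} line has been replaced by the contents of the corresponding file (recursively, if those files contain \input{} commands of their own), with comments stripped along the way.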

Limited feedback achieves the empirical capacity

For those who think writing in the blog precludes doing work, here’s a snapshot of something due in Sunday’s ArXiV update:

Limited feedback achieves the empirical capacity
Krishnan Eswaran, Anand D. Sarwate, Anant Sahai, Michael Gastpar
(Submitted on 2 Nov 2007)

The utility of limited feedback for coding over an individual sequence of DMCs is investigated. This study complements recent results showing how limited or noisy feedback can boost the reliability of communication. A strategy with fixed input distribution $P$ is given that asymptotically achieves rates arbitrarily close to the mutual information induced by $P$ and the state-averaged channel. When the capacity achieving input distribution is the same over all channel states, this achieves rates at least as large as the capacity of the state averaged channel, sometimes called the empirical capacity.
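
In rough notation (mine, not the paper’s): if the channel at time $i$ is a DMC $W_{s_i}$ determined by an individual state sequence, the state-averaged channel is

$\bar{W}(y|x) = \frac{1}{n} \sum_{i=1}^{n} W_{s_i}(y|x),$

and the strategy with fixed input distribution $P$ gets rates close to the mutual information $I(P, \bar{W})$. When a single input distribution is capacity-achieving for every channel state, this is at least $\max_P I(P, \bar{W})$, which is the empirical capacity referred to in the abstract.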

Mathematical methods in systems EE?

One thing that I have definitely benefited from in graduate school is my undergraduate background/degree in mathematics. It’s not that I use the contents of the math classes I have taken (Dynkin diagrams and cohomology don’t often appear in information theory or signal processing problems), but I am pretty comfortable with mathematical formalism. Grad students who didn’t do as much math usually catch up on measure theory and the like by taking the graduate probability or analysis sequence. What we don’t get is a survey of the mathematical tools we could use to attack different problems, so that we know what kinds of tools might relate to a given problem and where to look for them.

I think a first-year topics lecture series that introduces some new mathematical concepts to first-year graduate students could be great. Subjects like algebraic graph theory, auctions and mechanism design, random matrices, and so on may form a small unit in a regular class or may not be covered in classes at all. Not everyone wants to sit through a reading group on percolation theory, but they might want to get a basic idea of what percolation results are like and how they are useful (e.g. network connectivity).

On the other hand, maybe if such a lecture series were offered nobody would come and it would be useless. Any thoughts?

Allerton 2007 : Networks and Algorithms

Constrained Consensus and Alternating Projection Methods
(Asu Ozdaglar)
The problem here is something like the minimization of a sum of local utility functions, but where the values in the optimization can be constrained. Asu proposed an alternating projection algorithm that minimizes the objective and then projects back onto the feasible set. In order to prove the convergence rates, she separates the linear update error (which can be analyzed using standard Markov chain techniques) from the projection error, which requires some technical conditions on the constraints. The key is that the latter error is the only one that incorporates the constraints. The bound is obtained by looking at a stopped process and letting the stopping time go to infinity.
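
From my notes (and in my notation, so take this as a sketch rather than the authors’ exact statement), the update at node $i$ looked roughly like a consensus step followed by a projection onto that node’s constraint set:

$x_i(k+1) = \Pi_{X_i}\left[ \sum_j a_{ij}(k)\, x_j(k) \right],$

where $\Pi_{X_i}$ is Euclidean projection onto node $i$’s constraint set $X_i$ and the $a_{ij}(k)$ are the usual consensus weights (there may also be a subgradient step for the objective that I am omitting). The linear update error and the projection error in the analysis correspond to the two stages of this update.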

Network Coding Using Non-Commutative Algebras
(Shweta Agarwal and Sriram Vishwanath)
For multiple unicast sessions, finite field operations and linear coding may not be sufficient to achieve capacity for network codes. The proposal in this paper was to code using modules over a noncommutative ring, but still use Gaussian elimination for decoding. I had to duck out towards the end, so I’m not sure whether explicit code constructions were provided.

Rates of Convergence for Distributed Average Consensus with Probabilistic Quantization
(Tuncer Can Aysal, Mark Coates, and Michael Rabbat)
Most gossip and consensus algorithms assume we can do computation over the reals, which is not really feasible. This work tries to tackle the effect of quantization on the convergence rate. The probabilistic quantization rule they use is this: if the true value lies a distance d from one endpoint of a quantization bin of width D, quantize to that endpoint with probability 1 - d/D and to the other endpoint with probability d/D, so that the quantized value is correct in expectation. The resulting scheme can be analyzed and all nodes will converge to the same point (an absorbing state of a Markov chain). In expectation, this point will be the true average, although it will over- or under-estimate the average in any given realization. One tricky part of the analysis is a very long waiting time for the network to settle on a single value after all nodes have reached one of the two quantization points neighboring the true mean.
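
In symbols (my notation, not necessarily the authors’): if $x$ falls in the bin $[\tau_j, \tau_{j+1}]$ of width $\Delta$, then

$Q(x) = \tau_{j+1}$ with probability $(x - \tau_j)/\Delta$, and $Q(x) = \tau_j$ with probability $(\tau_{j+1} - x)/\Delta$,

so $\mathbb{E}[Q(x)] = x$. The unbiasedness of each quantization step is what makes the consensus value correct in expectation even though any particular realization lands on a quantization point.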

New Market Models and Algorithms
(Vijay Vazirani)
This talk was on pricing algorithms for links in networks. He started out by talking about the Fisher market, a multicommodity market model in which every user has a fixed budget, goods are divisible, and each user has a different utility for the different goods. There is an algorithm for doing the allocations efficiently in these markets. The Kelly approach to analyzing TCP uses pricing on edges and a combinatorial auction to generate efficient flows, and it is formally similar to Fisher’s model in the linear case. Vazirani presented an ascending-price auction for links in which the sinks are buyers in a multicast scenario. The resulting allocations are fair and efficient. By showing a connection to the Eisenberg-Gale (1957) optimization problem, he proposed a new class of Eisenberg-Gale markets for which a strongly polynomial-time algorithm exists whenever there is a max-min result to exploit (as in multicast).

Fast Gossip through Markov Chain Lifting
(Kyomin Jung, Devavrat Shah, and Jinwoo Shin)
The standard analysis of gossip algorithms uses results on the mixing time of reversible Markov chains. However, results of Chen et al. and Diaconis et al. show that nonreversible chains may mix much more quickly. In particular, a “lifting” construction can be used to embed expander graphs in the original graph (I think?). The lifting construction itself is quite cool — I read the paper on it on the plane on the way back and may write more about it. The trick is really to get a better bound on the conductance of the chain, which then bounds the mixing times.

Stochastic Approximation Analysis of Distributed Estimation Algorithms
(Ram Rajagopal and Martin Wainwright)
This paper was on computing nonlinear functions of sensor observations. In particular, suppose the sensors have access to iid samples of a random variable X with unknown distribution, and they would like to estimate the a-quantile, that is, the b such that P(X < b) = a. The nodes can communicate with each other over noisy or rate-constrained links. The performance measure that Ram uses is the ratio of the decentralized MSE to the centralized MSE. The algorithm works by local flooding and updates, and the estimator is strongly consistent, with asymptotically normal error. What is more interesting is that the error is a function of the Laplacian of the communication graph.

Gossip Along the Way: Order-Optimal Consensus through Randomized Path Averaging
(Florence Bénézit, Alexandros Dimakis, Patrick Thiran, and Martin Vetterli)
This algorithm is similar to the Geographic Gossip algorithm that Alex and I worked on. In that work a node would wake up, pick a target location at random, and route a packet to the node closest to the target in order to perform one pairwise exchange of a standard gossip algorithm. The corresponding Markov chain went from having a mixing time of O(n) to O(1) at the expense of O(\sqrt{n}) extra transmissions per round. In this work, they change the algorithm to aggregate and average all values along the routing path from the source to the destination, so instead of averaging 2 nodes in each round, they average O(\sqrt{n}) nodes in each round. This shaves off the extra factor of O(\sqrt{n}) from routing, and so they get order-optimal performance compared to the centralized algorithm (perhaps modulo the log terms that I omitted in this brief description). The key to the analysis is to eschew computing with the actual chain and instead look at “typical” paths, constructing flows via the comparison method (originally from Diaconis and Stroock, I think) to bound the mixing time.
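
For intuition, here is a toy sketch (mine, not from either paper) of the basic pairwise gossip primitive that geographic gossip and path averaging both build on; it just averages random pairs on a complete graph and checks how far the node values end up from the true mean.

#!/usr/bin/perl
# Toy pairwise gossip: random pairs of nodes repeatedly replace their
# values with the pair average, so all values drift toward the global mean.
use strict;
use warnings;
use List::Util qw(sum);

my $n = 50;
my @x = map { rand() } (1 .. $n);     # initial node values
my $mean = sum(@x) / $n;              # true average, for comparison

for my $round (1 .. 20000) {
  my $i = int(rand($n));
  my $j = int(rand($n));
  next if $i == $j;
  my $avg = ($x[$i] + $x[$j]) / 2;    # one pairwise exchange
  $x[$i] = $avg;
  $x[$j] = $avg;
}

my $maxdev = 0;
for my $v (@x) {
  my $d = abs($v - $mean);
  $maxdev = $d if $d > $maxdev;
}
printf "true mean %.4f, max deviation after gossip %.2e\n", $mean, $maxdev;

Geographic gossip replaces the random partner with a node reached by routing toward a random location, and path averaging replaces the pairwise average with an average over all O(\sqrt{n}) nodes along the routing path.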

Allerton 2007 : Information Theory

Because Allerton proceedings don’t get handed out at the conference, my understanding of what I saw is upper bounded by the quality of my notes. If anyone has any clarifications or corrections, please let me know!

A Rate-Distortion Approach to Anonymous Networking
(Parvathinathan Venkitasubramaniam and Lang Tong)
This paper looked at networks in which we want to keep eavesdropping nodes from doing traffic analysis. There is a technique called Chaum mixing that trusted routers can use to scramble traffic and increase anonymity. The measure of anonymity they use is the equivocation at the eavesdropper normalized by the entropy of the sessions. The problem then becomes a rate-distortion problem for scheduling sessions in the network. They use two classes of nodes, covert relays and normal relays, and try to find tradeoffs between the density of covert relays and the anonymity.

Wireless Network Information Flow
(Salman Avestimehr, Suhas Diggavi, and David Tse)
Suhas presented this paper, which was on the deterministic model that David presented at ITW. Since we have broadcast and multiple access in wireless networks, the signal interactions become quite complex. The deterministic model reduces these networks to deterministic finite-field networks, which have a simple cut-set upper bound in terms of the rank of associated matrices. This outer bound is achievable as well, so the capacity of this simplified model can be calculated and looks a bit like the Ford-Fulkerson theorem in a more general sense.

Source Coding with Mismatched Distortion Measures
(Urs Niesen, Devavrat Shah, and Gregory Wornell)
If the distortion measure that we are to use is not known at the time of encoding, what should we do? To be more concrete, suppose we have f and g as distortion measures. Given a rate R and distortion D for the source under distortion f, what is the smallest distortion E we can achieve under the same code with measure g? It can be arbitrarily bad! But we can also write the performance using a single-letter characterization. There was a second problem Urs discussed, but I didn’t quite catch it; I think it was fixing D under f and asking what E we can get under g, regardless of the rate.

A Deterministic Approach to Wireless Relay Networks
(Salman Avestimehr, Suhas Diggavi, and David Tse)
This was a continuation of the work in the earlier talk, providing the details of the proof. The first part shows that if all paths from source to sink in the network are of equal length, then the capacity is the cut-set bound. In order to deal with unequal length paths, they use a time expansion argument (which looked a bit like a trellis, actually) with scheduling to avoid different flows mixing. It’s a bit tricky to show that you get back to the cut set bound with this, but I assume the details are in the paper. The other part was to show that this model can be used to construct a scheme for the original Gaussian channel that is “close” to optimal, and that the maximum gap (one bit in many cases) occurs in a very small regime of parameters.

The Approximate Capacity of a One-Sided Gaussian Interference Channel
(Guy Bresler, Abhay Parekh, and David Tse)
This was yet another paper using the deterministic model, this time for the many-to-one (and by a clever duality argument on the interference pattern, the one-to-many) interference channel, in which one transmit-receive pair experiences (inflicts) interference from (on) everyone else. The central idea here is interference alignment — if one signal level experiences interference from a transmitter, it can experience interference from other transmitters with no problem (or so I understood it). Guy showed that the Han-Kobayashi scheme is not optimal because the interference pattern is “too crowded,” and gave a lattice-based scheme that emulates the deterministic model.

ITW 2007 : Days Three and Four

  • On Optimal Information Storage in Synapses
    (Lav R. Varshney and Dmitri B. Chklovskii)
    Lav presented a mathematical model for storing information in synapses that could be modeled as a communication channel with input costs, and then calculated the capacity per unit cost to derive some principles for a good memory design. These principles are consistent with biological evidence about the brain, which shows that reality is consistent with optimizing efficiency in this model (but not, naturally, that nature is optimizing this model). The model is that we are coding over a set of n synapses in space. The input alphabet X is the synaptic strength, so a codeword is a sequence of synaptic strengths for the n synapses. The cost function b(x) for a particular synaptic strength measures the volume of a synapse, so that stronger synapses are larger. By using an SNR model for the channel, they can calculate this capacity per unit cost. The model can explain some properties such as the optimal exponent of the cost function, the profile of synaptic strengths and the benefits of noisy synapses, and the reasons for sparse connectivity.
  • Learning from Compressed Observations
    (Maxim Raginsky)
    Maxim presented a setup in which he coupled the “learning from examples” paradigm of machine learning to a rate-distortion problem. In learning from a set of examples {(Xi, Yi)}, we are interested in finding a map f : X -> Y that can assign new Y values to fresh samples X. Some targets for making a good classifier are minimizing the empirical risk or the generalization error, and it is the latter criterion on which this work focused. The problem is this: if the X samples are available to both the encoder and decoder but the Y’s must be compressed to rate R, what is the set of achievable rate vs. generalization-error tradeoffs? By using some heavier function-analytic machinery he can get an expression for an achievable rate as the distortion-rate function over a set of distortions depending on Y.
  • Intermediate Performance of Rateless Codes
    (Sujay Sanghavi)
    Designing a rateless code for erasures, like a Fountain or Raptor code, corresponds to designing a degree distribution on the graph that defines the code. In particular, the soliton distribution can be shown to achieve capacity. The question here is the decoder performance when the number of unerased symbols is less than the number needed to decode: can we partially recover? That is, if we need K symbols but only get rK symbols, for what t can we get tK symbols correctly? The soliton distribution performs poorly in this respect; the error is catastrophic. Sujay gets a pointwise outer bound on the performance of any code (via linear programming), and each point corresponds to a different input distribution. The proof uses some embeddings into a fluid model due to Darling and Norris.
  • On Secure Distributed Source Coding
    (Vinod Prabhakaran and Kannan Ramchandran)
    Suppose Alice and Bob each have correlated sources and Alice wants to communicate her source to Bob. Eve has a source of her own correlated with Alice and Bob’s, and can eavesdrop on their communication. They formulate their problem in terms of the leakage rate — the rate at which information leaks to Eve. For one-way communication from Alice to Bob they characterize the leakage rate completely, and show it is achievable using a “secure Slepian-Wolf” coding scheme. For two-way communication they get bounds in terms of the secret key rate, and they also get the leakage capacity for a certain multiple-access setup.
  • A Hot Channel
    (Tobias Koch, Amos Lapidoth, and Paul P. Sotiriadis)
    This paper used a (rather simplified) model for communication on a chip. In this model, the variance of the noise is a linear combination of additional noise and the average power expended by the transmitter in earlier channel steps. This “heating up” property makes the channel a bit tricky to handle. One interesting way to attack it is by looking at the capacity per unit cost function, where the cost is the SNR — this was done in earlier work at ISIT. The main result is a characterization of the capacity itself in terms of the coefficient sequence that dictates how the noise variance is calculated. In particular, if the memory in this noise model decays geometrically, then the capacity is bounded.
  • A Deterministic Model for Wireless Channels and its Applications
    (David Tse)
    David presented a new way of thinking about wireless channels in which the transmitters and receivers only transmit bits. The goal is to do away with the noise and instead focus on how different users’ signals interact/interfere with each other. We think of each bit as occupying a signal level, so that there is an ordering from MSB to LSB, where the MSB occupies a “high power level.” This is motivated by the approximation log(1 + SNR) ≈ log(SNR) at high SNR. The SNR determines how many of the transmitted bits can be received at the decoder. So in a broadcast setting, the transmitter may send 5 bits; one receiver will get all 5, and the other will only get the 3 most significant bits (MSBs). Interference is modeled as XORing the bits. He stepped through several examples of building deterministic models, and then addressed how close the rates derived from these models are to the optimal rates, along the lines of the “within one bit” results on the interference channel. There will be a number of talks at Allerton on how to use this model for different channels, including general relay networks, MIMO channels, and the interference channel.
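
To write down a two-user version of this model (my notation, reconstructed from the talk rather than from a paper, so treat it as a sketch): if inputs are length-$q$ bit vectors and a link supports roughly $n \approx \log \mathrm{SNR}$ bits above the noise level, then the received vector looks like

$y = S^{q-n} x \oplus S^{q-m} x', \qquad x, x' \in \mathbb{F}_2^q,$

where $S$ is the $q \times q$ down-shift matrix, so only the top $n$ (respectively $m$) bits of each transmitter’s vector survive, and the two interfering signals add bitwise modulo 2, which is the XOR interference model described above.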