paper a day : Recovery of Time Encoded Bandlimited Signals

Perfect Recovery and Sensitivity Analysis of Time Encoded Bandlimited Signals
Aurel A. Lazar and László T. Tóth
IEEE Transactions on Circuits and Systems – I: Regular Papers, vol. 51, no. 10, October 2004

I heard about this paper from Prakash Narayan, who came to give a seminar last semester, and I thought it was pretty interesting. In an undergrad signals and systems class we usually spend most of our time talking about converting an analog signal x(t) into a discrete-time signal x[n] by sampling x(t) regularly in time. That is, we set x[n] = x(nT) for some sampling interval T. There are at least two reasons for doing things this way: it's simple to explain and the math is beautiful. If x(t) is a bandlimited signal, its Fourier transform has support only on the interval [-B,B], and the famous Nyquist–Shannon sampling theorem (which goes by many other names) tells us that if 1/T > 2B the original signal x(t) is recoverable from the sequence x[n]. Since there is all of this beautiful Fourier theory around, we can convolve things and show fun properties about linear time-invariant systems with the greatest of ease.
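
Just to fix notation, here is a tiny numerical sketch of that classical story (my own toy example, nothing from the paper): sample a bandlimited signal faster than 2B and rebuild it by sinc interpolation.

    import numpy as np

    B = 5.0                      # bandwidth in Hz: spectrum supported on [-B, B]
    T = 1.0 / (2.5 * B)          # sampling interval, so 1/T = 2.5*B > 2B
    t_dense = np.linspace(0.0, 2.0, 2000)   # a dense grid standing in for continuous time

    def x(t):
        # a toy bandlimited signal: sinusoids below B Hz
        return np.sin(2 * np.pi * 2.0 * t) + 0.5 * np.cos(2 * np.pi * 4.0 * t)

    n = np.arange(0, int(2.0 / T) + 1)
    samples = x(n * T)           # x[n] = x(nT)

    # sinc interpolation: x_hat(t) = sum_n x[n] * sinc((t - nT)/T)
    x_hat = np.array([np.sum(samples * np.sinc((t - n * T) / T)) for t in t_dense])

    # the error is small in the interior but not zero, since the sinc series is
    # truncated to a finite window (edge effects)
    interior = (t_dense > 0.5) & (t_dense < 1.5)
    print("max error away from the edges:", np.max(np.abs(x_hat - x(t_dense))[interior]))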

So far so good. But sampling is only half the story in analog-to-digital conversion: not only the times but also the values have to be discretized, which means running the samples x[n] through a quantizer. Fast, high-precision quantizers are expensive, especially at low power. On the other hand, clocks are pretty cheap. The suggestion in this paper is to think about time-encoding mechanisms (TEMs), in which we encode a bandlimited x(t) into a sequence of times {tk}. Another way to think about it is as a signal z(t) that takes values +b or -b and switches between them at the times {tk}. The disadvantage of this representation is that linear operations on the analog signal don't turn into linear operations on the discrete representation. However, the conversion can be implemented with an adder, an integrator, and a noninverting Schmitt trigger that detects when the output of the integrator passes a threshold.

[Figure: TEM block diagram, with an adder, an integrator, and a noninverting Schmitt trigger]

This paper shows that the circuit above can be implemented in real time and that the times {tk} are sufficient to reconstruct x(t). The recovery algorithm looks pretty simple: multiplication by a certain pseudoinverse matrix. The second part of the paper deals with the stability of the system with respect to errors in the decoder parameters or time quantization, and they compare the scheme to irregular sampling algorithms. The tools they use for this analysis come from non-harmonic analysis, but the math isn't all that rough.
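
To make the mechanism concrete, here is a rough numerical sketch of the encode/decode loop as I understand it. The parameter values, the crude Euler integration, the placement of the sinc kernels, and the least-squares step are all my own simplified stand-ins, so treat this as an illustration of the idea rather than the paper's actual algorithm: the encoder integrates x(t) + z(t) and flips the output z (the square wave taking values +b and -b) whenever the integral crosses the Schmitt trigger's thresholds (call them +-delta), and the decoder ties sinc-kernel coefficients to the inter-trigger integrals through a pseudoinverse.

    import numpy as np

    # toy parameters (made up): trigger amplitude, Schmitt threshold, integrator gain
    b, delta, kappa = 1.0, 0.005, 1.0
    Omega = 2 * np.pi * 6.0               # assumed bandwidth of x(t), in rad/s
    dt = 1e-4
    t = np.arange(0.0, 1.0, dt)
    x = 0.4 * np.sin(2 * np.pi * 3.0 * t) + 0.3 * np.cos(2 * np.pi * 5.0 * t + 1.0)

    # encoder: adder + integrator + noninverting Schmitt trigger
    z, y, tk = b, 0.0, []
    for i in range(len(t)):
        y += (x[i] + z) * dt / kappa                  # integrator state
        if (z > 0 and y >= delta) or (z < 0 and y <= -delta):
            tk.append(t[i])                           # trigger time; flip the square wave
            z = -z
    tk = np.array(tk)
    K = len(tk) - 1                                   # number of inter-trigger intervals

    # the integral of x over [t_k, t_{k+1}] is known from {t_k} and the circuit alone
    q = (-1.0) ** (np.arange(K) + 1) * (2 * kappa * delta - b * np.diff(tk))

    # decoder: represent x as sinc kernels at the interval midpoints and invert
    s = 0.5 * (tk[:-1] + tk[1:])
    g = lambda u: (Omega / np.pi) * np.sinc(Omega * u / np.pi)   # sin(Omega*u)/(pi*u)

    A = np.zeros((K, K))
    for k in range(K):
        mask = (t >= tk[k]) & (t < tk[k + 1])
        for j in range(K):
            A[k, j] = np.sum(g(t[mask] - s[j])) * dt  # integral of kernel j over interval k

    c = np.linalg.pinv(A) @ q                         # the pseudoinverse step
    x_hat = sum(c[j] * g(t - s[j]) for j in range(K))

    err = np.abs(x_hat - x)
    print("max recovery error away from the edges:", err[(t > 0.1) & (t < 0.9)].max())

With the crude integration above the recovery is only approximate, especially near the edges of the time window, but it does show how the trigger times alone pin down the signal.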

This work is different from the sampling with a finite rate of innovation work of Vetterli, Marziliano, and Blu, which says (loosely) that regular sampling is good for a wider class of signals than bandlimited ones. It would be interesting to see whether a TEM is good for those signals as well. That might be another robustness issue to explore.

Finally, one interesting connection (and possibly another motivation for this work) is that neurons may be doing this kind of time encoding all the time. The integrate-and-fire model of a neuron is, from a black-box perspective, converting an input signal into a sequence of times, just like a TEM.

Allerton 2007 : Networks and Algorithms

Constrained Consensus and Alternating Projection Methods
(Asu Ozdaglar)
The problem here is something like the minimization of a sum of local utility functions, but where the optimization variables can be constrained. Asu proposed an alternating projection algorithm that takes a step to decrease the objective and then projects back onto the feasible set. In order to prove the convergence rates, she separates the linear update error (which can be analyzed using standard Markov chain techniques) from the projection error, which requires some technical conditions on the constraints. The key is that the latter error is the only one that incorporates the constraints. The bound is obtained by looking at a stopped process and letting the stopping time go to infinity.
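
As a toy illustration of the "take a step, then project" idea (my own sketch, not the algorithm from the talk): each node mixes with its neighbors, takes a gradient step on its local objective, and projects onto the feasible set. The quadratic objectives, the complete-graph weights, and the interval constraint below are all made up.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    a = rng.uniform(-2, 2, size=n)        # node i's local objective: f_i(x) = (x - a_i)^2
    lo, hi = -0.5, 1.0                    # common constraint interval [lo, hi]
    W = np.full((n, n), 1.0 / n)          # doubly stochastic mixing weights (complete graph)

    x = rng.uniform(-2, 2, size=n)        # initial local estimates
    for k in range(2000):
        alpha = 1.0 / (k + 10)            # diminishing step size
        x = W @ x - alpha * 2 * (x - a)   # consensus step, then local gradient step
        x = np.clip(x, lo, hi)            # ... then project onto the feasible set

    # the local estimates should land close to the constrained minimizer of the sum
    print("local estimates:", np.round(x, 3))
    print("constrained minimizer of sum_i (x - a_i)^2:", np.clip(a.mean(), lo, hi))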

Network Coding Using Non-Commutative Algebras
(Shweta Agarwal and Sriram Vishwanath)
For multiple unicast sessions, finite-field operations and linear coding may not be sufficient to achieve capacity for network codes. The proposal in this paper was to code using modules over a noncommutative ring, but still use Gaussian elimination for decoding. I had to duck out towards the end, so I'm not sure whether explicit code constructions were provided.

Rates of Convergence for Distributed Average Consensus with Probabilistic Quantization
(Tuncer Can Aysal, Mark Coates, and Michael Rabbat)
Most gossip and consensus algorithms assume we can do computation over the reals, which is not really feasible. This work tries to tackle the effect of quantization on the convergence rate. The probabilistic quantization rule they use is this: if the true value is distance d away from an endpoint of a quantization bin of length D, quantize to that endpoint with probability 1 - d/D (equivalently, to the other endpoint with probability d/D), so the quantized value is correct in expectation. The resulting scheme can be analyzed and all nodes will converge to the same point (an absorbing state of a Markov chain). In expectation, this point will be the true average, although it will over- or under-estimate the average in any particular realization. One tricky part of the analysis is a very long waiting time for the average to settle on one value after all nodes have converged to one of the quantization points neighboring the true mean.
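
A quick sketch of that rule with made-up numbers, just to see the unbiasedness:

    import numpy as np

    def prob_quantize(x, D, rng):
        lo = np.floor(x / D) * D              # lower endpoint of the bin containing x
        d = x - lo                            # distance from the lower endpoint
        # round up with probability d/D, down with probability 1 - d/D,
        # so E[Q(x)] = lo + (d/D)*D = x
        return lo + D * (rng.random() < d / D)

    rng = np.random.default_rng(0)
    x, D = 0.37, 0.1
    draws = np.array([prob_quantize(x, D, rng) for _ in range(100000)])
    print("possible outputs:", np.unique(draws))   # roughly 0.3 and 0.4
    print("empirical mean:", draws.mean())         # close to 0.37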

New Market Models and Algorithms
(Vijay Vazirani)
This talk was on pricing algorithms for links in networks. He started out by talking about the Fisher market, a multicommodity market model in which every user has a fixed budget, goods are divisible, and each user has a different utility for the different goods. There is an algorithm for doing the allocations efficiently in these markets. The Kelly approach to analyzing TCP uses pricing on edges and a combinatorial auction to generate efficient flows, and it is formally similar to Fisher's model in the linear case. Vazirani presented an ascending-price auction for links in a multicast scenario where the sinks are the buyers. The resulting allocations are fair and efficient. By showing a connection to the Eisenberg-Gale (1957) optimization problem, he proposed a new class of Eisenberg-Gale markets for which a strongly polynomial-time algorithm exists in cases where there is a max-min result to exploit (like multicast).

Fast Gossip through Markov Chain Lifting
(Kyomin Jung, Devavrat Shah, and Jinwoo Shin)
The standard analysis of gossip algorithms uses results on the mixing time of reversible Markov chains. However, results of Chen et al. and Diaconis et al. show that nonreversible chains may mix much more quickly. In particular, a “lifting” construction can be used to embed expander graphs in the original graph (I think?). The lifting construction itself is quite cool — I read the paper on it on the plane on the way back and may write more about it. The trick is really to get a better bound on the conductance of the chain, which then bounds the mixing times.

Stochastic Approximation Analysis of Distributed Estimation Algorithms
(Ram Rajagopal and Martin Wainwright)
This paper was on computing nonlinear functions of sensor observations. In particular, suppose sensors have access to iid samples of a random variable X with unknown distribution, and they would like to estimate the a-quantile, that is, the b such that P(X < b) = a. The nodes can communicate with each other over noisy or rate-constrained links. The performance measure that Ram uses is the ratio of the decentralized MSE to the centralized MSE. The algorithm works by local flooding and updates, and the estimator is strongly consistent, with asymptotically normal error. What is more interesting is that the error is a function of the Laplacian of the communication graph.
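
For reference, here is a minimal centralized version of quantile estimation by stochastic approximation; the paper's contribution is in distributing an update like this over a network with noisy or rate-constrained links, which this sketch ignores. The distribution and step-size schedule are made up.

    import numpy as np

    rng = np.random.default_rng(0)
    a = 0.9                                   # target level: find b with P(X < b) = a
    b = 0.0                                   # running estimate
    for n in range(1, 200001):
        X = rng.standard_normal()             # iid samples from the "unknown" distribution
        b += (a - (X < b)) / n**0.7           # Robbins-Monro update: fixed point at P(X < b) = a
    print("estimate:", round(b, 3), "  true 0.9-quantile of N(0,1): about 1.282")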

Gossip Along the Way: Order-Optimal Consensus through Randomized Path Averaging
(Florence Bénézit, Alexandros Dimakis, Patrick Thiran, and Martin Vetterli)
This algorithm is similar to the Geographic Gossip algorithm that Alex and I worked on. In that work a node would wake up, pick a target location at random, and route a packet to the node closest to the target to perform one pairwise exchange of a standard gossip algorithm. The corresponding Markov chain went from having a mixing time of O(n) to O(1), at the expense of O(\sqrt{n}) extra transmissions per exchange. In this work, they change the algorithm to aggregate and average all values along the routing path from the source to the destination, so instead of averaging 2 nodes in each round, they average O(\sqrt{n}) nodes in each round. This shaves off the extra factor of O(\sqrt{n}) from routing, so they get order-optimal performance compared to the centralized algorithm (perhaps modulo all the log terms that I omitted in this brief description). The key to the analysis is to eschew computing the actual chain and instead look at "typical" paths and construct flows using the comparison method (originally in Diaconis/Stroock, I think) to bound the mixing time.
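
Here is a toy version of the path-averaging step as I understand it (not the authors' code): a grid with greedy routing stands in for the random geometric graph, and every node on the routed path gets overwritten with the path average.

    import numpy as np

    rng = np.random.default_rng(0)
    m = 10                                    # m x m grid of nodes
    vals = rng.normal(size=(m, m))            # initial measurements
    true_avg = vals.mean()

    def route(src, dst):
        # greedy routing toward dst (diagonal moves allowed in this toy version)
        path, (r, c) = [src], src
        while (r, c) != dst:
            r += np.sign(dst[0] - r)
            c += np.sign(dst[1] - c)
            path.append((r, c))
        return path

    for _ in range(300):                      # each round: random source and target
        src = tuple(rng.integers(0, m, 2))
        dst = tuple(rng.integers(0, m, 2))
        path = route(src, dst)
        avg = np.mean([vals[p] for p in path])
        for p in path:                        # all nodes on the path adopt the path average
            vals[p] = avg

    print("spread after 300 rounds:", float(vals.max() - vals.min()))
    print("true average:", float(true_avg), " current mean:", float(vals.mean()))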

Allerton 2007 : Information Theory

Because Allerton proceedings don’t get handed out at the conference, my understanding of what I saw is upper bounded by the quality of my notes. If anyone has any clarifications or corrections, please let me know!

A Rate-Distortion Approach to Anonymous Networking
(Parvathinathan Venkitasubramaniam and Lang Tong)
This paper looked at networks in which we want to keep eavesdropping nodes from doing traffic analysis. There is a technique called Chaum mixing that trusted routers can use to scramble traffic and increase anonymity. The measure of anonymity they use is the equivocation at the eavesdropper normalized by the entropy of the sessions. The problem then becomes a rate-distortion problem for scheduling sessions in the network. They use two classes of nodes, covert relays and normal relays, and try to find tradeoffs between the density of covert relays and the anonymity.
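
If my notes are right, the measure is something like

    % my notation, reconstructed from my notes
    \alpha = \frac{H(\text{sessions} \mid \text{eavesdropper's observations})}{H(\text{sessions})},

with larger values meaning the eavesdropper learns less about who is talking to whom.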

Wireless Network Information Flow
(Salman Avestimehr, Suhas Diggavi, and David Tse)
Suhas presented this paper, which was on the deterministic model that David presented at ITW. Since we have broadcast and multiple access in wireless networks, the signal interactions become quite complex. The deterministic model reduces these networks to deterministic finite-field networks, which have a simple cut-set upper bound in terms of the rank of associated matrices. This outer bound is achievable as well, so the capacity of this simplified model can be calculated and looks a bit like the Ford-Fulkerson theorem in a more general sense.

Source Coding with Mismatched Distortion Measures
(Urs Niesen, Devavrat Shah, and Gregory Wornell)
If the distortion measure that we are to use is not known at the time of encoding, what should we do? To be more concrete, suppose we have f and g as distortion measures. Given a rate R and distortion D for the source under distortion measure f, what is the smallest distortion E we can achieve under the same code with measure g? It can be arbitrarily bad! But we can also write the performance using a single-letter characterization. There was a second problem Urs discussed, but I didn't quite catch it; I think it was fixing D under f and asking what E we can get under g, regardless of the rate.

A Deterministic Approach to Wireless Relay Networks
(Salman Avestimehr, Suhas Diggavi, and David Tse)
This was a continuation of the work in the earlier talk, providing the details of the proof. The first part shows that if all paths from source to sink in the network are of equal length, then the capacity is the cut-set bound. In order to deal with unequal length paths, they use a time expansion argument (which looked a bit like a trellis, actually) with scheduling to avoid different flows mixing. It’s a bit tricky to show that you get back to the cut set bound with this, but I assume the details are in the paper. The other part was to show that this model can be used to construct a scheme for the original Gaussian channel that is “close” to optimal, and that the maximum gap (one bit in many cases) occurs in a very small regime of parameters.

The Approximate Capacity of a One-Sided Gaussian Interference Channel
(Guy Bresler, Abhay Parekh, and David Tse)
This was yet another paper using the deterministic model, this time for the many-to-one (and by a clever duality argument on the interference pattern, the one-to-many) interference channel, in which one transmit-receive pair experiences (inflicts) interference from (on) everyone else. The central idea here is interference alignment — if one signal level experiences interference from a transmitter, it can experience interference from other transmitters with no problem (or so I understood it). Guy showed that the Han-Kobayashi scheme is not optimal because the interference pattern is “too crowded,” and gave a lattice-based scheme that emulates the deterministic model.

SSP 2007 : aftermath

I just finished attending the 2007 Statistical Signal Processing Workshop in Madison. Normally I would have blogged about all the talks I saw, but (a) the only "talks" were tutorials and plenaries, and (b) I'm a little too burned out to write much. Despite the fact that I applied to grad schools for signal processing and took the DSP prelim exam at Berkeley, I'm not really much of a signal processing guy these days. All of the submitted papers were given as posters, and despite being organized into "sessions," all the posters were in the same room, so there were about 30-40 presentations going on in parallel for 2 hours at a time. I was a bit dubious at first, since my experience with poster presentations is that they have a large effort-to-value ratio, but this format worked for me. I was unfamiliar with about 80% of the problems people were trying to solve, so going to talks would have left me confused. Instead, I could at least talk to someone and get the point of what they were trying to do, if not the scale of their contribution.

The one downside to the conference for me was that several of the posters I wanted to see were in the same session as mine, so I ended up missing them! Luckily I was next to "Distributed Average Consensus using Probabilistic Quantization," which is right up my alley (from my work on gossip algorithms), but I could only listen in every once in a while. If only we could encode our talks using an erasure code: then if I listened to 7 minutes out of 10 I could interpolate the other 3…

yet another crunch

As if having the deadlines for ISIT and CISS 2007 so close to each other wasn’t bad enough, the deadlines for the Statistical Signal Processing Workshop and the Information Theory Workshop in Tahoe are both on April 1. In the former case I didn’t mind, since I didn’t have anything I wanted to send to CISS and besides, it’s bad to distract oneself with conference papers. But in the latter case, I have results to send and reasons to go to both. Perhaps going out of town for spring break wasn’t such a good plan after all…

paper a day : approximating high-dimensional pdfs by low-dimensional ones

Asymptotically Optimal Approximation of Multidimensional pdf’s by Lower Dimensional pdf’s
Steven Kay
IEEE Transactions on Signal Processing, vol. 55, no. 2, February 2007, pp. 725–729

The title kind of says it all. The main idea is that if you have a sufficient statistic, then you can recover the true probability density function (pdf) of the data from the pdf of the sufficient statistic. However, if there is no sufficient statistic, you're out of luck, and you'd like to create a low-dimensional pdf that somehow best captures the features you want from the data. This paper proves that a certain pdf created by a projection operation is optimal in that it minimizes the Kullback-Leibler (KL) divergence to the true pdf. Since the KL divergence governs the error exponents in many hypothesis tests, this projection is good in the sense that decisions based on the projected pdf will be close to decisions based on the true pdf.
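
In my own notation (and I don't remember which direction of the divergence the paper uses), the criterion is

    % my notation; the direction of the divergence is from memory
    D(p \,\|\, \hat{p}) = \int p(x) \log \frac{p(x)}{\hat{p}(x)} \, dx,
    \qquad
    \hat{p}^{*} = \arg\min_{\hat{p} \in \mathcal{F}} D(p \,\|\, \hat{p}),

where the family F contains the pdfs that depend on the data only through the chosen lower-dimensional statistic.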

This is a correspondence item, so it’s short and sweet — equations are given for the projection and it is proved to minimize the KL divergence to the true distribution. Examples are given for cases in which sufficient statistics exist and do not exist, and an application to feature selection for discrimination is given. The benefit is that this theorem provides a way of choosing a “good” feature set based on the KL divergence, even when the true pdf is not known. This is done by estimating an expectation from the observed data (the performance then depends on the convergence speed of the empirical mean to the true mean, which should be exponentially fast in the number of data points).

The formulas are sometimes messy, but it looks like it could be a useful technique. I have this niggling feeling that a "bigger picture" view would emerge from an information geometry / differential geometry viewpoint, but my fluency with those techniques is lacking at the moment.

Update: My laziness prevented me from putting up the link. Thanks, Cosma, for keeping me honest!