Infocom 2009 : general thoughts and assorted topics

This was my first time going to Infocom, and I had a good time, despite being a bit out of my element. The crowd is a mix of electrical engineers and computer scientists, of theorists, experimentalists, and industry researchers. Although I often would go to a talk thinking I knew what it would be about, I often found that it was rather different than what I expected. At first this was a little alienating, but in the end I found it refreshing to be introduced to a host of new ideas and problems. Unfortunately, the lack of familiarity may translate into some misunderstandings in these posts; as usual, clarifying comments are welcome!

Below are the papers I couldn’t categorize into other areas.

Dynamic Power Allocation Under Arbitrary Varying Channels – An Online Approach
Niv Buchbinder (Technion University, IL); Liane Lewin-Eytan (Technion, IL); Ishai Menache (Massachusetts Institute of Technology, US); Joseph (Seffi) Naor (Technion, IL); Ariel Orda (Technion, IL)
The model is that of an online power allocation problem wherein the gain of a fading channel h_t varies over time in an unpredictable manner. At each time we can choose a power P_t for the transmitter based on the past values h_1, h_2, \ldots, h_{t-1} and we will get a reward U(h_t, P_t) = \log(1 + h_t P_t). Subject to a total power constraint, what is the optimal allocation policy, in terms of competitive optimality vs. an algorithm that knew h_t in advance? They show lower bounds for any algorithm of O( \log H_{\max}/H_{\min}) for the competitive ratio and an achievable strategy by quantizing the possible fading values and executing an online allocations strategy for the discrete problem. Of course I went to this talk because it claimed to be about arbitrarily varying channels, but what was a little unconvincing to me was the abstraction — you have to choose not only a transmit power, but also a coding rate, so it’s unclear that in reality you will get \log(1 + h_t P_t) utility each time. There may be a way to justify this kind of modeling using some rateless code constructions, which seems like an interesting problem to think about. UPDATE: I looked at the paper and saw that they are considering a block fading channel so you get that utility for a block of channel uses, and that the channel gain for the given slot is revealed to the transmitter, so you don’t need to be fancy with the underlying communication scheme.

Lightweight Coloring and Desynchronization for Networks
Arik Motskin (Stanford University, US); Tim Roughgarden (Stanford University, US); Primoz Skraba (Stanford University, US); Leonidas Guibas (Stanford University, US)
This paper was on duty-cyling nodes in a sensor network to preserve power and avoid interference. In particular, we would like to color the n nodes of the network so that no two neighbors are awake at the same time (thereby improving coverage efficiency and lowering interference to a centralized receiver). This can be mapped to problem of allocating intervals on a circle of circumference T (the cycle time) such that two neighbors in the graph do not get overlapping intervals. Nodes with low degree should get longer intervals. The constraint in this assignment problem is that the solution must be decentralized and require minimal internal state and external inputs. They show an algorithm which uses one bit of local state to use O(d_{\max}) colors in O(\log n) time, and an algorithm with one bit of local state and one bit of external side information to color the graph with d_{\max} + 1 colors in d_{\max} \log n rounds.

Using Three States for Binary Consensus on Complete Graphs
Etienne Perron (EPFL, Switzerland); Dinkar Vasudevan (EPFL, Switzerland); Milan Vojnovic (Microsoft Research, UK)
This paper looked at a binary consensus problem that was seemed on the face of it quite different from the consensus algorithms that I work on. They consider a complete graph where every node has a an initial value that is either 0 or 1. The goal is to compute the majority vote in all the nodes in a distributed manner while requiring only a small state in each of the nodes. This is related to a standard voter model, in which nodes sample their neighbor and switch their vote if it is different. If we have 3 states, (0, e, 1), where e is “undecided” then we can make a rule where a node samples its neighbor and moves one step in direction reported by the neighbor. Different variants of the algorithm are also analyzed, and the error rate analysis seemed a bit hairy, involving ODEs and so on. What I was wondering at the end is the relationship of this algorithm to standard gossip with one bit quantization of messages.

prove as you go or scaffold first?

I had an interesting conversation two weeks ago about the working process for doing theory work in CS and EE. We discussed two extremes of working styles. In one, you meticulously prove small statements, type them up as you go along, getting the epsilons and deltas right and not working on the next step until the current step is totally set. I call this “prove as you go.” The other is that you sketch out some proofs to convince yourself that they are probably true (in some form) and then try to chase down the implications until you have the big result. When some deadline rolls around, you then build up the proofs for real. This could be thought of as “scaffolding first.” Fundamentally, these are internal modes of working, but because of the pressure to publish in CS and EE they end up influencing how people view theory work.

Continue reading

Csiszár and Körner — bring it back in print!

Via Timothy Chow blogging at Terence Tao’s blog, I learned about outofprintmath, a kind of survey site for people to express their desire to have out of print math books brought back into print. One of the classic texts of information theory by Csiszár and Körner has been out of print for a while. Many grad students have been frustrated its unavailability (even the library copy is often checked out nearly permanently). There are (illegal) ways around this, but a photocopied book is just not the same as a real one you can take to a cafe and read.

Bobak (I think) has added Csiszár and Körner to outofprintmath. It’s number 51. Go vote!

ITA 2008 : Part the Second

This is the second part in my blogging about some of the talks I saw at ITA 2008, and also a way for me to test a tag so that readers who don’t care about information theory won’t have a huge post hogging up their aggregator window (livejournal, I’m looking at you). Again, I may have misunderstood something in the talks, so if you see something that doesn’t sound right, let me know.

Continue reading

ITA 2008 : Part the First

I’ve spent this week at ITA 2008 at UCSD. There have been lots of great talks, but my brain is fairly saturated, so my ability to take more than cursory notes decreased as the workshop progressed. I’ve met a number of people who apparently read this blog (hi!), which is nice. One thing I’ll try to do is put in some more links to things as I think of them and also to raise some (possibly trivial) questions that came up as I wrote about the papers.

Iterative coding for network coding
Andrea Montanari, Stanford, Vishwambar Rathi, EPFL, and Rudiger Urbanke, EPFL

I think of this paper as the Shannon-theory version of the Koetter-Kschischang coding-theory analysis of network coding. There is a preprint of some of the results on ArXiV that I read a little while ago (and unsuccessfully presented in our group meeting). The channel model is a rank metric channel (see Gabidulin), and a notion of distance between subspaces can be defined. The idea is the following — messages are encoded into subspaces of a vector space over a finite field (taken to be F2 here). The set of all subspaces is the projective space, and the set of all subspaces of dimension k is the Grassmann manifold. So we can look at a codebook as a set of binary matrices of dimension m x l. In this model, headers are preserved, so the decoder in a network code knows the transfer matrix of the channel exactly. However, errors can still be introduced in the packet payload in the following sense. Each column can be written as

yj = xj + zj

where zj is uniformly chosen from an unknown subspace of limited rank. This is a perfectly good DMC channel model, and one can calculate its capacity. The interesting thing is that then you can define an LDPC code (with all variable nodes of degree 2 and a non-soliton distribution for the check nodes) across the columns of the codeword (which is a matrix, remember), with matrix-valued weights on the edges of the Tanner graph. The messages that are propagated in the BP algorithm correspond to information about affine subspace memberships of the rows. This code (a) achieves capacity on the rank metric channel, and the error for erasures also goes to 0. Since I have been concerned about jamming and interference and malicious attacks, the assumption on the header being preserved feels like a limitation of the approach, but I think that perhaps concatenating codes may also be able to preserve something about the header information. The reason I call it sort of Shannon theoretic is that the error model is random and the density evolution arguments are a bit like the probabilistic method.

On rate-distortion function of Poisson processes with a queueing distortion measure
Todd P. Coleman, UIUC, and Negar Kiyavash, UIUC, and Vijay G. Subramanian, Hamilton Institute, NUIM

The main idea here is find a rate-distortion problem corresponding to the bits through queues result on timing channels. So given a set of Poisson arrivals {Xi}, how can we define a distortion measure and rate-distortion function that makes sense? The choice they make is to requires that the reconstruction be causal, so that the packet times in the reconstructed version are delayed w.r.t. the originals and the objective then is to minimize the induced service time of the reconstruction. This makes the test channel into a first-come first-serve queue and in particular the optimal test channel is the exponential timing channel. The rate distortion function turns out to be -a log(D) where a is the intensity of the Poisson process. The benefit of their definition is that the proofs look as clean as the AWGN and BSC rate distortion proofs.

Network tomography via network coding
G. Sharma, IIT Bombay, S. Jaggi, CUHK, and B. K. Dey IIT Bombay

Network tomography means using input-output probing of a network to infer internal structural properties. In their model, all nodes have unique IDs and share some common randomness. Inferring the presence of absence of a single edge in the graph is relatively easy, and then the process can be repeated for every edge. The more interesting problem is structural inference in an adversarial setting. They show that the graph can be recovered if the min-cut from the source to destination is greater than twice the min-cut from the adversary to the destination (I think), which squares with the Byzantine Generals problem. The tomography problem is interesting, and I wonder if one can also embed adversary-detection methods as well to create a protocol to quarantine misbehaving internal nodes.

Coding theory in projective spaces
Tuvi Etzion, Technion, and Alexander Vardy, UCSD

This talk also dealt with the rank-metric channel. Previous work of Koetter and Kscischang had found analogues to the Singleton and Johnson bounds, and here they find Delsarte, Iterated Johnson, and packing bounds, as well as a “linear programming bound.” They show there are no nontrivial perfect codes in the projective space of dimension n over Fq, and define the notion of a quasilinear code, which is an abelian group in the projective space. A code is quasilinear iff the size is a power of two. Honestly, I found myself a bit lost, since I’m not really a coding theorist, but it seems that the rank metric channel has lots of interesting properties, some of which are old, and some of which are new, and so exciting new algebraic codes can be constructed for them.

On capacity scaling in arbitrary wireless networks
Urs Niesen, MIT, Piyush Gupta, Bell Labs, and Devavrat Shah, MIT

The goal for this paper was to relax some of the random-placement assumptions made in a lot of cooperative schemes for large wireless networks. They look at n nodes scattered in [0, n1/2]2 with a standard path loss model for the gains, full CSI, and equal rate demands for the each source-destination pair. There are three schemes we can consider — multihop, hierarchical cooperation, and a new scheme they have here, cooperative multihop. For large path loss exponent the multihop strategy is optimal, and in other regimes hierarchical cooperation is optimal. The cooperative multihop scheme works by dividing the network into squarelets and then doing multihop on the squarelet-level overlay graph. This strategy is optimal in some cases, and has the benefit of not relying on the node placement model, unlike the hierarchical cooperation scheme. These network models are interesting, but the full CSI assumption seems pretty darn restrictive. I wonder if the cooperative multihop would be more attractive also because channel estimation and ranging need only be done on a local level.

Percolation processes and wireless network resilience
Zhenning Kong and Edmund M. Yeh, Yale

Consider a random geometric graph with density chosen above the percolation threshold. This is an often (ab?)used model for wireless sensor networks. First off, throwing n points in the unit square converges, in distribution to the percolation model with an infinite Poisson process in the plane and the Gilbert disc model. If we make nodes “fail” by flipping a coin for each node, this is just thinning the process, so it’s easy to find a threshold on the failure probability in terms of the percolation threshold that guarantees a giant connected component will still exist. So far so good, but what happens if the failure probability depends on the degree of the node? High degree nodes may serve more traffic and hence will fail sooner, so this is a good model. This work finds necessary and sufficient conditions on the failure probability distribution p(k) for a giant connected component to still exist. Here k is the node degree. The necessary and sufficient conditions don’t match, but they are interesting. In particular, the sufficient condition is very clean. One application of these results is to a cascading failures model, where you want to stop one node going down from taking down the whole network (c.f. power failures in August 1996). It would be interesting to see if a real load redistribution could be integrated into this — nodes have weights (load) on them, proportional to degree, say, and then dynamically nodes are killed and push their load on to their ex-neighbors. How fast does the whole network collapse? Or perhaps a sleeping-waking model where a node dumps its traffic on neighbors while sleeping and then picks up the slack when it wakes up again…