paper a day : approximating high-dimensional pdfs by low-dimensional ones

Asymptotically Optimal Approximation of Multidimensional pdf’s by Lower Dimensional pdf’s
Steven Kay
IEEE Transactions on Signal Processing, vol. 55, no. 2, Feb. 2007, pp. 725–729

The title kind of says it all. The main idea is that if you have a sufficient statistic, then you can create the true probability density function (pdf) of the data from the pdf of the sufficient statistic. However, if there is no sufficient statistic, you’re out of luck, and you’d like to create a low-dimensional pdf that somehow best captures the features you want from the data. This paper proves that a certain pdf created by a projection operation is optimal in that it minimizes the Kullback-Leibler (KL) divergence. Since the KL divergence dictates the error in many hypothesis tests, this projection operation is good in that decisions based on the projected pdf will be close to decisions based on the true pdf.

This is a correspondence item, so it’s short and sweet — equations are given for the projection and it is proved to minimize the KL divergence to the true distribution. Examples are given for cases in which sufficient statistics exist and do not exist, and an application to feature selection for discrimination is given. The benefit is that this theorem provides a way of choosing a “good” feature set based on the KL divergence, even when the true pdf is not known. This is done by estimating an expectation from the observed data (the performance then depends on the convergence speed of the empirical mean to the true mean, which should be exponentially fast in the number of data points).
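The last step is easy to picture in code. Below is a minimal sketch (my own illustration, not the paper's construction) of estimating a KL divergence by replacing the expectation with an empirical mean over samples, here for two 1-D Gaussians where a closed form is available as a check:

```python
import math
import random

def kl_gauss(mu0, s0, mu1, s1):
    # Closed-form KL divergence between N(mu0, s0^2) and N(mu1, s1^2).
    return math.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2 * s1**2) - 0.5

def kl_empirical(mu0, s0, mu1, s1, n=200000, seed=1):
    # Monte Carlo estimate: D(p||q) = E_p[log p(X) - log q(X)],
    # approximated by the empirical mean over n samples drawn from p.
    rng = random.Random(seed)
    def logpdf(x, mu, s):
        return -0.5 * ((x - mu) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu0, s0)
        total += logpdf(x, mu0, s0) - logpdf(x, mu1, s1)
    return total / n
```

The empirical mean converges to the true divergence at the usual Monte Carlo rate, which is the same mechanism the paper leans on when the true pdf is unknown.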

The formulas are sometimes messy, but it looks like it could be a useful technique. I have this niggling feeling that a “bigger picture” view would emerge from an information geometry/differential geometry viewpoint, but my fluency in those techniques is lacking at the moment.

Update: My laziness prevented me from putting up the link. Thanks, Cosma, for keeping me honest!

Acting Like a Thief

I finally watched this documentary that was sent to me a few weeks ago called Acting Like a Thief. It is about a street theatre organization in India from the Chhara community, a group that was labeled by the British as a “criminal tribe.” The discrimination continues to this day. This is what taking community action via theater is about.

Also related, the Human Rights Watch report on discrimination against the Dalit community in India.

paper a day : last encounter routing with EASE

Locating Mobile Nodes with EASE : Learning Efficient Routes from Encounter Histories Alone
M. Grossglauser and M. Vetterli
IEEE/ACM Transactions on Networking, 14(3), pp. 457–469

This paper deals with last encounter routing (LER), a kind of protocol in which location information for nodes is essentially computed “on the fly,” with no need to disseminate and update a location table. They consider a grid topology (a torus, actually) on which nodes do a simple random walk. The walk time is much slower than the transmission time, so at any moment the topology is frozen with respect to a single packet transmission. Every node i maintains a table of pairs (Pij, Aij) for each other node j, where Pij is j’s last known position and Aij is the age of that information. In the Exponential Age SEarch (EASE) and GReedy EASE (GREASE) protocols, a packet keeps in its header an estimate of the position of its destination, and rewrites that information when it meets a node with a closer and more recent estimate. Because the location information and mobility processes are local, these schemes perform order-optimally (though with worse constants) compared to routing with full location information, and with no overhead in the cost.
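To make the header-rewrite idea concrete, here is a toy version of the rule. All names are my own, and the pure age-based comparison is a simplification; the actual protocol weighs both the age and the distance of the competing estimates.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Estimate:
    pos: Tuple[int, int]  # last known position of the destination
    age: float            # how old that sighting is

def update_header(header: Estimate, node_entry: Optional[Estimate]) -> Estimate:
    # Simplified GREASE-style rule: when the packet visits a node whose
    # last-encounter entry for the destination is fresher than the estimate
    # carried in the packet header, rewrite the header with that entry.
    if node_entry is not None and node_entry.age < header.age:
        return node_entry
    return header
```

A packet carrying a 10-time-unit-old sighting that meets a node who saw the destination 4 time units ago adopts the node's entry; a node with no entry leaves the header untouched.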

In particular, EASE computes a sequence of anchors for the packet, each one of which has an exponentially closer estimate of the destination in both time and space. For example, each anchor could halve the distance to the destination as well as the age of that location estimate. This is similar in spirit to Kleinberg’s small-world graphs paper, in which routing in a small world graph halves the distance, leading to a log(n) routing time, which comes from long hops counting the same as local hops. Here long hops cost more, so you still need on the order of n^(1/2) hops. The paper comes with extensive simulation results to back up the analysis, which is nice.
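The cost accounting can be sanity-checked: if each anchor halves the remaining grid distance, and covering distance d costs about d hops, the total is a geometric sum dominated by the first leg, so the anchors add only a constant factor over direct geographic routing. A toy tally with integer halving (my own illustration):

```python
def total_hops(d0):
    # Each anchor halves the remaining distance; unlike the small-world
    # setting, covering distance d costs about d hops, so the total cost
    # is the geometric sum d0 + d0/2 + d0/4 + ... < 2 * d0.
    d, hops = d0, 0
    while d >= 1:
        hops += d  # hops to reach the next anchor
        d //= 2
    return hops
```

With an initial distance of order n^(1/2), the whole route still costs order n^(1/2) hops, which is the claim in the paper.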

What is nicer is the intuition given at the end:

Intuitively, mobility diffusion exploits three salient features of the node mobility processes: locality, mixing, and homogeneity…
Locality ensure[s] that aged information is still useful… Mixing of node trajectories ensures that position information diffuses around this destination node… Homogeneity ensure[s] that the location information spreads at least as fast as the destination moves.

The upshot is that location information is essentially free in an order sense, since the mobility process has enough memory to guarantee that suboptimal greedy decisions are not too suboptimal.

Reads 2007 No. 1

I used to write a lot about each book I read but of course I don’t have the time. And besides, I haven’t been reading as quickly as I used to. Unfortunately my dream of more theory/nonfiction hasn’t come to fruition. So besides The New Yorker and Harper’s I’ve delved into some more reads:

The Thursday Next Novels : The Eyre Affair, Lost In a Good Book, The Well of Lost Plots, Something Rotten (Jasper Fforde) — a bit of brain candy, these, but a lovely bit of literary comedy. It’s like Terry Pratchett with an ear for the classics (which I haven’t really read, truth be told). The books follow a Special Operations agent named Thursday Next, who lives in a world in which the line between fiction and reality is permeable, and in which there is a special division for literary crimes. What fun! Actually, the best part about it is that it actually makes me want to go back and read Great Expectations and the like. Even Jane Austen seems like it could be fun after these books, despite my falling asleep the last time I tried to read her. Maybe I have become more sensitive over time…

The Big Over-Easy (Jasper Fforde) — A spin-off series from the above, with a similar sensibility. Who wouldn’t like a murder mystery set in a world in which nursery rhyme characters populate the town? It’s more like Who Framed Roger Rabbit? than The Thin Man, so noir fans beware.

Speaking of Siva (A.K. Ramanujan) — This was a slim volume of Bhakti-movement poems (vacanas, or utterances) from the Virasaiva community, translated from the original Kannada. The poems themselves are quite beautiful — like most Bhakti poems, they get at the heart of what love and God and the self are in a relatively un-self-conscious way. One of the poems, by Allama Prabhu, is used in A Flowering Tree, the new John Adams opera that I am singing as part of a semi-staged SF Symphony concert at the beginning of next month. As one of the few, if not the only, South Asian singers in that concert, I feel a particular need to educate myself about the textual underpinnings. The poem was translated into Spanish before being set to music, and I’m not sure the music is appropriate to the religious/philosophical outlook of the poem. Adams is free to set the poem as he likes, and he isn’t trying to don some mantle of authenticity, so I find his musical choices interesting, but I think the text serves his ends, rather than the reverse. Perhaps I will write more on that later.

I’m currently working on a number of books in parallel. More when I actually finish some of them.

a woman whose body said you’ve had your last burrito for a while

The Bulwer-Lytton winners have been announced. The winner?

Detective Bart Lasiter was in his office studying the light from his one small window falling on his super burrito when the door swung open to reveal a woman whose body said you’ve had your last burrito for a while, whose face said angels did exist, and whose eyes said she could make you dig your own grave and lick the shovel clean.
Jim Guigli
Carmichael, CA

I have no idea what kind of body would say that to me, but it would take more than a beautiful woman to stop me from eating burritos.

ITA Workshop : Panel on publication issues

Part of ITA was a panel on publication issues in information theory. Paul Siegel was the moderator and led off with three binaries to spark discussion: “paper versus plasma” (the medium of publication), “prophets versus profits” (the financial model), and “peer review versus page rank” (quality measurements). Pretty much everyone thought page rank was not really a measure of anything except… page rank, so peer review was not under attack (thank goodness). In general people were in favor of paper at least for archival purposes, so that the ability to browse would still be there. Finally, everyone said they liked the IEEE (no surprise there) and its publication model.

Dave Forney talked about the spiraling costs of Elsevier-owned journals to libraries and urged people to just say no. He implicated those professors who choose to be on the editorial boards of such journals as being part of the problem, and urged them to divest from Elsevier, as it were. In general, he wanted faculty to be more proactive and aware of these important issues, a stance that I was 100% with. He then turned to ArXiV, and told everyone to submit their preprints to ArXiV so that people could know what research is being done. He said usage was increasing, but too slowly.

Andrea Goldsmith said that she found ArXiV to be of limited use since articles posted there are not peer reviewed, and the value of an article is only guaranteed via publication in the Transactions. For the publication model, she stressed the importance of access to the Transactions for the entire IEEE, so that the IT Society does not drift away from the larger organization. She also urged faculty to put institutional pressure on Elsevier by boycotting.

Steve McLaughlin also brought up the ties that bind IEEE and IT. The online Transactions are a major source of revenue, and it was the IT Society that spurred the creation of IEEExplore. He lauded ArXiV as a good impetus to embrace new ideas and models for publication, and floated the idea of an Open Access (OA) journal to complement the Transactions.

Dave Neuhoff reiterated that the journal should be the focus of the field, and that conference papers are not archival for information theorists. Because of this and other reasons, the IT Society was able to convince the IEEE to grant online access to conference proceedings.

Vince Poor, the current Editor-In-Chief, talked about copyright issues in the age of ArXiV and pointed out how reasonable the IEEE is. He seemed to indicate that Elsevier doesn’t affect our community much, but I didn’t really follow his argument there. He also claimed that market forces will push the publication industry to embrace electronic formats.

Rüdiger Urbanke was very excited about ArXiV because it could provide timestamps, and since the field is moving faster these timestamps are important. He also questioned the 5 page ISIT paper, which is not reviewed that carefully, and said that if there is a 5 page limit on correspondences the scale doesn’t make sense, especially in light of conference papers being non-archival. Finally, the pressure to publish is what enables Elsevier, and so this pressure must be alleviated somehow.

In the Q&A, one person asked about double-blind reviewing, which the panel wholeheartedly embraced. I think they should do it too, and I really have no idea what is holding it up, except that perhaps Pareja, the online paper management system, has to be hacked to do it. Someone else asked why we need timestamps from ArXiV when there are timestamps on the IT Transactions papers already, but Urbanke said that it has to do with time scales more than anything, and ArXiV lets you track revisions. Another person complained that ArXiV could become a repository for erroneous results and rejected papers, but Forney was quick to note that ArXiV’s value lies in showing who is working on what, and clearly there are no guarantees on the veracity of the claims made there. The last question was on the nature of journal papers versus conference papers — if conference papers are not archival, does that make it ok to merge 3 conference papers to make a journal paper? The panel seemed surprised to hear that this could be considered double-publishing, and the informal consensus seemed to be that doing so was not self-plagiarism.

I was most disappointed that nobody took up Steve McLaughlin’s comment on making an OA journal that is peer-reviewed and pay-to-publish. I’ve already written about having a new letters journal, but an OA journal would provide an alternative place to publish papers that is not evil and possibly has faster review turnaround than the IT Transactions. Given that there are 900 papers submitted a year to the IT Transactions now, it seems like the benefits would be great. It would also help alleviate the Elsevier-feeding publication pressure. But the IT Society could never endorse such a project and thus a panel like this would not address that issue. You’d have to get professors without their official IEEE hats on to discuss this freely, and that wasn’t going to happen at this panel. I think if the OA option is on the table it could get modified into something more palatable and friendly to the IEEE, but it of course would take some discussion and a desire to make it happen.

ITA Workshop : source and channel coding


  • Twice-universal types and simulation of individual sequences (Alvaro Martín, Universidad de la República, Neri Merhav, Technion, Gadiel Seroussi, Universidad de la República, and Marcelo J. Weinberger, HP Labs)

    A universal source code partitions the set of all length-n sequences into types, so that under an iid model class two sequences that have the same type have the same probability. If we look at order-k Markov sources, we can ask for a code that is universal over both n and k. This leads to twice-universal source codes. Given a Markov-order estimator that takes a sequence and estimates an order k, we can intersect the set of sequences assigned the same order with the type class of order k. Given some conditions on the quality of the order estimator, this partition has some nice properties, including simulation of individual sequences with similar empirical statistics.
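As a much-simplified illustration of the type-class idea: under any order-k Markov model, a sequence's probability depends only on its counts of (context, next symbol) transitions, so sequences sharing those counts (and the same initial context) fall in the same type class. A sketch:

```python
from collections import Counter

def markov_type(seq, k=1):
    # Order-k Markov type of a string: counts of (context, next-symbol)
    # pairs. Under any order-k Markov model with a fixed initial state, two
    # sequences with the same counts (and the same first k symbols) are
    # assigned the same probability.
    counts = Counter()
    for i in range(k, len(seq)):
        counts[(seq[i - k:i], seq[i])] += 1
    return counts
```

For example, "00110" and "01100" are distinct sequences with identical first-order types, while "0110" and "0101" are not.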

  • Universal noiseless compression for noisy data (Gil I. Shamir, University of Utah, Tjalling J. Tjalkens, Frans M. J. Willems, Eindhoven University of Technology)

    We’d like to be able to compress noisy data, since sometimes we only have access to noisy data. What they propose is a new probability estimator that can take into account upper and lower bounds on the probability to be estimated. So if I know the true probability lies in [a,b] I can use that knowledge in the estimator. When a = 0, b = 1 this reduces to the Krichevsky-Trofimov (KT) estimator. Intuitively, noise reduces the richness of the model class, and hence reduces the redundancy. Some properties of the estimator and corresponding compression results were given.
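For reference, the a = 0, b = 1 baseline they generalize is the sequential KT probability assignment, which can be sketched in a few lines (the bounded-probability estimator from the talk is not reproduced here):

```python
def kt_probability(seq):
    # Sequential Krichevsky-Trofimov probability of a binary string:
    # at each step, P(next bit = 1) = (n1 + 1/2) / (n0 + n1 + 1),
    # where n0, n1 are the counts of bits seen so far.
    p, n0, n1 = 1.0, 0, 0
    for bit in seq:
        p1 = (n1 + 0.5) / (n0 + n1 + 1)
        p *= p1 if bit == "1" else (1 - p1)
        if bit == "1":
            n1 += 1
        else:
            n0 += 1
    return p
```

For instance, the KT probability of "01" is 0.5 × 0.25 = 0.125.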

  • On universal coding of unordered data (Lav R. Varshney and Vivek K Goyal, MIT)

    Suppose you want to compress data but don’t care about the order of the data (think database records for concreteness) — then using a source code for vectors is silly, since you want a source code for multisets. In this work they address universal coding for multisets, and show that universal lossless coding is impossible, but a low universal rate is achievable across the model class (iid sources, I believe, but it might be more general).
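The potential gain is easy to quantify: a vector code spends up to log2(n!) bits describing an ordering the decoder doesn't care about. A back-of-the-envelope sketch (my own illustration, for n distinct records):

```python
import math

def order_info_bits(n):
    # Bits spent encoding the order of n distinct records; a code for the
    # multiset rather than the vector can save up to log2(n!) bits.
    return math.log2(math.factorial(n))

def per_record_savings(n):
    # By Stirling, log2(n!)/n grows like log2(n) - log2(e), so the
    # per-record savings grows without bound in n.
    return order_info_bits(n) / n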

  • Sum-Capacity of a Degraded Gaussian Multiaccess Relay Channel (Lalitha Sankar, WINLAB, Rutgers, Gerhard Kramer, Bell Labs, and Narayan B. Mandayam, WINLAB, Rutgers)

    For a physically degraded multiple-access relay channel (MARC), the decode-forward inner bound and old outer bounds do not match in general. But by making new outer bounds that exploit the causality of the relay, an optimized decode-forward scheme meets the outer bound.

  • Secrecy generation for channel models (Imre Csiszar, Renyi Institute, Budapest, and Prakash Narayan, Univ. of Maryland, College Park, USA)

    I am not as familiar with the 2004 paper on which this work builds (but I’m reading it now). The model is that an encoder transmits a vector into a DMC with many outputs. Other terminals each view one output and then can engage in public discussion that is overheard by an eavesdropper. The terminals must come up with a key that is secret from the eavesdropper. The results are related to a multiterminal source problem that was discussed earlier.

ITA Workshop : networks

There were some talks on dynamic spectrum access as well as waterfilling and other resource allocation problems.

  • Robust routing for dynamic ad hoc wireless networks based on embeddings (D. Tschopp, EPFL, S.N. Diggavi, EPFL and M. Grossglauser, EPFL)

    In fading networks with multiple unicast flows, mobility, and point-to-point links, how can we do route discovery with limited control overhead? This is a question of topology versus geometry, so they use an embedding of the communication graph (taking fading into account) into R^n to minimize the worst-case stretch. By using beacons to coordinate partial graph distances they can decentralize the approach a bit.

  • Coding achieves the optimal delay-capacity trade-off in mobile ad hoc networks (Lei Ying, Sichao Yang and R. Srikant, University of Illinois at Urbana-Champaign)

    For networks in which packets have a hard delay deadline, it’s bad news if a packet just dies once its deadline expires — you might hope to code over packets so that if a packet dies from delay its contents can be recovered from other packets that made it in time (writing this makes it sound like Watership Down). This work looks at a mobility model in which nodes choose a fresh new location every time slot and uses coding across packets to help maximize a link rate.
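The simplest instance of the coding-over-packets idea is a single parity packet: if at most one of the data packets dies at its deadline, the receiver can rebuild it from the survivors. A toy sketch (not the scheme in the paper):

```python
def xor_parity(packets):
    # One parity packet: bytewise XOR of equal-length data packets.
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = bytes(a ^ b for a, b in zip(parity, p))
    return parity

def recover(received, parity):
    # If exactly one packet missed its deadline, XORing the survivors
    # with the parity reconstructs the missing packet.
    missing = parity
    for p in received:
        missing = bytes(a ^ b for a, b in zip(missing, p))
    return missing
```

Sending k data packets plus one parity packet trades a 1/k rate hit for tolerance of one deadline miss; the paper's point is that with the right code this tradeoff achieves the optimal delay-capacity curve.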

  • On the throughput-region of a random wireless ad hoc network with geographic routing (Sundar Subramanian, Sanjay Shakkottai, University of Texas at Austin, Piyush Gupta, Bell Labs)

    Geographic routing is bad for network congestion and doesn’t deal well with holes in the network (although there are hacks for this). By splitting a route into multiple paths, the authors can bound the congestion. To deal with holes, you can erasure-code across the routes and drop a packet if you hit a hole. These two algorithms are shown to be order-optimal up to log factors in the network size.

  • On the wireless communication architecture for consensus problems (Anna Scaglione, Cornell)

    This work tried to incorporate quantization error into gossip algorithms as well as other physical-layer considerations. I wasn’t able to parse out the results (the talk was notation-heavy), but I suppose I will get to see the paper soon.

  • The law second class customers obey (James Martin, Oxford University, Balaji Prabhakar, Stanford University)

    The best thing about this talk was that it was given on a whiteboard (no slides) and was one of the clearest talks I went to at the whole workshop. Maybe technology is getting in the way. Suppose that we have an exponential service time queue with two Poisson arrival processes (first and second class customers). The first class customers queue up as if the second class customers were not there. Clearly the departure process for the first class customers is also Poisson. What about the second class customers? They use Weber’s result on the interchangeability of exponential-server queues to get an alternate derivation of the departure law.
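The setup is easy to simulate. Below is a small preemptive-resume priority queue (my own sketch, with deterministic inputs so the output is checkable); feeding it Poisson arrivals and exponential service times would reproduce the two-class system from the talk:

```python
def preemptive_departures(jobs):
    # jobs: list of (arrival_time, service_time, klass); klass 1 preempts 2.
    # Single server, preemptive-resume priority: first-class customers are
    # served exactly as if the second-class customers were not there.
    # Returns {job index: departure time}.
    pending = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    remaining, departures, t = {}, {}, 0.0
    while pending or remaining:
        next_arrival = jobs[pending[0]][0] if pending else float("inf")
        if not remaining:
            t = next_arrival  # server idles until the next arrival
        else:
            # serve the highest-priority job in system, FCFS within a class
            i = min(remaining, key=lambda j: (jobs[j][2], jobs[j][0]))
            run = min(remaining[i], next_arrival - t)
            remaining[i] -= run
            t += run
            if remaining[i] <= 1e-12:
                del remaining[i]
                departures[i] = t
                continue
        # admit every job that has arrived by time t
        while pending and jobs[pending[0]][0] <= t:
            j = pending.pop(0)
            remaining[j] = jobs[j][1]
    return departures
```

For example, a second-class job arriving at t = 0 with 3 units of work is interrupted by a first-class job arriving at t = 1 with 2 units of work: the first-class job departs at t = 3 (exactly as if it were alone), and the second-class job resumes and departs at t = 5.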

  • On the broadcast capacity of wireless networks (Birsen Sirkeci-Mergen and Michael Gastpar, UC Berkeley)

    Suppose we have a central broadcaster in a network that wants to flood one message across all the nodes. This paper addresses how to do this for dense and extended networks, under iid and “spatially continuous” fading. For dense networks a two-phase protocol involving cooperation works. For an extended network, a cooperative multistage broadcast protocol is proposed. The key point is that multihop protocols perform very badly on the broadcast problem and cooperative protocols are needed to get reasonable results.