Election auditing with convex optimization

A month or two ago I was on the shuttle going to campus and ended up talking with Steve Checkoway about his work. He described a problem in election auditing that sounded pretty interesting: given an election between, say, two candidates, and the ability to sample individual ballots at random and compare how they were counted versus how they should have been counted, how should we design an auditing algorithm that has the following property: if the auditor certifies that the reported winner is the true winner, then it is wrong with probability no larger than \alpha?

After a couple more meetings we came up with a pretty simple method based on minimizing a Kullback-Leibler divergence. This minimum KL-divergence gives a bound on the error probability of our algorithm which we can then compare to \alpha to give the guarantee. To do the minimization we need to turn to convex optimization tools (which might be a subject for another post — numerically minimizing the KL divergence requires choosing your solving algorithm carefully).
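To make the optimization part concrete, here's a minimal sketch of constrained KL-divergence minimization using the cvxpy modeling package. The reported tally and the "wrong outcome" constraint are made up for illustration, and this is not the exact formulation from our paper.

```python
import cvxpy as cp
import numpy as np

p = np.array([0.55, 0.45])      # made-up reported vote shares (candidate 1 "wins")

q = cp.Variable(2, nonneg=True)
# cp.kl_div(q, p) is elementwise q*log(q/p) - q + p; summed over a pair of
# probability vectors it equals the KL divergence D(q || p).
objective = cp.Minimize(cp.sum(cp.kl_div(q, p)))
constraints = [cp.sum(q) == 1,
               q[1] >= q[0]]    # distributions under which candidate 1 does not win
prob = cp.Problem(objective, constraints)
prob.solve()
print("minimum KL divergence:", prob.value)
print("closest 'wrong outcome' distribution:", q.value)
```

cvxpy hands the exponential-cone structure of the KL term to whichever installed solver supports it, which is one place where the choice of algorithm starts to matter.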

We wrote up a paper on it with Steve’s advisor Hovav Shacham and submitted it to the 2010 Electronic Voting Technology Workshop, where it was accepted. We’re making some major-ish revisions (mostly in the simulations) but I’m pretty excited about it. There are some fun analysis questions lurking in the corners that we didn’t have time to get to yet, but I’m looking forward to poking at them after ISIT.

Speaking of which, ISIT is coming up — not sure how much I will blog, but there will be something at least.

CTW 2010

Playa Norte on Isla Mujeres

I was lucky enough to get invited to present at the Communications Theory Workshop (CTW) in Cancun last month. Since the conference started on a Monday I took the weekend before to do a little sightseeing, hang out on the beach (see above), and visit Chichen Itza, which was pretty amazing. CTW is a small conference — single track, 2.5 days, and plenty of time for panels, discussions, etc. I had a great time there, and it was really nice to meet new people who were working on interesting problems that I hadn’t even heard about.

The first day was focused on wireless data services, the explosion thereof, and the lack of capacity to support it all. The proposed remedies: adding more relays and femtocells, interference management and alignment, and more antennas for MIMO gains. Reinaldo Valenzuela was pretty pessimistic — according to him we are pretty much at the point where service providers are losing money by providing data over wireless. He seemed to think the only way out now is to change pricing models or really try to exploit network MIMO.

The second day was on cognitive radio, where the picture also seemed a bit pessimistic. Michael Marcus kind of laid into academics for not participating in the FCC comment process. The criticism is valid — if we care about cognitive radio and its potential, we should try to persuade the regulators to do smart things rather than piecemeal things. As things stand now, it’s unclear how much additional capacity is available in the TV bands, given the current requirements for operating there. The sensing requirements are nigh unreasonable, and if you do detect a primary you have to stay silent for 30 minutes. That makes it kind of hard to run a commercial service.

Another thing I learned during that day was that wireless microphones in the US operate in the TV bands, and are grandfathered in to the existing rules for cognitive radio. This means you have to detect if you will interfere with a wireless mic, which seems quite tough. Steve Shellhammer talked about a contest Qualcomm ran to build a detector for this problem. In addition, a large fraction of wireless microphone users are not licensed and hence their use is illegal, but nobody’s enforcing that.

The second half of the day was on emerging topics, followed by a panel called “Is Communication Theory Dead?” I think the pessimism and despair of the first two days needed to be let out a little, and there was a pretty spirited discussion of whether communication theorists were answering the right questions or not. It was like group therapy or something. Luckily there was the banquet after the panel so people could forget their woes…

The last day of the workshop featured the session I spoke in: five talks on wireless networks from vastly different perspectives. Tara Javidi talked about routing, Daniela Tuninetti about information theory and cognitive interference channels, Elza Erkip about game theory models for the interference channel, and Anna Scaglione about cooperative information flooding. For my part, I talked about (very) recent work I have done with Tara on distributed consensus with quantized messages. We have some nice first-order results but I’m still hammering away at better bounds on the convergence rate.

Allerton 2009: quick takes

I was at the Allerton conference last week for a quick visit home — I presented a paper I wrote with Can Aysal and Alex Dimakis on analyzing broadcast-based consensus algorithms when channels vary over time. The basic intuition is that if you can get some long-range connections “enough of the time,” then you’ll get a noticeable speed-up in reaching a consensus value, but this higher connectivity also increases the variance of the consensus value around the true average. It was a fun chance to try out some new figures that I made in OmniGraffle.
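To illustrate the intuition, here's a toy broadcast-gossip simulation (my own quick sketch, not the model from the paper): nodes sit on a ring, each round a random node broadcasts, and with some probability the broadcast reaches everyone rather than just the ring neighbors. The long-range broadcasts speed up agreement, but since broadcast updates don't preserve the sum, the value the network settles on can drift away from the true average.

```python
import numpy as np

def broadcast_gossip(n=100, rounds=3000, gamma=0.5, p_long=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    true_avg = x.mean()
    for _ in range(rounds):
        i = rng.integers(n)
        # short-range broadcast reaches the ring neighbors; occasionally
        # (w.p. p_long) it reaches everyone, modeling a long-range link
        if rng.random() < p_long:
            neighbors = np.arange(n)
        else:
            neighbors = np.array([(i - 1) % n, (i + 1) % n])
        # receivers mix toward the broadcaster's value
        x[neighbors] = gamma * x[neighbors] + (1 - gamma) * x[i]
    # spread across nodes (how close to agreement) and drift from the true average
    return x.std(), abs(x.mean() - true_avg)

print("with long-range links:    spread, drift =", broadcast_gossip(p_long=0.1))
print("without long-range links: spread, drift =", broadcast_gossip(p_long=0.0))
```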

The plenary was given by Abbas El Gamal on a set of new lecture notes on network information theory that he has been developing with Young-Han Kim. He went through the organization of the material and tried to show how much of the treatment could be developed using robust typicality (see Orlitsky-Roche) and a few basic packing and covering lemmas. This in turn simplifies many of the proofs structurally and can give a unified view. Since I’ve gotten an inside peek at the notes, I can say they’re pretty good and clear, but definitely dense going at times. They are one way to answer the question of “well, what do we teach after Cover and Thomas?” They’re self-contained and compact.

What struck me is that these are really notes on network information theory and not on advanced information theory, which is the special topics class I took at Berkeley. It might be nice to have some of the “advanced but non-network” material in a little set of notes. Maybe the format of Körner’s Fourier Analysis book would work: it could cover things like strong converses, delay issues, universality, wringing lemmas, AVCs, and so on. Almost like a wiki of little technical details and so on that could serve as a reference, rather than a class. The market’s pretty small though…

I felt a bit zombified during much of the conference, so I didn’t take particularly detailed notes, but here were a few talks that I found interesting.

  • Implementing Utility-Optimal CSMA (Lee, Jinsung (KAIST), Lee, Junhee (KAIST), Yi, Yung (KAIST), Chong, Song (KAIST), Proutiere, Alexandre (Microsoft Research), Chiang, Mung (Princeton University)) — There are lots of different models and algorithms for distributed scheduling in interference-limited networks. Many of these protocols involve message passing, and the overhead from the messages may become heavy. This paper looked at how to use the implicit feedback from CSMA. They analyze a simple two-link system with two parameters (access and hold) and then use what I thought was an “effective queue size” scheduling method. Some of the analysis was pretty tricky, using stochastic approximation tools. There were also extensive simulation results from a real deployment.
  • LP Decoding Meets LP Decoding: A Connection between Channel Coding and Compressed Sensing (Dimakis, Alexandros G. (USC), Vontobel, Pascal O. (Hewlett-Packard Laboratories)) — This paper proved some connections between Linear Programming (LP) decoding of LDPC codes and goodness of compressed sensing matrices. A simple example: if a parity check matrix is good for LP decoding then it is also good for compressed sensing. In particular, if it can correct k errors it can recover k-sparse signals, and if it can correct a specific error pattern \vec{e} it can recover all signals whose support matches that of \vec{e}. There were many other extensions of this result as well, to other performance metrics and also to the Gaussian setting.
  • The Compound Capacity of Polar Codes (Hassani, Hamed (EPFL), Korada, Satish Babu (EPFL), Urbanke, Ruediger (EPFL)) — Rudi gave an overview of polar codes from two different perspectives and then showed that in general polar codes do not achieve the compound channel capacity, but it’s not clear if the problem is with the code or with the decoding algorithm (so that’s still an open question).
  • Source and Channel Simulation Using Arbitrary Randomness (Altug, Yucel (Cornell University), Wagner, Aaron (Cornell University)) — The basic source simulation problem is to simulate an arbitrary source Y_1^n with an iid process (say equiprobable coin flips) X_1^n. Using the information spectrum method, non-matching necessary and sufficient conditions can be found for when this can be done. These are in terms of sup-entropy rates and so on. Essentially though, it’s a theory built on the limits or support of the spectra of the two processes X and Y. If the entropy spectrum in X dominates that of Y then you’re in gravy. This paper proposed a more refined notion of dominance which looks a bit like majorization of some sort. I think they showed that was sufficient, but maybe it’s also necessary too. Things got a bit rushed towards the end.
  • Channels That Die (Varshney, Lav R. (MIT), Mitter, Sanjoy K. (MIT), Goyal, Vivek K (MIT)) — This paper looked at a channel which has two states, alive (a) and dead (d), and at some random time during transmission, the channel will switch from alive to dead. For example, it might switch according to a Markov chain. Once dead, it stays dead. If you want to transmit over this channel in a Shannon-reliable sense you’re out of luck, but what you can do is try to maximize the expected number of bits you get through before the channel dies. I think they call this the “volume.” So you try to send a few bits at a time (say b_1, b_2, \ldots, b_k are the numbers of bits in the chunks). How do you size the chunks to maximize the volume? This can be solved with dynamic programming (a toy sketch of this kind of chunk-sizing DP appears after this list). There are still several open questions left, but the whole construction relies on using “optimal codes of finite blocklength,” which is a problem that also needs to be solved. Lav has some interesting ideas along these lines that he told me about when I was in Boston two weeks ago…
  • The Feedback Capacity Region of El Gamal-Costa Deterministic Interference Channel (Suh, Changho (UC Berkeley), Tse, David (UC Berkeley)) — This paper found a single letter expression for the capacity region, which looks like the Han-Kobayashi achievable region minus the two weird 2 R_i + R_j constraints. Achievability uses Han-Kobayashi in a block-Markov encoding with backward decoding, and the converse uses the El Gamal-Costa argument with some additional Markov properties. For the “bit-level” deterministic channel model, there is an interpretation of the feedback as filling a “hole” in the interference graph.
  • Upper Bounds to Error Probability with Feedback (Nakiboglu, Baris (MIT), Zheng, Lizhong (MIT)) — This is a follow-on to their ISIT paper, which uses a kind of “tilted posterior matching” (a phrase which means a lot to people who really know feedback and error exponents in and out) in the feedback scheme. Essentially there’s a tradeoff between getting close to decoding quickly and minimizing the error probability, and so if you change the tilting parameter in Baris’ scheme you can do a little better. He analyzes a two phase scheme (more phases would seem to be onerous).
  • Infinite-Message Distributed Source Coding for Two-Terminal Interactive Computing (Ma, Nan (Boston University), Ishwar, Prakash (Boston University)) — This looked at the interactive function computation problem, in which Alice has X^n, Bob has Y^n and they want to compute some functions \{f_A(X_i,Y_i) : i \le n\} and \{f_B(X_i,Y_i) : i \le n\}. The pairs (X_i, Y_i) are iid and jointly distributed according to some distribution p_{XY}. As an example, (X_i,Y_i) could be correlated bits and f_A and f_B could be the AND function. In previous work the authors characterized the rate region (allowing vanishing probability of error) for t rounds of communication, and here they look at the case when there are infinitely many back-and-forth messages. The key insight is to characterize the sum-rate needed as a function of p_{XY} — that is, to look at the function R : \mathcal{P}_{XY} \to \mathbb{R} and look at that surface’s analytic properties. In particular, they show it’s the minimum element of a class \mathcal{F} of functions which have some convexity properties. There is a preprint on ArXiV.
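Going back to the Channels That Die paper above, here is a toy dynamic program for the chunk-sizing question. Everything in it is an assumption of mine for illustration: the channel survives each use independently with probability s, a chunk occupying L uses delivers L - c bits (a stand-in for the finite-blocklength penalty), a chunk only counts if the channel is alive when it ends, and there is a finite horizon of N uses.

```python
from functools import lru_cache

s, c, N = 0.95, 4, 200   # survival prob per use, per-chunk overhead, horizon (all made up)

@lru_cache(maxsize=None)
def V(start):
    """Max expected bits delivered using channel uses start..N."""
    if start > N:
        return 0.0
    best = 0.0                                      # option: stop transmitting
    for L in range(c + 1, N - start + 2):           # chunks of c or fewer uses deliver nothing
        end = start + L - 1
        delivered = (s ** end) * (L - c)            # P(alive through use `end`) * bits in chunk
        best = max(best, delivered + V(end + 1))
    return best

print("expected volume:", V(1))
```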

Posting has been light

Due to extensive traveling around. At the end of August and beginning of September I attended a workshop at the American Institute of Mathematics on Permanents and modeling probability distributions. It was a lot of fun, and a lot different than any workshop I’d been to before. There were only 20 or so participants and every day we’d break out in smaller groups to actually work on some sub-problem or area. By the end of the week I felt like I’d had a crash course in large-alphabet probability estimation and also got a sense of the huge range of problems and techniques (from exchangeable random partition processes to Markov chain Monte Carlo) involved in tackling them.

Last week I gave a talk on gossip algorithms at UC Davis and got a chance to talk with a lot of people there. It was a great trip, except for the 6:30 flight time out of San Diego. Then last weekend there were heirloom tomatoes:

Heirloom Tomato with bread and Cowgirl Creamery Red Tail cheese

And also peach pie:
Peach pie...

Mmmm, pie.

Illinois Wireless Summer School

I just came back from the Illinois Wireless Summer School, hosted by the Illinois Center for Wireless Systems. Admittedly, I had a bit of an ulterior motive in going, since it meant a trip home to see my parents (long overdue!), but I found the workshop a pretty valuable crash course covering the whole breadth of wireless technology. The week started out with lectures on propagation, wireless channel modeling, and antennas, and ran up to a description of WiMAX and LTE. Slides for some of the lectures are available online.

Some tidbits and notes:

  • Venugopal Veeravalli gave a nice overview of channel modeling, which was a good refresher since I hadn’t really revisited the material since taking David Tse’s wireless course at the beginning of grad school. Xinzhou Wu talked about modulation issues and pointed out that for users near the edge of a cell, universal reuse may be bad; he also mentioned Flarion’s flexband idea, which I hadn’t heard about before.
  • Jennifer Bernhard talked about antenna design, which I had only the sketchiest introduction to 10 years ago. She pointed out that actually getting independent measurements from two antennas by spacing them just the right distance apart is nearly impossible, so coupling effects should be worked into MIMO models (at least, this is what I got out of it). Also, the placement of the antenna on your laptop matters a lot — my Mac is lousy at finding the WiFi because its antenna is sub-optimally positioned.
  • Nitin Vaidya discussed Dynamic Source Routing, which I had heard about but never really learned before.
  • Dina Katabi and Sachin Katti talked about network coding and its implementation. The issues with asynchronous communication for channel estimation in analog network coding were something I had missed in earlier encounters with their work.
  • P. R. Kumar talked about his work with I-Hong Hou and Vivek Borkar on QoS guarantees in a simple downlink model. I had seen this talk at Infocom, but the longer version had more details, so I think I understood it better this time.
  • Wade Trappe and Yih-Chun Hu talked about a ton of security problems (so many that I got a bit lost, but luckily I have the slides). In particular, they talked about how many of the adversarial assumptions people typically make are unrealistic for wireless, since adversaries can eavesdrop and jam, spoof users, and so on. They mentioned the Dolev-Yao threat model, from FOCS 1981, that I should probably read more about. There were some slides on intrusion detection, which I think is an interesting problem that could also be looked at from the EE/physical layer side.
  • R. Srikant and Attila Eryilmaz gave a nice (but dense) introduction to resource allocation and network utilization problems from the optimization standpoint. Srikant showed how some of the results Kumar talked about can also be interpreted from this approach. There was also a little bit of MCMC that showed up, which got me thinking about some other research problems…
  • The industry speakers didn’t post their slides, but they had a different (and a bit less tutorial) perspective to give. Victor Bahl from MSR gave a talk on white space networking (also known as cognitive radio, but he seems to eschew that term). Dilip Krishnaswamy (Qualcomm) talked about WWAN architectures, which are architecturally different from voice and other kinds of networks; where the internet cloud sits with respect to the other system elements was particularly interesting to me. Amitava Ghosh (Motorola) broke down LTE and WiMAX for us in gory detail.

ISIT 2009: talks part four

The Gelfand-Pinsker Channel: Strong Converse and Upper Bound for the Reliability Function
Himanshu Tyagi, Prakash Narayan
Strong Converse for Gel’fand-Pinsker Channel
Pierre Moulin

Both of these papers proved the strong converse for the Gel’fand-Pinsker channel, i.e., the discrete memoryless channel with iid state sequence P_S, where the realized state sequence is known ahead of time at the encoder. The first paper proved a technical lemma about the image size of “good codeword sets” which are jointly typical conditioned on a large subset of the typical set of S^n sequences. That is, given a code and a set of almost \exp(n H(P_S)) typical sequences in S^n for which the average probability of error is small, they get a bound on the rate of the code. They then derive bounds on error exponents for the channel. The second paper has a significantly more involved argument, but one which can be extended to multiaccess channels with states known to the encoder.

Combinatorial Data Reduction Algorithm and Its Applications to Biometric Verification
Vladimir B. Balakirsky, Anahit R. Ghazaryan, A. J. Han Vinck

The goal of this paper was to compute short fingerprints f(\mathbf{x}) from long binary strings \mathbf{x} so that a verifier can look at a new long vector \mathbf{y} and tell whether or not \mathbf{y} is close to \mathbf{x} based on f(\mathbf{x}). This is a little different from hashing, where we could first compute f(\mathbf{y}). They develop a scheme which stores the index of a reference vector \mathbf{c} that is “close” to \mathbf{x} and the distance d(\mathbf{x},\mathbf{c}). This can be done with low complexity. They calculated false accept and reject rates for this scheme. Since the goal is not reconstruction or approximation, but rather a kind of classification, they can derive a reference vector set which has very low rate.
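Here's a toy version of that kind of fingerprint, just to fix ideas (my own illustration, not the authors' construction): store the index of the closest reference vector and the Hamming distance to it, and accept a candidate string if the triangle inequality is consistent with it being within distance t of the original. With a random reference set and these made-up parameters the false accept rate is poor; choosing the reference set and threshold to control both error rates is what the paper actually works out.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_refs, t = 256, 16, 20                     # hypothetical parameters

refs = rng.integers(0, 2, size=(num_refs, n))    # reference vectors

def hamming(a, b):
    return int(np.sum(a != b))

def fingerprint(x):
    # fingerprint = (index of nearest reference vector, distance to it)
    dists = [hamming(x, c) for c in refs]
    idx = int(np.argmin(dists))
    return idx, dists[idx]

def verify(fp, y):
    # if d(y, x) <= t, the triangle inequality forces |d(y,c) - d(x,c)| <= t;
    # accepting on that condition gives no false rejects, only false accepts
    idx, dx = fp
    return abs(hamming(y, refs[idx]) - dx) <= t

x = rng.integers(0, 2, size=n)
y_close = x.copy(); y_close[:10] ^= 1            # within distance t of x
y_far = rng.integers(0, 2, size=n)               # unrelated string (may be falsely accepted)

fp = fingerprint(x)
print(verify(fp, y_close), verify(fp, y_far))
```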

Two-Level Fingerprinting Codes
N. Prasanth Anthapadmanabhan, Alexander Barg

This looks at a variant of the fingerprinting problem, in which a content creator makes several fingerprinted versions of an object (e.g. a piece of software) and a group of pirates can then take their versions and try to create a new object with a valid fingerprint. The marking assumption means that the pirates can only alter the positions in which their copies differ. The goal is to build a code such that a verifier looking at an object produced by t pirates can identify at least one of the pirates. In the two-level problem, the objects are coarsely classified into groups (e.g. by geographic region), and when there are more than t pirates the verifier wants to be able to identify at least the group of one of the pirates. They provide some conditions for traceability along with constructions. This framework can also be extended to multiple levels.

ISIT 2009: talks part three

This is the penultimate post on papers I saw at ISIT 2009 that I thought were interesting and on which I took some notes. I think I’m getting lazier and lazier with the note taking — I went to lots more talks than these, but I’m only writing about a random subset.

Existence and Construction of Capacity-Achieving Network Codes for Distributed Storage
Yunnan Wu

This paper looked at the distributed storage problem using the framework of network coding that Alex has worked on. The basic idea is that you have a bunch of drives redundantly storing some information (say n drives and you can query any k to reconstruct the data). You want to make it so that if any drive fails, it can be replaced by a new drive without the new drive having to download too much data to maintain the “k out of n” property. The problem can be translated into a network code on an infinite graph. This paper talked about how to achieve the optimal tradeoff between the bandwidth needed to repair the code and the storage efficiency/redundancy. The key thing is that even though the network graph is infinite, the network code can be constructed over a field whose size only depends on the number of active disks. Schwartz-Zippel reared its head again in the proof…

Achievability Results for Statistical Learning under Communication Constraints
Maxim Raginsky

This talk was on how to learn a classifier when the data-label pairs (X_i, Y_i) must be compressed. Max talked about two different scenarios, one in which you can use R bits per pair, and the other in which only the labels Y_i must be sent. He defined a rate-distortion-like function where the generalization error of the classifier played the role of the distortion. In the first case it’s easier for the encoder to learn the classifier itself and send the classifier over — this takes 0 bits per sample, asymptotically, and incurs no extra generalization error. In the second case he uses a kind of Wyner-Ziv-like scheme which is a bit more involved. Compression is not a thing many machine learning folks think about, so I think there’s a lot of interesting stuff in this direction.

On Feedback in Network Source Coding
Mayank Bakshi, Michelle Effros

This paper was about how feedback can help in distributed source coding. In some lossless and lossy problems, feedback can strictly increase the rate region. Some of these problems took the form of “helper” scenarios where the decoder wants one of the sources but gets coded side information from another source. They show that this phenomenon holds more generally, so feedback helps in network source coding in general.

Feedback Communication over Individual Channels
Power Adaptive Feedback Communication over an Additive Individual Noise Sequence Channel

Yuval Lomnitz, Meir Feder

These talks were near and dear to my heart since they dealt with coding over channels which are basically unknown. The basic idea is that you know only X^n and Y^n, the inputs and outputs of the channel. The trick is then to define a corresponding “achievable rate.” They define an empirical mutual information and show that it is asymptotically achievable under some circumstances. The scheme is based on random coding plus maximum mutual information (MMI) decoding. When feedback is present they can do some rateless coding. There’s a full version on ArXiV.
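For concreteness, the empirical mutual information they build on is just the mutual information of the joint type of the observed input and output sequences. A quick sketch (my own, with a made-up binary symmetric channel as the example):

```python
import numpy as np
from collections import Counter

def empirical_mutual_information(x, y):
    """Mutual information of the joint type of the pairs (x_i, y_i), in bits."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) with the marginals also taken from the empirical counts
        mi += p_ab * np.log2(p_ab * n * n / (px[a] * py[b]))
    return mi

# Example: y is x passed through a (hypothetical) binary symmetric channel.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=100000)
y = (x ^ (rng.random(100000) < 0.1)).astype(int)   # flip each bit w.p. 0.1
print(empirical_mutual_information(x, y))          # roughly 1 - h(0.1) ≈ 0.53 bits
```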

Upper Bounds to Error Probability with Feedback
Barış Nakiboğlu, Lizhong Zheng

Baris gave a nice talk on his approach to adapting the Gallager exponent bounds to the case where feedback is available. The encoder constantly adapts to force the decoder into a decision. The upper bound is based on a one-step/channel-use reduction to show an exponentially decaying error probability. One nice thing he mentioned is that at rates below capacity we can get a better error exponent by not using the capacity-achieving input distribution. Even though I read the paper and Barış explained it to me very patiently, I still don’t quite get what is going on here. That’s probably because I never really worked on error exponents…

ISIT: first set of talks

Outer Bounds for User Cooperation
Ravi Tandon, Sennur Ulukus

This paper looked at converse bounds for the MAC with generalized feedback. By applying the dependence balance bound, one can get an outer bound for the capacity region in terms of auxiliary random variables whose cardinality is difficult to bound. For a specific form of the channel, the “user cooperation channel,” where the feedback signals to the users are made from noisy versions of the other user’s signal, they find a way to evaluate the bound by looking at all input densities satisfying the constraints and then arguing that it is sufficient to consider Gaussian densities. The resulting bound matches the full-cooperation and no-cooperation bounds when taking the respective limits in the feedback noise, and is tighter than the cutset bound.

Optimal Quantization of Random Measurements in Compressed Sensing
John Sun, Vivek Goyal

The goal here was to design a quantizer for noisy measurements in compressed sensing, while keeping the decoder/reconstructor fixed. This was done in the same high rate scalar quantization setting that I’d seen in other talks, with point-density functions to represent the reconstruction points. They draw a connection to the earlier work on functional scalar quantization and find optimal quantizers for the Lasso. They used a recent technique called “homotopy continuation,” which looks at the solution produced by the Lasso as a function of the regularization parameter, and which I should probably read up on a bit more…
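For reference, the homotopy/LARS algorithm traces the whole Lasso solution path as the regularization parameter varies; scikit-learn's lars_path does this. A quick sketch with made-up data (this is just the path computation, not the quantizer design from the paper):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p, k = 50, 100, 5                         # hypothetical problem sizes
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:k] = rng.standard_normal(k)          # k-sparse signal
y = A @ x_true + 0.01 * rng.standard_normal(n)

# alphas: the breakpoints of the path; coefs[:, j]: the Lasso solution at alphas[j]
alphas, active, coefs = lars_path(A, y, method="lasso")
print("number of path breakpoints:", len(alphas))
print("support at the end of the path:", np.nonzero(coefs[:, -1])[0][:10])
```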

A Sparsity Detection Framework for On-Off Random Access Channels
Sundeep Rangan, Alyson Fletcher, Vivek Goyal

This looked at a MAC with n users where the decoder wants to detect which users transmitted. The users are each “on” with some fixed probability, and they show this can be put into a sparsity detection framework. There are two problems to overcome — the near-far effect of large dynamic range of the received signals, and a “multiaccess interference” (MAI) phenomenon that plagues most detection algorithms, including the Lasso and orthogonal matching pursuit (OMP). The question is whether a practical algorithm can overcome these effects — they show that a modification of OMP can do so as long as the decoder knows some additional information about the received signals, namely the power profile: the ordering of the users’ powers, conditioned on their being active.
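For anyone who hasn't seen it, here's a bare-bones OMP sketch on made-up data (just the vanilla algorithm, not the power-profile-aware modification from the paper):

```python
import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse x from y ≈ A x."""
    n, p = A.shape
    residual = y.copy()
    support = []
    for _ in range(k):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # least-squares fit on the current support, then update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(p)
    x_hat[support] = coef
    return x_hat, sorted(support)

rng = np.random.default_rng(0)
n, p, k = 40, 100, 4
A = rng.standard_normal((n, p)) / np.sqrt(n)
x = np.zeros(p)
x[rng.choice(p, k, replace=False)] = rng.standard_normal(k) + 2
y = A @ x
x_hat, support = omp(A, y, k)
print("recovered support:", support)
print("true support:     ", sorted(np.nonzero(x)[0].tolist()))
```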

Eigen-Beamforming with Delayed Feedback and Channel Prediction
T. R. Ramya, Srikrishna Bhashyam

This paper looked at the effect of delay in the feedback link in a system that is trying to do beamforming. The main effect of delay is that there is a mismatch between CSIT and CSIR. There is also the effect of channel estimation error. One solution is to do some channel prediction at the encoder, and they show that delayed feedback can affect the diversity order, while prediction can help improve the performance. Unfortunately, at high Doppler shift and high SNR, the prediction filter becomes too complex. I hadn’t really seen much work on the effect of longer delay in feedback, so this was an interesting set of ideas for me.