paper a day : importance sampling in rare-event simulations

Introduction to importance sampling in rare-event simulations
Mark Denny
European Journal of Physics, 22 (2001) : 403–411

This paper is about importance sampling (IS), which a method to improve the error behavior of Monte Carlo (MC) methods. In engineering systems, getting good simulation results for rare events (such as decoding error) on the order of 10-10 would require an obscene amount of computation if you just did things the naive way. For example, the quality of a numerical bound on the tail probability of a random variable gets worse and worse as you look farther and farther out. Importance sampling is a method of reweighting the distribution to either get a smaller error in the regime of interest and/or uniformize the estimation error. This paper gives some motivation, a simple IS algorithm, analysis, and some simulations. It’s pretty readable, and I went from knowing nothing about importance sampling to getting a decent idea of how to use it in practice, along with its potential problems and benefits.

a quote that gives some comfort

From Van H. Vu’s “Concentration of Non-Lipschitz Functions and Applications,” Random Structures and Algorithms 20 : 262–316, 2002 :

Finding this hypothesis was actually the hardest part of our work, requiring some insight and a numerous number of attempts. As the reader will see after reading Section 5, a proper induction hypothesis not only allows us to derive a powerful result, but also reduces the proof to a rather routine verification of a few conditions.

I have spent a long time reading through difficult proofs with numerous lemmata, wondering why it had to be so complicated. Some things have to be proved by brute force. For others, just phrasing the problem in the right way can make the proof seem trivial. Some might say “well, I could have done that,” but the more accurate response is “I wish I had thought to do it that way.”

adventures in grant administration

I just learned the Berkeley way of doing financial support for TA’s and RA’s here. The payroll paperwork for the appointment is sent down to the Graduate Division, which up-front pays the University for tuition and fees. They then turn around and charge the department or grant per month for the money over the course of the semester. That way, if you’re an RA and get fired after one month, the Grad Division will turn around and charge you for the last 2/3 of the tuition and fees.

For those who are TA-ing this semester, the bus pass fee and student life fee are not covered by the teaching appointment, so those have to be paid out of pocket. I don’t recall this being what happened in the past, so in my mind that’s a real step back for TAs. This fee is only waived for first-year grad students, presumably to lull them into the false sense that Berkeley is a place which fully supports its graduate student instructors.

I know that these policies happen for good reasons, namely to prevent waste and trim budgets. I’m sure that similar things happen at most other universities. But the way the bureaucracy evolved seems so bizarre to me at times.

Solla Solla Enna Perumai

Via Sepia Mutiny, one of the most awesome videos I’ve ever seen — a song from the 1981 Tamil film Ellaam Inbamayam and starring the rather famous actor Kamal Haasan (or if you’re IMDB, Kamal Hassan). He was awarded the Padmashri, but I would venture to guess that this particular video is not his finest acting moment. He does, however, have what some might call “the funk.”

UPDATE : he really likes gold lame boots and vests, it seems.

paper a day : Fountain Capacity

This is one of the papers I saw at ISIT. At the time I thought I understood it, but reading the actual paper made it much clearer.

Fountain Capacity
S. Shamai, E. Telatar, and S. Verdú
Proc. Int’l Symp. on Information Theory

The background for this paper is fountain codes. The basic idea is pretty simple. Suppose you have k packets of data that you want to send. Then one way of sending them is to pick a random subset of the packets, XOR them, and send that, along with some header information about which packets are in the XOR. This is just a linear combination of the packets, so a receiver getting a sufficient number of coded packets will be able to invert the set of linear equations to get the remaining data. The encoder acts like a fountain spewing out coded packets. Suppose the packets go over an erasure channel, for which packets either arrive intact or are erased. If there are many users with varying link qualities, the ones with better links will be able to decode early and disconnect. If a user connects in the middle, they don’t lose anything from missing the “earlier” coded packets. All that matters is that each user/decoder gets enough unerased packets to enable them to decode.

The purpose of this paper is to see if and how this decoder-centric view of communication can be put on a more traditional information-theoretic (read : Shannon Theory) foundation. The idea is that the encoder will take M messages and encode them into infinitely long strings. The encoding is randomized, and the randomization is known to the decoder. The encoded symbols go through a channel, but the decoder is pay-per-view : it can only see n outputs of the channel, but it can’t control which n outputs. An adversary, without knowing which codebook is being used, picks the schedule of n outputs observed by the decoder. The probability of decoding error is then given by the worst case over subsets of n outputs. The capacity is the supremum of rates for which the decoding error can be driven to 0.

The main results of the paper are

  • Randomized coding is necessary, and an infinite amount of randomness is required.
  • Fountain capacity is always upper bounded by Shannon capacity.
  • For stationary memoryless channels the fountain capacity is equal to the Shannon capacity.
  • For channels with memory, the fountain capacity may be less than Shannon capacity.

What’s a little unsatisfying about this paper is precisely the same thing that makes it interesting — the channel model they’re using is a slippery one, so it’s difficult to see how “reasonable” it is. For example, I think the infinite randomization thing is a real drawback, but maybe the channel assumptions can be relaxed to take that away. I’m interested in this paper because it is very related to arbitrarily varying channels (AVCs), which is what my thesis will likely revolve around. There are some comments at the end which relate it to AVCs, but I should think a little bit more about the real underlying problem and how to connect it back to AVCs.

Ginger-Lemon Opah in Parchment

I made this last night and it turned out decently, although the fish was cut too thick so I had to cook it a little longer. Opah may not be the ideal fish for this, so I’ll try it later with tilapia or other fillet.

1 Tbsp grated ginger
1 large clove garlic
3 Tbsp lemon juice
1 Tbsp honey
1 green chili

1 lb opah
2 sheets parchment paper

2 Tbsp chives

Mix all ingredients other than fish and chives, omitting chili seeds if desired. Marinate fish for approximately 30 minutes (longer will start to cook the fish, ceviche-style). Split fish into two halves and place in parchment envelopes, pouring remaining lemon juice on top. Bake at 400 degrees for 10-15 minutes.

Funkcialaj Ekvacioj

While poking around for some references today I took a detour through the Mathematical Reviews. I heard that Rota had written some zingers, but one of the first of his that I saw, on the Riemann-Liouville integral in the complex field contained the following comment:

Many of the results are essentially known, but this seems the best treatment in the literature, which would well justify a previous brief study of Esperanto for the interested reader.

A quick investigation shows that the paper is indeed in Esperanto!

The journal is Funkcialaj Ekvacioj, which presumably means “functional equations” in Esperanto. The entire editorial board is Japanese, and since its inception most of the contributing authors have been Japanese. The journal has been going since 1958, and they have full archives online (vol. 1-40 are free). At the beginning, they had articles in English, French, and Esperanto, with the abstract (if necessary) translated into Esperanto. That stopped as early as the 1970s (more’s the pity), and articles are just written in English, but the journal’s title remains in Esperanto.

This really made my day — maybe I should translate an article into Esperanto… except that I’d have to learn Esperanto first.

LaTeX figure repository

I’m pretty picky about figures for LaTeX documents now, and I hate having to make new figures from scratch since it takes forever. So a little while ago I got this free wiki called ThousandWords to hold figures that I had made and make them (along with relevant code) available to others. At the moment it’s pretty sparse — just a few things here and there that I put up — but it would be great to have more. In particular, right now it’s all information theory, signal processing, and networking, and there’s no reason (beyond disk space) that it can’t be more diverse.

So if any readers of this blog want to put up figures of their own for public use, let me know!

“The traveller must, of course, always be cautious of the overly broad generalisation”

George Saunders visits the UK.

But I am an American, and a paucity of data does not stop me from making sweeping, vague, conceptual statements and, if necessary, following these statements up with troops.

Furthermore, I feel confident that the discovery, by my countrymen, of the unique British delicacy called “fish and chips” would put an end to American obesity for ever.

Symphony Chorus again

I’m back in the Symphony Chorus! My reaudition was a bit harrowing, but I guess it worked out ok. I’m currently slated to sing:

  • A Mozart Journey (Michael Tilson Thomas [conductor], Jeremy Denk [piano])

    Steve Reich : Variations for Orchestra (Thursday, Saturday only)
    W.A. Mozart : A Mozart Journey (all-Mozart program)

  • John Adams’ A Flowering Tree [world premiere] (John Adams [conductor],
    Russell Thomas [tenor], Eric Owens [bass])

    John Adams : A Flowering Tree (SFS co-commission, US premiere)

    • Thursday, Mar. 1 at 8pm
    • Friday, Mar. 2 at 8pm
    • Saturday, Mar. 3 at 8pm

  • MTT and Richard Stoltzman (Michael Tilson Thomas [conductor],
    Richard Stolzman [clarinet])

    Stravinsky : Symphonies of Wind Instruments
    Stravinsky : Apollon musagète
    Takemitsu : Fantasma Cantos
    Stravinsky : Symphony of Psalms

    • Wednesday, April 11 at 8pm
    • Friday, April 13 at 8pm
    • Saturday, April 14 at 8pm