More advice on giving talks

I’d like to point out Alex Dimakis’s great post from last week on how to give a great ISIT talk (or any talk, really) and to put in a few bits more about a replicable process for writing talks that might help out students trying to write their first (or second, or third…) talk. As with any form of writing or communication, an unclear talk is usually the product of unclear thought on the part of the presenter. There’s no shortage of advice out there on how to give talks, but I figured I’d write down the process that I use, since having one helps streamline the preparation.

Before I get into the list, I have to issue a disclaimer : it’s not that I think I am super-great at giving talks. I am pretty good at critiquing talks to death, however. It must be the theater critic in me. What I have done is write down a process that has worked for me…

  1. Who is your audience? This is the place to start. If you are giving a talk at a conference, you may have a sense of what the interests of the audience are, but that is only part of it. Do you want your talk to be accessible to only faculty in your sub-area? (Hint : NO). Graduate students? If so, how much background should they have? Most of your talk should be accessible to a certain group — identify that group and target the bulk of the material to them. If you are giving a job talk it’s a different group from a conference or a specialized workshop.
  2. What do you want them to know? You have to be able to summarize what you want to say in one sentence. It’s not going to encapsulate everything, but it should be what you want your target audience to learn from your talk. Write this down and think about it. It’s not easy to summarize your work in a sound bite.
  3. Outlining the talk. Make an outline of the talk that contains one sentence per slide. Each slide should have a single main point that you can write down before adding any content to the slide. Read the sentences in order. It should be a story — does the story flow well? Does it make sense? Do you really need a table of contents slide for a 20 minute talk?
  4. Filling it in. For each sentence, think about how best to explain that point. Pictures are great, loads of equations are not. How can you, in a minute or two, make that point while talking, and what do you need visually to help emphasize that point? This is like storyboarding for a film.
  5. Balancing the content. Now that you have a plan for each slide, check to see that it’s balanced. Too many slides with text will fatigue people. Too many slides with just pictures may lead to confusion.
  6. Fill it in. Fill in the text and make the figures. This will take time, but since you have the sentence on each slide and the plan you should be able to think clearly through what you need to put down on each slide.
  7. Practice. Practice giving the talk once, out loud (not just in your head). You will find mistakes on the slides. Fix those mistakes. Practice again to find more mistakes. You might discover that the story doesn’t flow as well as you thought so you have to go back and retweak things. But always keep in mind the story.
  8. Presentation = Invitation. People say a talk is an invitation to read your paper, but that is not really true, I think. Chances are that more than half the audience will not read your paper anyway, but you still need to teach them something. Of the remaining half, most may skim your paper in the proceedings; you have to make that process easier for them. For the hardcore few who will really spend time reading your paper (because they are reviewing it), you want them to be excited by that prospect.
  9. Planning for contingencies. People get derailed about things like having backup slides with all the details of the proofs. Making backup slides, which are seen 1% of the time, takes away time from making the main presentation, which is seen 100% of the time. Focus on making the main presentation good.

ISIT : plenaries and thoughts

Just a few brief notes on the plenaries. Prakash Narayan gave a nice talk on his work on secrecy generation and related problems. It was nice because it tied together a number of different models in one talk so that if you were someone who had only looked at wiretap problems you could see a more unified approach to these problems. It was a little technical for my pre-breakfast brain though. Ueli Maurer gave an overview of his new approach to cryptography — I had seen a version of this before, and it was full of pictures to illustrate the reductions and interfaces he was trying to create. I think if I had more of a background in formal CS-style cryptography I might have understood it a bit better. It feels like trying to build a different style of bridge between theory (formal reasoning about security) and practice.

Abbas El Gamal gave a rather personal Shannon Lecture, taking us through a number of stages in his research life, together with some perspectives on his new book with Young-Han Kim on network information theory. He ended by calling for the IT community to really go and tackle new problems and develop new tools and models to do that. One of the things that came across more sharply for me in this ISIT, partly due to the Cover memorial, is that information theory really is a research community. Of course, there are groups and cliques and politics and problems, but each ISIT is a real coming together that reinforces that sense of community. That’s valuable.

ISIT 2012 : more talks

Since I am getting increasingly delayed by post-ISIT and pre-SPCOM business, I am going to have to keep the rest of blogging about ISIT a little short. This post will mention some talks, and I’ll keep the other stuff for a (final) post.

Efficient Tracking of Large Classes of Experts
András György, Tamas Linder, Gabor Lugosi
This paper was on expanding the reference class against which one is competing in a “prediction with experts” problem. Instead of doing well against the best expert chosen in hindsight, you compete against the best meta-expert, which can switch between the existing experts. This leads to a transition diagram that is kind of complicated, but they propose a unifying approach which traces along branches. The key is that every transition path can be well approximated, so the space of possibilities one is tracking will not blow up tremendously.
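To make the "compete against a switching meta-expert" idea concrete, here is a minimal sketch of the classical fixed-share forecaster. This is not the algorithm from the paper, just the textbook baseline for tracking a best expert that changes over time. The function name and the parameters eta (learning rate) and alpha (share rate) are my own choices for illustration.

```python
import math

def fixed_share(losses, eta=1.0, alpha=0.05):
    """Return the weight vector used in each round.

    losses: list of rounds, each a list of per-expert losses in [0, 1].
    eta and alpha are illustrative defaults, not values from the paper.
    """
    n = len(losses[0])
    weights = [1.0 / n] * n
    history = []
    for round_losses in losses:
        history.append(list(weights))
        # Exponential-weights step: penalize each expert by its loss.
        updated = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
        total = sum(updated)
        updated = [u / total for u in updated]
        # Share step: mix with the uniform distribution so that an expert
        # that starts doing well later can regain weight quickly.
        weights = [(1 - alpha) * u + alpha / n for u in updated]
    return history

# Expert 0 is best for 20 rounds, then expert 1 takes over.
rounds = [[0.0, 1.0]] * 20 + [[1.0, 0.0]] * 20
history = fixed_share(rounds)
```

Without the share step the weight on expert 1 would decay exponentially during the first phase and take just as long to recover. The mixing keeps every expert's weight above alpha / n, which is what lets the forecaster follow the switch.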

Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
David Donoho, Adel Javanmard, Andrea Montanari
What a trendy title! Basically this problem looks at the compressed sensing problem when the sensing matrix is banded (this is what spatially coupled means), and solves it using Bayesian approximate message passing to do progressive decoding and elimination. The optimality is in the sense of matching with the Renyi dimension of the signal class for the data. I alas did not take notes for the next talk, which also seemed related: Hybrid Generalized Approximate Message Passing with Applications to Structured Sparsity (Sundeep Rangan, Alyson Fletcher, Vivek Goyal, Philip Schniter)

Quantized Stochastic Belief Propagation: Efficient Message-Passing for Continuous State Spaces
Nima Noorshams, Martin Wainwright
This problem was on BP when the state space is continuous — instead of passing the whole belief distribution, nodes pass along samples from the distribution and the receiving node does a kind of interpolation/estimate of the density. They show that this process converges on trees. This is related to a problem I’ve been thinking about for decentralized inference, but with a different approach.

Synchrony Amplification
Ueli Maurer, Björn Tackmann
This was a cool talk on a framework for thinking about synchrony in clocks — the model is pretty formal, and it’s something I never really think about but it seemed like a fun way to think about these problems. Basically they want to formalize how you can take a given clock (a sequence of ticks) and convert it into another clock. The goal is to not throw out too many ticks (which equals slowdown), while achieving synchrony.

Non-coherent Network Coding: An Arbitrarily Varying Channel Approach
Mahdi Jafari Siavoshani, Shenghao Yang, Raymond Yeung
Of course I have to go to a talk with AVC in the title. This looks at the same operator channel for network coding but then they assume the network matrix may be arbitrarily varying (with known rank). In this model they can define all the usual AVC concepts and they get similar sorts of results that you see for AVCs, like dichotomies between deterministic coding with average error and randomized coding.

Alternating Markov Chains for Distribution Estimation in the Presence of Errors
Farzad Farnoud, Narayana Prasad Santhanam, Olgica Milenkovic
This talk was on the repetition channel and getting the redundancy of alternating patterns. They show upper and lower bounds. The idea is you start out with a word like abccd and it goes through a repetition channel to get aaabbcccdddd for example, and then you look instead at abcd by merging repeated letters.
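The merging step in that example is easy to sketch. The helper names below are hypothetical, and the repeat counts are chosen by hand to reproduce the example string rather than drawn from any channel model.

```python
from itertools import groupby

def repeat_letters(word, counts):
    """Repeat the i-th letter of word counts[i] times (a deterministic stand-in for the channel)."""
    return "".join(letter * count for letter, count in zip(word, counts))

def merge_repeats(word):
    """Collapse each run of identical adjacent letters into a single letter."""
    return "".join(letter for letter, _run in groupby(word))

received = repeat_letters("abccd", [3, 2, 1, 2, 4])
pattern = merge_repeats(received)
```

Note that merging also collapses the double c that was already in the input, so the input word itself cannot be recovered from the merged pattern.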

On Optimal Two Sample Homogeneity Tests for Finite Alphabets
Jayakrishnan Unnikrishnan
A two-sample test means you have two strings x^n and y^n and you want to know if they are from the same distribution. He looked at the weak convergence of the asymptotically optimal test to get bounds on the false alarm probability.
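For a sense of what such a test might compute, here is a sketch of one natural likelihood-ratio-style statistic on the empirical distributions. I did not note which statistic the talk used, so treat this as an assumption on my part. The function names are made up for illustration.

```python
import math
from collections import Counter

def empirical(seq, alphabet):
    """Empirical distribution of seq over the given alphabet."""
    counts = Counter(seq)
    return {a: counts[a] / len(seq) for a in alphabet}

def kl(p, q):
    """KL divergence D(p || q) in nats, skipping symbols with p[a] == 0."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in p if p[a] > 0)

def homogeneity_statistic(x, y):
    """Weighted divergence of each sample's empirical distribution from the pooled one."""
    alphabet = set(x) | set(y)
    px, py = empirical(x, alphabet), empirical(y, alphabet)
    nx, ny = len(x), len(y)
    pooled = {a: (nx * px[a] + ny * py[a]) / (nx + ny) for a in alphabet}
    return nx * kl(px, pooled) + ny * kl(py, pooled)

same = homogeneity_statistic("aabb", "abab")
different = homogeneity_statistic("aaaa", "bbbb")
```

The statistic is zero when the two empirical distributions agree and grows as they separate. A test would declare "different distributions" when it exceeds a threshold, and choosing that threshold is where the false alarm probability comes in.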

Hypothesis testing via a comparator
Yury Polyanskiy
This was on a model where two nodes get to observe X^n and Y^n drawn i.i.d. from either P_{XY} or Q_{XY} and they separately compress their observations into messages W_1 and W_2. The decision rule is to decide P_{XY} if W_1 = W_2. What’s the best exponent?

The Supermarket Game
Jiaming Xu, Bruce Hajek
This was on queuing. Customers come in and sample the loads of L queues and then pick one to join. Their strategies may differ, so there is a game between the customers and this can affect the distribution of queue sizes. As a flavor of the weird stuff that can happen, suppose all customers but one only sample one queue and join that queue. Then the remaining customer will experience less delay if they sample two and join the shorter one. However, if all but one sample two and join the shorter one, then it’s better for her to sample just one. At least, that’s how I understood it. I’m not really a queueing guy.
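The basic move each customer makes, sampling a few queues and joining the shortest one seen, can be sketched as below. This only illustrates the sampling rule as I understood it, not the game or its equilibrium analysis. The function name and the tie-breaking (first sampled minimum wins) are my own assumptions.

```python
import random

def choose_queue(queue_lengths, num_samples, rng):
    """Sample num_samples distinct queues uniformly and return the index of the shortest one sampled."""
    sampled = rng.sample(range(len(queue_lengths)), num_samples)
    return min(sampled, key=lambda i: queue_lengths[i])

rng = random.Random(0)
queues = [3, 1, 4, 2]
joined_all = choose_queue(queues, 4, rng)   # sees every queue
joined_one = choose_queue(queues, 1, rng)   # joins whatever it samples
```

A customer's strategy in the game is then just the choice of num_samples, which is what makes the interaction between customers interesting.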

Collaborative paper filtering?

At ISIT 2012, there were posters up for a site called ShareRI.org: Share Research Ideas, an initiative of a student at UIUC named Quan Geng. It’s a platform for posting and discussing papers, sort of like creating a mini-forum around arXiv posts. It seems to be just starting out now, but I figured I would post the link to see if others take it up. I imagine as things scale up it might run into similar problems as Wikipedia with trolling etc, but it’s an interesting idea which has come up before in discussions with the IT Society Online Committee, for example.

Tracking the origin of genies

Lalitha Sankar asked Gerhard Kramer about my earlier question about genies. Gerhard wrote:

I got the name from Jim Massey who had suggested it as part of a title for the thesis of another doctoral student I know.

I have heard this attributed to Gallager, but the word “genie” might even come up in the Wozencraft-Jacobs book from the mid-60s (not sure!). I suspect that it goes back even further.

A little further searching along those directions turned up some more hits. On page 366 of Viterbi and Omura’s 1978 text Principles of Digital Communication and Coding, while discussing the distribution of computation in convolutional codes they write “[W]e begin by considering a sequential decoder aided by a benevolent genie who oversees the decoder action on each incorrect subset.”

But indeed, as Gerhard indicates, there is a reference in Wozencraft and Jacobs (1965). From Rimoldi and Urbanke’s paper on rate splitting, they write “[C]onceptually, we follow the lead of Wozencraft and Jacobs [29, p. 419] and postulate a genie who always knows the codeword of user 2…” Following up on that reference, in reference to the decoding of convolutional codes, Wozencraft and Jacobs write “… assume initially that a magic genie directs the decoder to the correct starting node for determining each \hat{x}_h…”

In the bibliographic notes in Viterbi and Omura, they write

As was noted previously, the original sequential decoding algorithm was proposed and analyzed by Wozencraft [1957]. The Fano algorithm [1963], with various minor modifications, has been analyzed by Yudkin [1964], Wozencraft and Jacobs [1965], Gallager [1968], and Jelinek [1968a]. Two versions of stack algorithms and their performance analyses are due to Zigangirov [1966] and Jelinek [1969a]. The precise form of the Pareto distribution on computation emerged from the works of Savage [1966] for the upper bound, and of Jacobs and Berlekamp [1967] for the lower bound.

So it seems that if the argument is due to Wozencraft, the source of the genie argument in this context is probably the Wozencraft and Jacobs book, but the credit for the analogy to genies is probably lost to us in time.

Linkage

I’m being lazy about more ISIT blogging because my brain is full. So here are some links as a distraction.

Via John, George Boolos’s talk entitled Gödel’s Second Incompleteness Theorem Explained in Words of One Syllable.

D’Angelo is back!

This short video about a subway stair in New York is great, especially the music.

Crooked Timber is on a tear about workplace coercion and its proponents.

Luca’s thoughts on the Turing Centennial are touching.

Converse genie etymology

In his Shannon Lecture, Abbas mentioned that his work with Costa on deterministic interference channels was the first to use “genie-aided” converse arguments (essentially assuming the decoder has more information), but that they “lacked the imagination” to use that name. The question is, who did come up with the term “genie” in connection with converses?

Advice on giving talks

We are at ISIT and I realize I am going over the same points multiple times with my students, so I thought of summarizing everything here.

How to give a better ISIT Talk.

1. Take your talks very seriously.

Do practice runs. Many of them. Your only hope for academia is by giving great talks. Give a practice talk to your friends. In the middle of your talk pause and quiz them: ok, did you get why alpha and beta are not independent? (hint: they did not).
If they did not, it is your problem not their problem.

2. They do not remember what alpha is.

In most talks, your audience does not understand what the notation is, what the problem is, or why they should care. Think of yourself: how often do you sleep or suffer through talks without even knowing what the problem is?
Do not treat your audience like that.

It is a typical scene when the presenter is focusing on a minor technical issue for ten minutes when 90% of the audience does not even know what exactly the problem is, or care.

One important exception is when your audience works on the same problem. Typically only a small part of your talk should be focused on these experts (see also 13).

3. Do a multi-resolution talk.

A useful guideline is: for an 18 minute talk, 7-9 minutes should go on explaining the formulation of your problem and why anybody should care, 5-6 minutes on explaining *what* the solution is, and 4 minutes or so on the actual painful technical stuff. The first part should be aimed at a first year grad student level, the second at a senior grad student in the general ISIT area, and the last part at the expert working on related problems. If fewer than 90% of your audience are checking email in the last part of your talk, consider that a success.

4. Try to make things simple, not difficult.

It is a common mistake for starting grad students to think that their work is too simple. For that reason they will not mention known things (like explaining that ML decoding for the erasure channel consists of solving linear equations, because they fear this is too simple and well known).
Always mention the basic foundations while you try to explain something non-trivial. Your goal is not to sound smart but rather to have your audience walk out knowing something more.

Even when your audience hears things they already know, they get a warm fuzzy feeling, they do not think you are dumb.

5. Add redundancy, repeat a lot in words.

Do not say ‘We try to minimize d(k)’.
Say ‘we try to minimize the degree d which, as I mentioned, is a function of the number of symbols k’. Repeat things all the time: summarize what you will talk about, and in your conclusions say the main points again.

6. Go back to basic concepts in words, repeat definitions.

Try to mention the basic mathematical components, not the jargon you have introduced. Do not say ‘Therefore, the code is MSR-optimal’ but ‘Therefore, the code minimizes the repair communication (what we call MSR optimal)’. Try to reduce your statements back to fundamental things like probabilities, graphs, rank of matrices, etc whenever possible. Do not just define some alpha jargon in the first slide and talk about that damn alpha throughout your talk.

7. Never go over time.

I have often seen even experienced speakers getting a warning that they have 3 minutes and still trying to go through their ten last slides. When you are running out of time, the goal is not to talk faster.

Say something like ‘Unfortunately or fortunately for you, I do not have time to go into the proof so I will have to skip it. The main ingredient involves analyzing random matchings which is done through Hall’s theorem and union bounds. Please talk to me offline if you are interested…’
Then, go through your conclusions slowly, repeating your main points.
This is another example of multi-resolution: you explain the techniques at a high level first. Even if you had time, you would still first have to give a one sentence high level description and then get into the details.

8. Draw attention to important slides.

People are probably checking the Euro final when you are at slide 4, explaining what your problem is all about. Wake them up and give a notification that this is the one slide they do not want to miss. Do this right before the critical points.

9. Every slide should have one simple message.

After you make your slides, ask yourself what the goal of each slide is; the answer should be as simple as ‘I just want to explain this part’. Iteratively try to simplify your slides into smaller and smaller messages. It is easier for your audience to grasp one packet of information at a time. Do not have derivations on slides (especially for an 18 minute talk), unless there is one very simple trick you really want to show. Showing math does not make you look smarter.

10. Be minimalist.

Every word on your slides, every symbol or equation you put up there dilutes the attention of your audience. Look at each bullet/slide and ask, do I really need this part or can I remove it?

11. Be excited.

Vary the tone of your voice, it may wake up somebody. You need to entertain and perform. Think: if you are not excited with your results why should anybody else be?

12. Cite people.

When somebody has related prior work, cite them on your slide. That has the benefit of waking them up when they see or hear their name.

As Rota says: ‘Everyone in the audience has come to listen to your lecture with the secret hope of hearing their work mentioned.’

13. Connect to what your audience cares about.

This is non-trivial and requires experience. If you are giving a talk in a fountain codes session, you do not have to spend ten minutes defining things your audience knows already. Still define it quickly to make sure everybody is on the same page on notation. Knowing how to be at the right resolution for your audience becomes easier in time.

14. Prepare your logistics.

Know the room (go there before), know who your session chair is, have your MacBook projector dongle, pre-load your slides on a USB. Bring your charger, disconnect from the internet (fun Skype messages pop up during talks). If you are using a different machine, test your PowerPoint slides (hint: they look completely different).

15. Talk to people afterwards.

Talk to people about their work and your work. Remember that this is a professional networking event. Do not hang out with your friends, you have plenty of time for that after you go back home. Networking with other students and faculty is very important, in my case I learn more by talking to people offline than in talks.

16. Engineering theory is essentially story-telling.

Our papers and talks are essentially story-telling: Here is a model for a wireless channel, here is a proof about this model. A good story has an intellectual message that will hopefully help people think about a real engineering problem in a cleaner way.
The other aspect of our job is creating algorithms that are hopefully useful in real systems. Think: what is your story and how will you present it in your talk.

17. Read the brilliant Ten Lessons I Wish I Had Been Taught by Gian-Carlo Rota.