Someone hands you a coin which has a probability p of coming up heads. You can flip the coin as many times as you like (or more precisely, you can flip the coin an infinite number of times). Let S = \{r_i : i = 1, 2, \ldots\} be the set of rational numbers in [0,1]. After each flip, you have to guess one of the following hypotheses: that p = r_i for a particular i, or p is irrational. Furthermore, you can only make a finite number of errors for any p \in [0,1] - N_0, where N_0 is a set of irrationals of Lebesgue measure 0. Can you do it? If so, how?

This is the topic addressed by a pair of papers that Avrim Blum mentioned in Yoav Freund’s remembrances of Tom Cover:

COVER, THOMAS M (1973). On determining the irrationality of the mean of a random variable. Ann. Statist. 1, 862-871.
COVER & HIRSCHLER (1975). A finite memory test of the irrationality of the parameter of a coin. Annals of Statistics, 939-946

I’ll talk about the first paper in this post.

The algorithm is not too complicated — you basically go in stages. For each time j = 1, 2, \ldots you have a function n(j). Think of n(j) as piecewise constant. There are two sequences: a threshold k_{n(j)}, and an interval width \delta_{n(j)}.

  1. Take the sample mean \hat{x}_{n(j)} and look at an interval of width 2 \delta_{n(j)} centered on it. Note that this makes the same decision for each j until n(j) changes.
  2. Given an enumeration of the set S, find the smallest i such that r_i \in [\hat{x} - \delta_{n(j)}, \hat{x} + \delta_{n(j)}].
  3. If there is an i < k_{n(j)} such that r_i \in [\hat{x} - \delta_{n(j)}, \hat{x} + \delta_{n(j)}] then declare p = r_i, otherwise declare p \notin S.

The last thing to do is pick all of these scalings. This is done in the paper (I won’t put it here), but the key thing to use is the law of the iterated logarithm (LIL), which I never really had a proper appreciation for prior to this. For \epsilon > 0,

| \hat{x}_n - p | \le (1 + \epsilon) \sqrt{ \frac{2 p (1 - p) \log \log n}{n} }

for all but finitely many values of n. This gets used to set the interval width \delta_{n(j)}.
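To make the decision rule concrete, here is a minimal sketch of a single stage in Python. It is not the construction from the paper: the enumeration of S by increasing denominator, the helper names `enumerate_rationals`, `lil_width`, and `guess`, the worst-case bound p(1-p) \le 1/4 inside the width, and the logarithmic threshold in the usage lines are all my own assumptions for illustration. The paper picks its own scalings for k_{n(j)} and \delta_{n(j)}.

```python
import math
from fractions import Fraction


def enumerate_rationals():
    """Yield the rationals in [0,1] by increasing denominator: 0, 1, 1/2, 1/3, 2/3, ...

    This ordering is an assumption; any enumeration of S would do.
    """
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if math.gcd(p, q) == 1:
                yield Fraction(p, q)
        q += 1


def lil_width(n, eps=0.5):
    """LIL-style half-width delta for n >= 3 flips (hypothetical choice).

    Uses the worst case p(1-p) <= 1/4, so 2 p (1-p) <= 1/2.
    """
    return (1 + eps) * math.sqrt(math.log(math.log(n)) / (2 * n))


def guess(flips, k, delta):
    """Return the first r_i with i < k within delta of the sample mean.

    Returns None when no such r_i exists, i.e. "declare p irrational".
    """
    mean = sum(flips) / len(flips)
    for i, r in enumerate(enumerate_rationals()):
        if i >= k:
            return None
        if abs(r - mean) <= delta:
            return r


# Usage: a deterministic stand-in for 10000 flips of a fair coin.
flips = [1, 0] * 5000
n = len(flips)
k = max(3, int(math.log(n)))  # hypothetical slowly growing threshold
print(guess(flips, k, lil_width(n)))
```

The point of the threshold k is visible here: rationals late in the enumeration are simply not candidates until there is enough data to tell them apart.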

The cool thing to me about this paper is that it’s an example of “letting the hypothesis class grow with the data.” We’re trying to guess if the coin parameter p is rational and if so, which rational. But we can only apprehend a set of hypotheses commensurate with the data we have, so the threshold k_{n(j)} limits the “complexity” of the hypotheses we are willing to consider at time j. The LIL sets the threshold for us so that we don’t make too many errors.

There are lots of little extensions and discussions about the rationality of physical constants, testing for rationality by revealing digits one by one, and other fun ideas. It’s worth a skim for some of the readers of this blog, I’m sure. A miscellaneous last point : Blackwell suggested a Bayesian method for doing this (mentioned in the paper) using martingale arguments.

The other day I found myself wondering “so where does the word martingale come from?” A short time on Google later, I came across this paper from Journal Electronique d’Histoire des Probabilités et de la Statistique, which had a special issue on The Splendors and Miseries of Martingales (Splendeurs et misères des martingales):

The Origins of the Word “Martingale”
Roger Mansuy
(earlier version : “Histoire de martingales” in Mathématiques & Sciences Humaines/Mathematical Social Sciences, 43th year, no. 169, 2005(1), pp. 105–113.)

It’s 10 pages and worth a read just for fun. Some of the fun facts:

  • Doob is the one who really made the name popular (in addition to proving many fundamental results). He got the name from a thesis by Ville.
  • A martingale is the name for a Y-shaped strap used in a harness — it runs along the horse’s chest and then splits up the middle to join the saddle.
  • A martingale is a name for a betting strategy (usually we think of doubling bets) but it’s not clear which one from the historical record.
  • “To play the martingale is to always bet all that was lost” (dictionary of the Académie Française, 4th ed.); there are earlier dictionary definitions too, to 1750.
  • “A very slim trail seems to indicate a derivation of the word from the Provençal expression jouga a la martegalo, which means ‘to play in an absurd and incomprehensible way’.” Apparently Provençal is also the origin of Baccarat.
  • So what is martegalo? It might refer to a place called Martigues, whose residents are supposedly a bit naïve.
  • “Martingale pants” are from Martigues, and have, according to Rabelais, “a drawbridge on the ass that makes excretion easier.”
  • There’s a woman in the 17th century who called herself La Martingale and who made a number of prophetic predictions.
  • There were sailors called martégaux who gave their name to a rope called a martegalo used on sailboats. Perhaps this is where the horse connection comes in?
  • Apparently “martingale” is also vernacular for “prostitute,” but the etymology for that usage is not well-documented.

All in all, perhaps this essay ends up raising more questions than it answers, but I certainly had no idea that there was this much to be unearthed behind a simple word.

A few weeks ago I attended Scott Kominers’s class on Market Design. They were talking about mechanism design and differential privacy so I felt like it would be fun to attend that session. In the class Scott mentioned some interesting work by Nicholas Lambert and Yoav Shoham on Truthful Surveys that appeared at WINE 2008. There’s also some recent work by Aaron Roth and Grant Schoenebeck up on ArXiV.

In Lambert and Shoham’s set up, the opinion distribution of a population is given by some CDF F(x) (with a density) on the unit interval [0,1]. We can think of x as a level of approval (say of a politician) and F(x) as the proportion of the population which has approval less than x. A surveyor selects n agents \{x_i\} i.i.d. from F and asks them to report their opinion. They can report anything they like, however, so they will report \{r_i\}. In order to incentivize them, the surveyor will issue a payment \Pi_i( r_1, \ldots, r_n ) to each agent i. How should we structure the payments to incentivize truthful reporting? In particular, can we make a mechanism in which being truthful is a Nash equilibrium (“accurate”) or the only Nash equilibrium (“strongly accurate”)?

Let A_i = |\{j : r_i < r_j \}| and B_i = |\{j : r_i > r_j \}|. They propose partitioning the agents into k groups with \mathcal{G}_i denoting the group of agent i, and \tilde{F}_i(x) as an unbiased estimator of F(x) that uses the points \{r_j : \mathcal{G}_j \ne \mathcal{G}_i \}. The payments are:

\Pi_i(\{r_j\}) = \frac{1}{|\mathcal{G}_i| - 1} \left[ A_i - B_i \right] + 2 \tilde{F}_i(r_i) - \frac{2}{|\mathcal{G}_i| - 1} \sum_{j \in \mathcal{G}_i \setminus \{i\} } \tilde{F}_j(r_j)

This mechanism is accurate and also permutation-invariant with respect to the agents (“anonymous”) and the sum of the payments is 0 (“budget-balanced”).

This is an instance of a more general mechanism for truthfully inducing samples from a collection of distributions that are known — each agent has a distribution F_i and you want to get their sample of that distribution. Here what they do is replace the known distributions with empirical estimates, in a sense. Why is this only accurate and not strongly accurate? It is possible that the agents could collude and pick a different common distribution G and report values from that. Essentially, each group has an incentive to report from the same distribution and then globally the optimal thing is for all the groups to report from the same distribution, but that distribution need not be F if there is global collusion. How do we get around this issue? If there is a set of “trusted” agents \mathcal{T}, then the estimators in the payment model can be built using the trusted data and the remaining untrusted agents can be put in a single group whose optimal strategy is now to follow the trusted agents. That mechanism is strongly accurate. In a sense the trusted agents cause the population to “gel” under this payment strategy.

It seems that Roth and Schoenebeck are not aware of Lambert and Shoham’s work, or it is sufficiently unrelated (they certainly don’t cite it). They also look at truth in surveying from a mechanism design perspective. Their model is somewhat more involved (and has Bayesian bits), but may be of interest to readers who like auction design.

I’m at CISS right now on the magnolia-filled Princeton campus. The last time I came here was in 2008, when I was trying to graduate and was horribly ill, so this year was already a marked improvement. CISS bears some similarities to Allerton — there are several invited sessions in which the talks are a little longer than the submitted sessions. However, the session organizers get to schedule the entire morning or afternoon (3 hours) as they see fit, so hopping between sessions is not usually possible. I actually find this more relaxing — I know where I’m going to be for the afternoon, so I just settle down there instead of watching the clock so I don’t miss talk X in the other session.

Because there are these invited slots, I’ve begun to realize that I’ve seen some of the material before in other venues such as ITA. This is actually a good thing: in general, I’ve begun to realize that I have to see things 3 times for me to wrap my brain around them.

In the morning I went to Wojciech Szpankowski’s session on the Science of Information, a sort of showcase for the new multi-university NSF Center. Peter Shor gave an overview of quantum information theory, ending with comments on the additivity conjecture. William Bialek discussed how improvements in array sensors for multi-neuron recording and other measurement technologies are allowing experimental verification of some theoretical/statistical approaches to neuroscience and communication in biological systems. In particular, he discussed an interesting example of how segmentation appears in the embryonic development of fruit flies and how they can track the propagation of chemical markers during development.

David Tse gave a slightly longer version of his ITA talk on DNA sequencing, with more of the proof details. It’s a cute version of the genome assembly problem but I am not entirely sure what it tells us about the host of other questions biologists have about this data. I’m trying to wrestle with some short-read sequencing data to understand it (and learning some Bioconductor in the process), and the real data is pretty darn messy.

Madhu Sudan talked about his work with Brendan Juba (and now Oded Goldreich) on Semantic Communication — it’s mostly trying to come up with definitions of what it means to communicate meaning using computer science, and somehow feels like some of these early papers in Information and Control which tried to mathematize linguistics or other fields. This is the magical 3rd time I’ve seen this material, so maybe it’s starting to make sense to me.

Andrea Goldsmith gave a whirlwind tour of the work in backing away from asymptotic studies in information theory, and how insights we get from asymptotic analyses often don’t translate into the finite parameter regime. This is of a piece with her stand a few years ago on cross-layer design. High SNR assumptions in MIMO and relaying imply that certain tradeoffs (such as diversity-multiplexing) or certain protocols (such as amplify-and-forward) are fundamental but at moderate SNR the optimal strategies are different or unknown. Infinite blocklengths are the bread and butter of information theory but now there are more results on what we can do with finite blocklength. She ended with some comments on infinite processing power and trying to consider transmit and processing power jointly, which caused some debate in the audience.

Alas, I missed Tsachy Weissman’s talk, but at least I saw it at ITA? Perhaps I will get to see it two more times in the future!

In the afternoon I went to the large alphabets session which was organized by Aaron Wagner. Unfortunately, Aaron couldn’t make it so I ended up chairing the session. Venkat Chandrasekaran didn’t really talk about large alphabets, but instead about estimating high dimensional covariance matrices when you have symmetry assumptions on the matrix. These are represented by the invariance of the true covariance under actions of a subgroup of the symmetric group — taking these into account can greatly improve sample complexity bounds. Mesrob Ohanessian talked about his canonical estimation framework for large alphabet problems and summarized a lot of other work before (too briefly!) mentioning his own work on the consistency of estimators under some assumptions on the generating distribution.

Prasad Santhanam talked about the insurance problem that he worked on with Venkat Anantharam, and I finally understood it a bit better. Suppose you are observing i.i.d. samples X_t from a distribution P on \mathbb{R}^{+} that represent losses paid out by an insurer. The insurer gets to observe the losses for a while and then has to start setting premiums Y_t. The question is this : when can we guarantee that Y_t remains bounded and \mathbb{P}( Y_t > X_t \forall t ) > 1 - \eta? In this case we would say the distribution is insurable.

To round out the session, Wojciech Szpankowski gave a talk on analytic approaches to bounding minimax redundancy under different scaling assumptions on the alphabet and sample sizes. There was a fair bit of generatingfunctionology and Lambert W-functions. The end part of the talk was on scaling when you know part of the distribution exactly (perhaps through offline simulation or training) but then there is part which is unknown. The last talk was by Greg Valiant, who talked about his papers with Paul Valiant on estimating properties of distributions on n elements using only \Theta(n/\log n) samples. It was a variant of the talk he gave at Banff, but I think I understood the lower bound CLT results a bit better (using Stein’s Method).

I am not sure how much blogging I will do about the rest of the conference, but probably another post or two. Despite the drizzle, the spring is rather beautiful here — la joie du printemps.

My father sent me this paper (author’s version here) a little while ago on “Univariate Distribution Relationships.” It contains the following stunning chart:

Univariate Distributions in a Nutshell

All I can say is : wow. Pretty amazing, no?

A simplified version of the chart has been put up by John Cook.

I took a read through this fun paper that appeared on ArXiV a few months ago:

Avoidance Coupling
Omer Angel, Alexander E. Holroyd, James Martin, David B. Wilson, Peter Winkler
arXiv:1112.3304v1 [math.PR]

Typically when you think of coupling arguments, you want to create two (or more) copies of a Markov chain such that they meet up quickly. A coupling of Markov chains with transition matrix P is a sequence of pairs \{(U_t, V_t)\} such that \{U_t\} and \{V_t\} are Markov chains with transition matrix P:

\mathbb{P}( U_{t+1} = j | U_t = i, V_t = i') = P(i,j)
\mathbb{P}( V_{t+1} = j' | U_t = i, V_t = i') = P(i', j')

Note that the two chains need not be independent! The idea is that we start \{U_t\} and \{V_t\} at different initial positions and when they “meet” we make them move together. Once they meet, the decisions from random mappings are the same for both chains. The coupling time is T_c = \min \{t : U_t = V_t \}. Coupling is used to show fast mixing of Markov chains via theorems like the following, which says that the difference at time t between the distribution of the chain started at state i and the chain started at state i' is upper bounded by the probability that the two chains have not yet coupled:

Theorem. Let \{ (U_t,V_t) \} be a coupling with U_0 = i and V_0 = i'. Then

\| P^t(i,\cdot) - P^t(i',\cdot) \|_{\mathrm{TV}} \le \mathbb{P}_{(U,V)}\left( T_c > t \right)
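As a sanity check on the theorem, here is a small worked example in Python for a two-state chain. The chain, the parameters p and q, and the particular coupling (the two copies move independently until they meet, then together) are my own choices for illustration. For this coupling, two copies sitting at different states stay apart with probability (1-p)(1-q) + pq at each step, so the tail \mathbb{P}(T_c > t) is that number to the power t.

```python
def mat_mul(A, B):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def tv_distance(mu, nu):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(x - y) for x, y in zip(mu, nu))


def compare(p, q, horizon):
    """Return rows (t, TV distance, coupling tail) for the chain [[1-p, p], [q, 1-q]].

    Assumed coupling: independent moves until the copies meet, then move together.
    """
    P = [[1 - p, p], [q, 1 - q]]
    stay_apart = (1 - p) * (1 - q) + p * q  # both stay put, or both switch
    Pt = [[1.0, 0.0], [0.0, 1.0]]
    rows = []
    for t in range(1, horizon + 1):
        Pt = mat_mul(Pt, P)
        rows.append((t, tv_distance(Pt[0], Pt[1]), stay_apart ** t))
    return rows


for t, tv, tail in compare(0.3, 0.2, 5):
    print(t, round(tv, 6), round(tail, 6))
```

The bound is not tight for this coupling, which is the usual situation: a smarter coupling gives a smaller tail and hence a better mixing bound.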

This paper takes an entirely different tack — sometimes you want to start off a bunch of copies of the chain such that they never meet up. This could happen if you are trying to cover more of the space. So you want to arrange a coupling such that the coupling time is huge. There’s a bit of a definitional issue regarding when you declare that two walkers collide (what if they swap places) but they just say “multiple walkers are assumed to take turns in a fixed cyclic order, and again, a collision is deemed to occur exactly if a walker moves to a vertex currently occupied by another. We call a coupling that forbids collisions an avoidance coupling.”

There are a number of results in the paper, but the simplest one to describe is for two walkers on a complete graph K_n (or the complete graph with loops K_n^{\ast}).

Theorem 4.1. For any composite n = ab, where a,b > 1, there exist Markovian avoidance couplings for two walkers on K_n and on K_n^{\ast}.

How do they do this? They partition the n vertices into b clusters \{S_i\} of size a. Let’s call the walkers Alice and Bob. Suppose Alice and Bob are in the same cluster and it’s Bob’s turn to move. Then he chooses uniformly among the vertices in another cluster. If they are in different clusters, he moves uniformly to a vertex in his own cluster (other than his own). Now when it’s Alice’s turn to move, she is always in a different cluster than Bob. She picks a vertex uniformly in Bob’s cluster (other than Bob’s) with probability \frac{a(b-1)}{ab - 1} and a vertex in her own cluster (other than her own) with probability \frac{a - 1}{ab - 1}.

So let’s look at Bob’s distribution. The chance that he moves to a particular vertex outside his current cluster is the chance that Alice moved into his cluster times the uniform probability of choosing something outside his cluster:
\frac{a(b-1)}{ab - 1} \times \frac{1}{a (b-1)} = \frac{1}{ab - 1}
The chance that he moves to a vertex inside his own cluster is likewise
\frac{a - 1}{ab - 1} \times \frac{1}{a-1} = \frac{1}{ab - 1}
So the marginal transitions of Bob are the same as a uniform random walk.

For Alice’s distribution we look at the time reversal of the chain. In this case, Alice’s reverse distribution is Bob’s forward distribution and vice versa, so Alice also looks like a uniform random walk.
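The construction for K_n is easy to play with. Below is a minimal Python sketch under my own conventions, none of which come from the paper: vertices are labeled 0 to n-1, the cluster of vertex v is v // a, the walkers start in different clusters, and Alice moves first in each round. One function computes Bob's two transition probabilities exactly, and the other runs the walk and counts collisions.

```python
import random
from fractions import Fraction


def bob_move_probabilities(a, b):
    """Exact chance that Bob lands on one fixed vertex (outside, inside) his cluster."""
    n = a * b
    alice_joins = Fraction(a * (b - 1), n - 1)  # Alice moved into Bob's cluster
    alice_stays = Fraction(a - 1, n - 1)        # Alice moved within her own cluster
    outside = alice_joins * Fraction(1, a * (b - 1))
    inside = alice_stays * Fraction(1, a - 1)
    return outside, inside


def count_collisions(a, b, rounds, seed=0):
    """Simulate the two walkers on K_n with n = a*b and count collisions."""
    rng = random.Random(seed)
    n = a * b

    def cluster(v):
        return v // a

    alice, bob = 0, a  # assumed start: different clusters
    collisions = 0
    for _ in range(rounds):
        # Alice's turn: she is in a different cluster than Bob.
        if rng.randrange(n - 1) < a * (b - 1):
            options = [v for v in range(n) if cluster(v) == cluster(bob) and v != bob]
        else:
            options = [v for v in range(n) if cluster(v) == cluster(alice) and v != alice]
        alice = rng.choice(options)
        collisions += alice == bob
        # Bob's turn: leave if Alice joined his cluster, otherwise stay inside it.
        if cluster(alice) == cluster(bob):
            options = [v for v in range(n) if cluster(v) != cluster(bob)]
        else:
            options = [v for v in range(n) if cluster(v) == cluster(bob) and v != bob]
        bob = rng.choice(options)
        collisions += alice == bob
    return collisions


print(bob_move_probabilities(2, 3))
print(count_collisions(2, 3, rounds=10000))
```

Both probabilities come out to 1/(ab - 1), matching the calculation above, and the collision count stays at zero because neither walker is ever offered the other's vertex.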

There are a number of additional results in the paper, such as:

Theorem 7.1. There exists a Markovian avoidance coupling of k walkers on K_n^{\ast} for any k \le n/(8 \log_2 n), and on K_n for any k \le n/(56 \log_2 n).

Theorem 8.1. No avoidance coupling is possible for n - 1 walkers on K_n^{\ast} , for n \ge 4.

In addition there are a ton of open questions at the end which are quite interesting. I didn’t mention it here, but there are also interesting questions of the entropy of the coupling — lower entropy implies easier simulation, in a sense.

As I mentioned, Behrouz Touri gave a really great presentation at ITA on some of his work with Angelia Nedić on products of stochastic matrices, some of which was in this paper on ArXiV. The setup of the paper is relatively straightforward — we have a sequence of independent random matrices \{W(k)\}, each of which is row-stochastic almost surely, and we want to know when the product \lim_{k \to \infty} W(k) W(k-1) \cdots W(t_0) converges almost surely. The main result is that if the chain is balanced and strongly aperiodic, then the limit is a random stochastic matrix such that the rows in the same connected component of the infinite flow graph are equal.

Recall that for a chain W a lazy version of the chain is \alpha W + (1 - \alpha) I. Laziness helps avoid periodicity by letting the chain be “stuck” with probability (1 - \alpha). Strongly aperiodic means that there is a \gamma \in (0,1] such that \mathbb{E}[ W_{ii}(k) W_{ij}(k) ] \ge \gamma \mathbb{E}[ W_{ij}(k) ]. Basically this is a sort of “expected laziness condition” which says that there is enough self-transition probability to avoid some sort of weak periodicity in the chain.
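To see why laziness matters, here is a toy example in Python. The two-state "flip" chain and the value \alpha = 0.8 are my own choices, not anything from the paper. The flip chain is periodic, so its powers alternate forever, while the powers of the lazy version settle down.

```python
def mat_mul(A, B):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_pow(A, t):
    """Compute A^t by repeated multiplication (fine for tiny examples)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, A)
    return result


def lazy(W, alpha):
    """Lazy version alpha * W + (1 - alpha) * I of a stochastic matrix W."""
    n = len(W)
    return [[alpha * W[i][j] + (1 - alpha) * (i == j) for j in range(n)]
            for i in range(n)]


flip = [[0.0, 1.0], [1.0, 0.0]]      # deterministic swap: period 2
print(mat_pow(flip, 10), mat_pow(flip, 11))  # alternates between I and flip
print(mat_pow(lazy(flip, 0.8), 50))  # rows approach the uniform distribution
```

The strong aperiodicity condition asks for this kind of self-transition mass in expectation rather than for every realization.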

Consider a cut of the chain into a set of states S and a set \bar{S}. A chain is balanced if there is an \alpha > 0 such that for all cuts (S,\bar{S}), we have \mathbb{E}[ W_{S \bar{S}}(k) ] \ge \alpha \mathbb{E}[ W_{\bar{S} S}(k) ]. So this is saying that at each time, the flow out of S is commensurate with the flow into S.

The infinite flow graph has the same vertex set as the original chain, but with edge (i,j) existing only if \sum_{k} ( W_{ij}(k) + W_{ji}(k) ) = \infty. That is, the edge has to exist infinitely often in the graph.

So to add it all up, the well-behaved independent products of stochastic matrices that converge are those which don’t behave badly over time — for each time k, they don’t shift around the mass too much and they are not too periodic. Basically, they don’t become volatile and cyclical, which sounds perfectly reasonable. So how does he prove it?

The first part is to connect the problem with a dynamical system whose state evolves as x(k+1) = W(k+1) x(k) and to ask whether the state converges in the limit. The next part uses a connection to absolute probability processes, which were used by Kolmogorov in 1936. A random vector process \pi(k) is an absolute probability process for \{W(k)\} if it is adapted to the same filtration as the chain, it’s stochastic almost surely, and

\mathbb{E}[ \pi(k+1) W(k+1) | \mathcal{F}_k ] = \pi(k)

Kolmogorov showed that for a deterministic sequence of stochastic matrices there is always a deterministic absolute probability process, so for independent chains we can always find a random absolute probability process. Using this, Touri and Nedić define a class of comparison functions which are super-martingales for each x(0) and have a limit. By choosing a particular comparison function they can get a version of the main result. It’s a nicely written paper and worth a skim if you’re interested in these things (as I am).
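The dynamical system x(k+1) = W(k+1) x(k) is easy to simulate. The Python sketch below makes a much stronger assumption than the paper does: every W(k) is drawn independently with strictly positive entries (uniform weights in [0.1, 1], normalized by row), so the infinite flow graph is complete and all coordinates of x should agree in the limit. The function names and parameters are hypothetical.

```python
import random


def random_stochastic(n, rng):
    """Draw a row-stochastic matrix with strictly positive entries (assumed model)."""
    rows = []
    for _ in range(n):
        weights = [rng.uniform(0.1, 1.0) for _ in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def step(W, x):
    """One update x(k+1) = W(k+1) x(k)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def run(n=4, steps=200, seed=0):
    """Iterate the dynamics from a random x(0) and return the final state."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    for _ in range(steps):
        x = step(random_stochastic(n, rng), x)
    return x


x = run()
print(max(x) - min(x))  # spread between coordinates after 200 steps
```

With positive matrices the spread contracts geometrically at every step. The interesting part of the paper is what happens when that fails and only the balance and aperiodicity conditions hold.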

There are some more talks to blog about, probably, but I am getting lazy, and one of them I wanted to mention was Max’s, but he already blogged a lot of it. I still don’t get what the “Herbst argument” is, though.

Vinod Prabhakaran gave a talk about indirect decoding in the 3-receiver broadcast channel. In indirect decoding, there is a “semi-private” message that is not explicitly decoded by the third receiver. However, Vinod argued that this receiver can decode it anyway, so the indirectness is not needed, somehow. At least, that’s how I understood the talk.

Lalitha Sankar talked about two different privacy problems that could arise in “smart grid” or power monitoring situations. The first is a model of system operators (ISOs) and how to view the sharing of load information; there was a model of K different “sources” or states being observed through a channel which looked like an AWGN faded interference channel, where the fading represents the relative influence of the source (or load on the network) on the receiver (or ISO). She didn’t quite have time to go into the second model, which was more at the level of individual homes, where short-time-scale monitoring of loading can reveal pretty much all the details of what’s going on in a house. The talk was a summary of some recent papers available on her website.

Negar Kiyavash talked about timing side channel attacks — an adversary can ping your router and from the delays in the round trip times can learn pretty much what websites you are surfing. Depending on the queueing policy, the adversary can learn more or less about you. Negar showed that first come first serve (FCFS) is terrible in this regard, and there is a bit of a tradeoff wherein policies with higher delay offer more privacy. This seemed reminiscent of the work Parv did on Chaum mixing…

Lav Varshney talked about security in RFID — the presence of an eavesdropper actually detunes the RFID circuit, so it may be possible for the encoder and decoder to detect if there is an eavesdropper. The main challenge is that nobody knows the transfer function, so it has to be estimated (using a periodogram energy detector). Lav proposed a protocol in which the transmitter sends a key and the receiver tries to detect if there is an eavesdropper; if not, then it sends the message.

Tsachy Weissman talked about how to estimate directed mutual information from data. He proposed a number of estimators of increasing complexity and showed that they were consistent. The basic idea was to leverage all of the results on universal probability estimation for finite alphabets. It’s unclear to me how to extend some of these results to the continuous setting, but this is an active area of research. I saw a talk recently by John Lafferty on forest density estimation, and this paper on estimating mutual information also seems relevant.

It’s been a while since I’ve posted, and I am going to try to post more regularly now, but as usual, things start out slowly, so here are some links. I’ve been working on massaging the schedule for the 2012 ITA Workshop (registration is open!) as well as some submissions for KDD (a first for me) and ISIT (since I skipped last year), so things are a bit hectic.

Chicago Restaurant Week listings are out, for the small number of you readers who are in Chicago. Some history on the Chicago activities of CORE in the 40s.

Via Andrew Gelman, a new statistics blog.

A paper on something called Avoidance Coupling, which I want to read sometime when I have time again.

Our team, Too Big To Fail, finished second in the 2012 MIT Mystery Hunt. There were some great puzzles in there. In particular, Picture An Acorn was awesome (though I barely looked at it), and Slash Fiction was a lot of fun (and nostalgia-inducing. Ah, Paris!). Erin has a much more exhaustive rundown.

I anticipate I will be doing a fair bit more reading in the future, due to the new job and personal circumstances. However, I probably won’t write more detailed notes on the books. This blog should be a rapidly mixing random walk, after all.

Embassytown (China Miéville) : a truly bizarre novel set on an alien world on which humans have an Embassy but can only communicate with the local aliens in a language which defies easy description. Ambassadors come in pairs, as twins: to speak with the Ariekei they must both simultaneously speak (in “cut” and “turn”). The Ariekei’s language does not allow lying, and they have contests in which they try to speak falsehoods. However, events trigger a deadly change (I don’t want to give it away). Philosophically, the book revolves a lot around how language structures thought and perception, and it’s fascinating if you like to think about those things.

Chop Suey: A Cultural History of Chinese Food in the United States (Andrew Coe) : a short but engaging read about how Chinese food came to the US. The book starts really with Americans in China and their observations on Chinese elite banquets. A particular horror was that the meat came already chopped up, with no huge roasts to carve. Chapter by chapter, Coe takes us through the railroad era through the 20s, the mass-marketing of Chinese food and the rise of La Choy, through Nixon going to China. The book is full of fun tidbits and made my flights to and from Seattle go by quickly.

The Thousand Autumns of Jacob de Zoet: A Novel (David Mitchell) : I really love David Mitchell’s writing, but this novel was not my favorite of his. It was definitely worth reading — I devoured it — but the subject matter is hard. Jacob de Zoet is a clerk in Dejima, a Dutch East Indies trading post in 19th century Japan. There are many layers to the story, and more than a hint of the grotesque and horrific, but Mitchell has an attention to detail and a mastery with perspective that really makes the place and story come alive.

Air (Geoff Ryman) : a story about technological change, issues of the digital divide, economic development, and ethnic politics, set in a village in fictional Karzistan (looks like Kazakhstan). Air is like having mandatory Internet in your brain, and is set to be deployed globally. During a test run in the village, Chung Mae, a “fashion expert,” ends up deep into Air and realizes that the technology is going to change their lives. She goes about trying (in a desperate, almost mad way) to tell her village and bring them into the future before it overwhelms them. There’s a lot to unpack here, especially in how technology is brought to rural communities in developing nations, how global capital and the “crafts” market impact local peoples, and the dynamics of village social orders. It’s science fiction, but not really.

The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (Sharon Bertsch McGrayne) : an engaging read about the history of Bayesian ideas in statistics. It reads a bit like an us vs. them, the underdog story of how Bayesian methods have overcome terrible odds (prior beliefs?) to win the day. I’m not sure I can give it as enthusiastic a review as Christian Robert, but I do recommend it as an engaging popular nonfiction read on this slice in the history of modern statistics. In particular, it should be entertaining to a general audience.

Dangerous Frames: How Ideas about Race and Gender Shape Public Opinion (Nicholas J.G. Winter) : the title says most of it, except it’s mostly about how ideas about race and gender shape white public opinion. The basic theoretical structure is that there are schemas that we carry that help us interpret issues, like a race schema or a gender schema. Then there are frames or narratives in which issues are put. If the schema is “active” and an issue is framed in a way that is concordant with the schema, then people’s opinions follow the schema, even if the issue is not “about” race or gender. This is because people reason analogically, so they apply the schema if it matches. To back up the theory, Winter has some experiments, both of the undergrads doing psych studies type as well as survey data, to show that by reframing certain issues people’s “natural” beliefs can be skewed by the schema that they apply. The schemas he discusses are those of white Americans, mostly, so the book feels like a bit of an uncomfortable read because he doesn’t really interrogate the somewhat baldly racist schemas. The statistics, as with all psychological studies, leave something to be desired; I take the effects he notices at a qualitative level (as does he, sometimes).
