Dorfman, Warner, and the (false) stories we tell

I’ve been thinking about reviving the blog, and as a way of easing back in I’ve come up with some short post ideas. As usual, these are a bit half-baked, so YMMV.

A common way of generating a “hook” in a technical talk is to say “actually, this is really an old idea.” There are two examples of this that come to mind for me, group testing and randomized response. In both of these topics, there is a “classic paper” with an “interesting historical anecdote” that generates a kind of factoid in the audience’s mind. Unfortunately, the factoid that gets stored is often incorrect.

Group testing refers to the process of finding a (small) number of “defective elements” in a larger set of elements by testing groups of elements. The assumption is that the test you have is sensitive enough to flag a group as containing a defective element. This was first proposed in Robert Dorfman’s 1943 paper The Detection of Defective Members of Large Populations in The Annals of Mathematical Statistics. He introduced group testing with the application of screening for syphilis in the United States Public Health Service and the Selective Service System during WWII. A syphilis test (the Wassermann test) which is sufficiently sensitive could be used on pooled blood samples: if the pooled test is negative, the whole group is clear, and if it is positive, you could test individuals in the group or further subdivide.
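To make the two-stage idea concrete, here is a minimal sketch in Python. It is not Dorfman’s own presentation: the function names are made up for illustration, and it assumes an idealized test that flags a pool exactly when the pool contains at least one defective member and that individuals are defective independently with probability p. Under those assumptions, pools of size g use on average 1/g + 1 − (1−p)^g tests per person.

```python
def dorfman_screen(statuses, group_size):
    """Two-stage pooled screening with an idealized (error-free) test.

    statuses: list of booleans, True meaning "defective" (hypothetical input).
    Returns (set of indices found defective, total number of tests used).
    """
    found = set()
    tests = 0
    for start in range(0, len(statuses), group_size):
        group = range(start, min(start + group_size, len(statuses)))
        tests += 1  # stage 1: one test on the pooled sample
        if any(statuses[i] for i in group):
            # stage 2: the pool was positive, so test every member individually
            for i in group:
                tests += 1
                if statuses[i]:
                    found.add(i)
    return found, tests


def expected_tests_per_person(p, g):
    """Expected tests per person with pools of size g, assuming independent
    defects at rate p and an error-free test."""
    return 1.0 / g + 1.0 - (1.0 - p) ** g
```

For example, with 100 people, two of whom are defective and fall in different pools of size 10, this uses 30 tests instead of 100. The savings shrink as p grows, and the calculation says nothing about what happens when pooling makes the test less sensitive.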

As noted in this paper by Gilbert and Strauss, the “Selective Service System did not put group testing for syphilis into practice because the Wassermann tests were not sensitive enough…” when pooled. That paper doesn’t give a citation, but you can find the source in the book by Du and Hwang:

Unfortunately, this very promising idea of grouping blood samples for syphilis screening was not actually put to use. The main reason, communicated to us by C. Eisenhart, was that the test was no longer accurate when as few as eight or nine samples were pooled. Nevertheless, test accuracy could have been improved over years, or possibly not a serious problem in screening for another disease. Therefore we quoted the above from Dorfman in length not only because of its historical significance, but also because at this age of a potential AIDS epidemic, Dorfman’s clear account of applying group testing to screen syphilitic individuals may have new impact to the medical world and the health service sector.

It seems that group testing was not used for the syphilis screenings. Most people are careful to say it was proposed and not used but without “closing the loop” people learning about it for the first time could be misled. Group testing has been used for many other diseases, most recently in some approaches for COVID screening.

The second example is randomized response, which is a technique for providing plausible deniability to survey respondents. An interviewer asks an interviewee a sensitive binary question. The interviewee, whose true answer is X∈{0,1}, samples a Bernoulli(p) random variable Z∈{0,1} (“flips a biased coin”) and responds with Y=X⊕Z, where ⊕ is addition modulo 2. Randomized response was proposed by Stanley L. Warner in his 1965 JASA paper Randomized response: A survey technique for eliminating evasive answer bias. Talks on differential privacy (especially local differential privacy) often trot out Warner’s paper as an example of how differential privacy has appeared “classically.” I have done this myself.
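The mechanism Y=X⊕Z is short enough to sketch in Python. The function names are hypothetical, and the estimator is the standard moment-based one rather than anything quoted from Warner’s paper. If a fraction π of the population has X=1, then P(Y=1) = p + π(1−2p), so for p≠1/2 the fraction can be recovered from the average response. The last function gives the local differential privacy parameter implied by the flip probability, ε = |ln((1−p)/p)|.

```python
import math
import random


def randomize(x, p, rng):
    """Return Y = X xor Z, where Z ~ Bernoulli(p) is the respondent's private coin."""
    z = 1 if rng.random() < p else 0
    return x ^ z


def estimate_proportion(responses, p):
    """Estimate the fraction of true 1s from the noisy responses.

    Uses P(Y=1) = p + pi * (1 - 2p); undefined at p = 1/2, where Y carries
    no information about X.
    """
    if p == 0.5:
        raise ValueError("p = 1/2 makes the responses independent of the truth")
    y_bar = sum(responses) / len(responses)
    return (y_bar - p) / (1.0 - 2.0 * p)


def local_dp_epsilon(p):
    """Local DP parameter of the mechanism: |ln((1 - p) / p)| for 0 < p < 1."""
    return abs(math.log((1.0 - p) / p))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the illustration repeatable
    truth = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
    noisy = [randomize(x, 0.25, rng) for x in truth]
    print(estimate_proportion(noisy, 0.25), local_dp_epsilon(0.25))
```

The estimate can fall outside [0,1] for small samples, and its variance blows up as p approaches 1/2, which is the usual privacy/accuracy tradeoff.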

Unfortunately, as a 2015 JASA paper of Blair, Imai, and Zhou notes:

Despite the wide applicability of the randomized response technique and the methodological advances, we find surprisingly few applications. Indeed, our extensive search yields only a handful of published studies that use the randomized response method to answer substantive questions… 

The earliest study they could find was by Madigan et al. from 1976, who looked at Misamis Oriental province in northern Mindanao (Philippines) and the prevalence of hiding deaths from official counts. So it seems randomized response was not implemented for around a decade after being proposed.

These examples show how easy it is to create misunderstandings by using anecdotes about prior work in talks. I am certainly guilty of both misunderstanding the actual facts and perhaps misrepresenting what actually happened after these cool ideas were first proposed. We all know that the gap between theory and practice can be large, but somehow these fun stories make us a bit less careful.

Linkage

This NSF report from the Office of the Inspector General has some really horrendous examples of data fabrication, plagiarism, and other misconduct by PIs and graduate fellowship (GRFP) recipients. It’s true that bad behavior taints the whole program: how good is the GRFP selection process if students like this get awards?

This article on Bhagat Singh Thind is fascinating. We need a modern Ghadar Party here. But this is so bizarre: “[o]ut of necessity and ingenuity, Thind, along with several dozen South Asians during the interwar decades reinvented themselves as itinerant spiritual teachers and metaphysical lecturers who would travel from city to city, giving lectures and holding private classes.”

A photo gallery by Lotfi Zadeh: some of these are really beautiful portraits. Also the variety! I remember not really understanding portraiture when I was younger but I think I “get it” a bit more now. Or at least why it’s interesting. There’s even a photo of Claude Shannon… from the email:

Prof. Lotfi Zadeh, who passed away in 2017, was an avid photographer who grew up in a multicultural environment, surrounded himself with a cosmopolitan crowd, and always kept his mind open to new ideas. In the 1960s and 70s, he enjoyed capturing the people around him in a series of black and white portraits. His burgeoning career gave him access to a number of artists, academics, and dignitaries who, along with his colleagues, friends, and family, proved a great source of inspiration for him.

THE SQUIRCLE IS SO FASCINATING!

I helped organize a workshop at IPAM on privacy and genomics. Videos (raw) are up now.

Linkage

Badass women cartographers!

Looking back on Shoes.

At a DARPA PI meeting recently, I met some folks from Cybernetica who told me about the hot new startup CountryOS! (EDIT: it’s not their startup).

A recent 99% Invisible episode describes the history of the SIGSALY, a secure communication system developed during WWII that used white noise one-time pads printed on vinyl to analog-encrypt communications lines.

Thanks to The Allusionist, I learned about EuroSpeak and discovered this guide on Misused English words and expressions in EU publications, which is hilarious.

Readings

Thinking, Fast and Slow (Daniel Kahneman). This was recommended by Vivek Goyal, and is Kahneman’s popular nonfiction book about the psychology of decision making in humans (as opposed to rational-decision making models like those in economics). The System 1/System 2 model was new to me, even though the various biases and heuristics that he describes were things I had heard about in different contexts. While quite interesting and a book that anyone who works on decision making should read (I’m looking at you, statisticians, machine learners, and systems-EE folks), it’s a bit too long, I think. I found it hard to power through at the end, which is where he gets into prospect theory, a topic which my colleague Narayan Mandayam is trying to apply in wireless systems.

Men Explain Things To Me (Rebecca Solnit). A slim volume collecting several of Solnit’s essays on feminism and its discontents, from the last few years. I was familiar with some of the essays (including the first one) but was surprised by her ultimately hopeful tone (many of the essays come with introductions describing their context and how she feels about them now). Highly recommended, but I don’t think it will help with any Arguments On The Internet.

The Idea of India (Sunil Khilnani). This book is a bit older now but provides a lot of crucial context about the early Indian state, the relationship between urbanism and social change, and the nature of electoral politics in India. Reading this gave me a more nuanced view of the complexity of contemporary Indian politics, or at least a more nuanced view of how we got here (beyond the usual history of communalism). The origins of the cronyism of Congress and the causes and effects of the Emergency were also a new perspective for me.

The Sympathizer (Viet Thanh Nguyen). This is about a Vietnamese (well, half-Vietnamese, as people keep pointing out) undercover agent who leaves during the evacuation of Saigon and embeds himself in the refugee community, sending coded messages about counter-revolutionary plans. Our unnamed narrator has an epic adventure, darkly comic and tragic, initially told as a confessional in some sort of prison interrogation. He was educated in the US before going back to Vietnam — this puts him between two worlds, and the novel is fundamentally about this tension. Throughout people are archetypes: The General, The Auteur, the crapulent major. Although long, the novel is rewarding: the last quarter really put me through the wringer, emotionally.

Station Eleven (Emily St. John Mandel). A novel about a post-apocalyptic future (split between pre-, slightly post-, and much post-) in which much of the world has been decimated by a mysterious infection. The novel revolves around a series of connected characters: an actor who dies on stage in a production of King Lear; his ex-wife, who wrote a series of comics about a remote station; a child actor from the same production who survives to become part of a traveling theater company in the post-apocalyptic wasteland that was once Michigan; and an audience member who was once a paparazzo following the actor. The whole novel has a haunting air to it, a bit of a dreamy sensibility that makes it easy to read (too) quickly. The connections between the characters were not surprising when they were revealed, but they didn’t need to be — the book doesn’t rely on that kind of gimmickry. Read it while traveling: you won’t look at airports the same way again.

Linkage

I’m in the process of moving to New Jersey for my new gig at Rutgers. Before I start teaching I have to go help run the Mystery Hunt, so I am a little frazzled and unable to write “real” blog posts. Maybe later. In the meantime, here are some links.

The folks at Puzzazz have put out a bevy of links for the 100th anniversary of the crossword puzzle.

The UK has issued a pardon to Alan Turing, for, you know, more or less killing him. It’s a pretty weasely piece of writing though.

An important essay on women’s work: “…women are not devalued in the job market because women’s work is seen to have little value. Women’s work is devalued in the job market because women are seen to have little value.” (h/t AW)

Of late we seem to be learning quite a bit about early hominins and hominids (I had no idea that hominini was a thing, nor that chimps are in the panini tribe, nor that “tribe” is between subfamily and genus). For example, they have sequenced some old bones in Spain. Extracting sequenceable mitochondrial DNA is pretty tough — I am sure there are some interesting statistical questions in terms of detection and contamination. We’ve also learned that some neanderthals were pretty inbred.

Kenji searches for the perfect chocolate chip cookie recipe.

Linkage

I am traveling all over India at the moment so I’m not really able to write contentful posts. Here are even more links instead, sigh. Maybe later I’ll talk about log-Sobolev inequalities so I can be cool like Max.

Speaking of Max, he posted this hilarious bad lip reading version of Game of Thrones. Probably NSFW. I don’t even like the series but it’s pretty funny.

For those who are fans of Rejected, Don Hertzfeldt’s new film is available on Vimeo.

Those who were at Berkeley may remember seeing Ed Reed perform at the Cheeseboard. His album (which I helped fund via indiegogo) was named a Downbeat Editors’ Pick. It’s a great album.

In light of the Snowden leaks, some doubt has been cast on NIST’s crypto standards.

I’m super late to this, but I endorse Andrew’s endorsement of Sergio’s interview with Robert Fano in the IT Newsletter. Here’s just the article, if you want that.

Readings

I’ve been on some flights lately and skived off of work to read a bit more.

The White Tiger (Aravind Adiga) — a farce told from the perspective of a murderer-turned-entrepreneur in Bangalore writing letters to Wen Jiabao. I think there are definitely some interesting issues here, especially with Adiga trying to write the voice of the subaltern. The point of the book seems to be to skewer the rich in India (and by implication the middle class which seeks to emulate the rich) but I’m not sure if the hits land where they are targeted. Definitely worth reading and discussing if you care about India. People who have never been there may find it less… familiar, and so their reading experience would be quite different.

Interworld (Neil Gaiman and Michael Reaves) — a Young Adult science fiction/fantasy novel. A bit of a thin premise, world-building-wise, but a breezy read. Can’t really recommend it but it was ok.

Rule 34 (Charles Stross) — a follow-up to Halting State. Set in future-Scotland, it has all of the techno-econo-conspiracy together with some interesting takes on how ubiquitous internet and custom 3D printing and fabbing can affect life.

A Man of Misconceptions (John Glassie) — a fascinating biography of Athanasius Kircher, whose fascinatingly incorrect “scholarship” makes for some enjoyable reading. Glassie’s book is a really engaging read and brings a lot of the context of Kircher’s world to life. Highly recommended.

Readings

Endless Things [John Crowley] — Book four of the Aegypt Cycle, and the one most grounded in the present. The book moves more swiftly than the others, as if Crowley was racing to the end. Many of the concerns of the previous books, such as magic, history, and memory, are muted as the protagonist Pierce Moffett wends his way through his emotional and intellectual turmoil and into what in the end amounts to a kind of peace. Obviously only worth reading if you read the first three books.

Understanding Privacy [Daniel Solove] — A law professor’s take on what constitutes privacy. Solove wants to conceptualize privacy in terms of clusters of related ideas rather than take a single definition, and he tries to put a headier philosophical spin on it by invoking Wittgenstein. I found the book a bit overwritten but it does parse out the things we call privacy, especially in the longest chapter on the taxonomy of privacy. It’s not a very long book, but it has a number of good examples and also case law to show how muddled our legal definitions have become. He also makes a strong case for increased protections and shows how the law is blind to the effects of information aggregation, for example.

The Fall of the Stone City [Ismail Kadare] — An allegorical novel by a Man Booker prize winner chronicling the Nazi occupation and the communist takeover of Gjirokaster, an old Albanian city. It’s a dark absurdist comedy, partly in the vein of Kafka but with a bit of… Calvino almost. The tone of the book (probably a testament to the translator) has this almost academic detachment, gently mocking as it describes the ways in which the victors try to rewrite history.

Invisible Men [Becky Pettit] — A sobering look at how mass incarceration interacts with official statistics. Because most surveys are household-based, they do not count the increasingly larger incarcerated population, thereby introducing a systematic racialized bias in the statistics used for public policy. In particular, Pettit shows how this bias leads to underestimation of racial inequity because the (mainly young black male) prisoners are “erased” in the official records.

The Rise of Ransom City [Felix Gilman] — A sequel to The Half-Made World, and a wondrously engrossing read it is too, filled with the clash of ideas, the corruption of corporations, the “borrowing” and evolution of ideas, and the ravages of industrialization. Also has a healthy dose of Mark Twain for good measure.

The history of new foods in India

Konnichiwa, Varshney-san. Your post on the potato inspired me to read the papers you mentioned as well as a reference suggested by a friend here in Chicago:

Sucheta Mazumdar, “The Impact of New World Food Crops on the Diet and Economy of China and India, ca. 1600-1900.” Food in Global History. Ed. Raymond Grew. Westview Press, 1999. 58-78.

The Columbian Exchange refers to the interchange of foodstuffs, technologies, and disease after European contact with the Americas. In exchange for offering pestilence, brutal colonialism, and genocide, Europeans got a variety of staple crops with which they could support their burgeoning populations and which would later sustain the Industrial Revolution:

The exchange introduced a wide range of new calorically rich staple crops to the Old World—namely potatoes, sweet potatoes, maize, and cassava. The primary benefit of the New World staples was that they could be grown in Old World climates that were unsuitable for the cultivation of Old World staples. (Nunn and Qian)

In addition, the discovery of quinine in the Andes enabled Europeans to invade and colonize tropical regions. In addition to the trans-Atlantic slave trade, this expansion introduced the widespread planting of cash crops such as rubber in Africa. Being an economics paper, there are some sobering quantitative measures to drive home the horrors of colonial exploitation:

The population of the Congo is estimated to have been about 25 million prior the rubber boom, in the 1880s. In 1911, after the peak of the boom, the population was 8.5 million, and in 1923 after the completion of the boom, it was 7.7 million. If one compares the population losses relative to the production of rubber, an astonishing conclusion is reached: an individual was “lost” from the Congo for every ten kilograms of rubber exported (Loadman, 2005, pp. 140–41).

The potato paper covers the effect of potatoes and tries to estimate (numerically) the impact potato cultivation had on population growth and urbanization in Europe. It is somewhat elusive to me what such a quantification “means,” but it’s of a piece with what Ian Hacking describes in The Taming of Chance: the torrent of printed numbers led to the publication of attendant “studies” slicing and dicing the numbers in statistical ways in order to “make sense” of them. The second Nunn and Qian paper covers capsicum, tomatoes, cacao, vanilla, coca, and tobacco, and contains some fun nutritional facts and trivia:

  • Capsicum is high in vitamins A, B and C, magnesium, and iron, and the extra saliva produced by capsaicin helps digestion.
  • “Greece consumes the most tomatoes per capita… The tomato has been so thoroughly adopted and integrated into Western diets that today it provides more nutrients and vitamins than any other fruit or vegetable (Sokolov, 1993, p. 108).”
  • “[I]n Roald Amundsen’s trek to the South Pole, his men were allocated 4,560 calories per day, of which over 1,000 came from cacao (West, 1992, pp. 117–18).”

My interest came more from vegetables that almost define Indian cuisine: tomatoes, potatoes, and chilies. Mazumdar’s article focuses on the effect new crops had on China and India. Specific to this context,

There were two major periods of introduction of American plants into Asia. The first wave, in the sixteenth and seventeenth centuries, included sweet potatoes, maize, potatoes, jicamas, capsicums (chile peppers), squashes, and peanuts, cashews, custard apples, guavas, avocadoes, tomatoes, papaya, passion-fruit, pineapples, and sapodillas… In the second wave, American plants, such as cocoa and the sunflower, were brought to India even more recently in the twentieth century.

With them came new words of course — South Asian readers may know of a certain fruit as sapota (in the south) or chiku (in the north), both of which come from a Meso-American word (not sure of the language) chicosapote. The word achar for pickles came from the Carib axi meaning chile pepper.

The paper draws a distinction between land ownership practices in India and China, which made a difference in how fast new foods were incorporated into the common diet. In China, a number of reforms allowed “tenancy rights to become inheritable” for peasants, meaning they had an incentive to stay in place and try to extract more productivity from the land they had. The new crops, especially the sweet potato, became staples because they provided more calories per acre, and because they were drought- and pest-resistant, required less labor (especially compared to rice), and could grow in poor soil. Mazumdar writes:

[In the 1920s in south China] sweet potatoes regularly provided a supply of at least three to four months’ worth of food for practically everybody living in the countryside… they were eaten fresh, baked, boiled, or mashed with pickles… ground into flour and made into noodles, bread, or a gruel… or stirred into a hash.

The sweet potato revolutionized the lives of peasants in China, giving them more calories and freeing time and labor to grow cash crops. Corn and peanuts were also widely cultivated, since corn could also grow in nutrient-poor soils and peanuts are good nitrogen-fixers and could be grown with sugarcane.

India was a different story — there was more arable land and “relatively low population growth between 1600 and 1850.” Due to military conflicts and tensions with zamindars (landlords), villages would often up and leave, transplanting themselves further from conflict or interference. This meant that unlike China, rural farmers were not as tied to specific locations during this period. Colonialism changed all that — people were pinned down and agriculture was commercialized, so in the 19th and 20th centuries American crops started flourishing. The Brits promoted the potato heavily, and increased urbanization brought it and the tomato into the mainstream. Although it’s hard to think of Indian food without tomatoes, potatoes, and chilies, these ingredients were only integrated around 150 years ago!