Teaching students to stay away from Physiognomic AI

I read Luke Stark and Jevan Hutson's Physiognomic AI paper last night and it's sparked some thinking about additional reading I could add to my graduate course on statistical theory for engineering next semester (Detection and Estimation Theory).

“The inferential statistical methods on which machine learning is based, while useful in many contexts, fail when applied to extrapolating subjective human characteristics from physical features and even patterns of behavior, just as phrenology and physiognomy did.”

From the (mathematical) communication theory context in which I teach these methods, they are indeed useful. But I should probably teach more about the (non-mathematical) limitations of those methods. Ultimately, even if I tell myself that I am teaching theory, that theory has a domain of application which is both mathematically and normatively constrained. We get trained in the former but not in the latter. Teaching a methodology without a discussion of its limitations is a bit like teaching someone how to shoot a gun without any discussion of safety. [*]

The paper describes the parallels between the development of physiognomy and some AI-based computer vision applications to illustrate how the claims about utility or social good being made now are nearly identical to those made then. They quote Lorenzo Niles Fowler, a phrenologist: “All teachers would be more successful if, by the aid of Phrenology, they trained their pupils with reference to their mental capacities.” Compare this to the push for using ML to generate individual learning plans.

The problem is not (necessarily) that giving students individualized instruction is bad, but that ML’s “internally consistent, but largely self-referential epistemological framework” cherry picks what it wants from the application domain to find a nail for the ML hammer. As they write: “[s]uch justifications also often point to extant scientific literature from other fields, often without delving its details and effacing controversies and disagreements within the original discipline.”

Getting back to pedagogy, I think it’s important to address this “everything looks like a nail” phenomenon. One start is to think carefully even about the cartoon examples we use in lectures. But perhaps I should add a supplemental reading list to go along with each topic. We fancy ourselves as theorists, but I think it’s a dodge. Students are taking the class because they are excited about doing machine learning. When they go off into industry, they should be able to think critically about whether the tool is right for the job: not just “is logistic loss the right loss function” but “is this even the right question to be asking or trying to answer?”

[*] That is, very American?

Some thoughts on teaching signals and systems

I’m teaching Linear Systems and Signals[*] (ECE 345) this semester at Rutgers. The course overall has 260+ students, split between two sections: I am teaching one section. This is my second time teaching it: last year I co-taught with Vishal Patel (who has decamped to Hopkins), and this semester I am co-teaching with Sophocles Orfanidis. I inherited a bit of a weird course: this is a 3-unit junior-level class with an associated 1-unit lab (ECE 347). Previous editions of the course had no recitations, which boggled my mind, since the recitation was where I really learned the material when I took the course (6.003 at MIT, with Greg Wornell as my recitation instructor). How are you supposed to understand how to do all these transforms without seeing some examples?

So this year we have turned ECE 347 into a recitation and moved the coding/simulation part of the course into the homework assignments. Due to the vagaries of university bureaucracy, however, we still have to assign a separate grade for the recitation (née lab). Moreover, there are some students who took the class without the lab and now just need to take 347! It’s a real mess. Hopefully it’s just one year of transition but this is also the year ABET [**] is showing up so we’ll see how things go.

After surveying a wide variety of textbook options for the course, we decided to go with the brand-new and free book by Ulaby and Yagle, Signals and Systems: Theory and Applications [***]. I really have to commend them on doing a fantastic job and making the book free, which is significantly better than the $247 for the same book I used literally 20 years ago when I took this course. Actually, we mainly used another book, whose title/author eludes me now, but it had a green slipcover and was more analog control-focused (perhaps since Munther Dahleh was teaching).

One major difference I noticed between textbooks was the order of topics. Assuming you want to do convolution, Laplace (L), Z, Fourier Series (FS), and Fourier Transforms (FT), you can do a sort of back and forth between continuous time (CT) and discrete time (DT):

• CT convolution, DT convolution, CTFS, DTFS, CTFT, DTFT, Laplace, Z
• CT convolution, DT convolution, Laplace, Z, CTFS, DTFS, CTFT, DTFT

or do all of one and then the other:

• CT convolution, Laplace, CTFS, CTFT, DT convolution, Z, DTFS, DTFT
• DT convolution, Z, DTFS, DTFT, CT convolution, Laplace, CTFS, CTFT

I like the alternating version because it emphasizes the parallels between CT and DT, so if you cover sampling at the end you can kind of tie things together. This tends to give students a bit of whiplash, though, so we are going for:

• CT convolution, DT convolution, Laplace, Z, CTFS, CTFT, DTFS, DTFT

It’s all a bit of an experiment, but the thing I find with all textbooks is that they are never as modular as one might like. That’s good for a book but maybe not as good for a collection of curricular units, which in the end is what a S & S [****] class is. CNX is one type of alternative, or maybe something like the interactive book that my colleague Roy Yates dreams of.
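Since the orderings above all hinge on the parallel between CT and DT systems, here is a small numerical sketch of that parallel (my own illustration, not from any of the textbooks): the discrete convolution sum, scaled by the sampling step, approximates the continuous-time convolution integral. The specific system (a first-order lowpass driven by a unit pulse) is just a convenient example.

```python
import numpy as np

# Example system (my choice): impulse response h(t) = exp(-t) u(t),
# input x(t) = u(t) - u(t-1), a unit pulse.
dt = 1e-3                        # sampling step
t = np.arange(0.0, 5.0, dt)
h = np.exp(-t)                   # samples of the CT impulse response
x = (t < 1.0).astype(float)      # samples of the unit pulse

# DT convolution sum, scaled by dt, as a Riemann sum for the CT integral.
y_dt = np.convolve(x, h)[: len(t)] * dt

# Closed-form CT answer:
#   y(t) = 1 - exp(-t)           for t < 1
#   y(t) = (e - 1) * exp(-t)     for t >= 1
y_ct = np.where(t < 1.0, 1.0 - np.exp(-t), (np.e - 1.0) * np.exp(-t))

print(np.max(np.abs(y_dt - y_ct)))  # small: the two pictures agree
```

Nothing deep here, but it is the kind of example that lets students see the DT machinery and the CT machinery as two views of the same LTI computation.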
I find myself questioning my own choices of ordering and how to present things in the midst of teaching — it’s tempting to experiment mid-stream but I have to tamp down the urges so that I don’t lose the class entirely.

[*] You can tell by the word ordering that it was a control theorist who must have named the course.
[**] Accreditation seems increasingly like a scam these days.
[***] You can tell by the word ordering where the sympathies of the authors lie.
[****] Hedging my bets here.

Linkage

Cheating: The List Of Things I Never Want To Hear Again. This is an almost definitive list of plagiarism/cheating excuses. I both love and loathe the idea of making students sign a pledge, but there’s that saying about a horse and water… (h/t Daniel Hsu)

This note on data journalism comes with a longer report about how to integrate data journalism into curricula. It strikes me that many statistics and CS departments are missing the boat here on creating valuable pedagogical material for improving data analytics in journalism. (h/t Meredith Broussard)

Speaking of which, ProPublica has launched version 2.0 of its Data Store! Of course, data isn’t everything: The Perils of Using Technology to Solve Other People’s Problems.

DARPA just launched a podcast series, Voices from DARPA, where DARPA PMs talk about what they’re doing and what they’re interested in. The first one is on molecular synthesis. It’s more for a popular audience than a technical one, but also seems like a smart public-facing move by DARPA.

My friend Steve Severinghaus won the Metropolitan Society of Natural Historians Photo Contest!

My friend (acquaintance?) Yvonne Lai co-authored this nice article on teaching high school math teachers and the importance of “mathematical knowledge for teaching.”

Data: What is it Good For? (Absolutely Something): the first few weeks

So Waheed Bajwa and I have been teaching this Byrne Seminar on “data science.” At Allerton some people asked me how it was going and what we were covering in the class. These seminars are meant to be more discussion-based. This is a bit tough for us in particular:

• engineering classes are generally NOT discussion-based, neither in the US nor in Pakistan
• it’s been more than a decade since we were undergraduates, let alone 18
• the students in our class are fresh out of high school and also haven’t had discussion-based classes

My one experience in leading discussion was covering for a theater class approximately 10 years ago, but that was a junior-level elective as I recall, and the dynamics were quite a bit different. So getting a discussion going and getting all of the students to participate is, on top of being tough in general, particularly challenging for us. What has helped is that a number of the students in the class are pretty engaged with the ideas and material, and we do in the end get to collectively think about the technologies around us and the role that data plays a bit differently.

What I wanted to talk about in this post was what we’ve covered in the first few weeks. If we offer this class again it would be good to revisit some of the decisions we’ve made along the way, as this is as much a learning process for us as it is for them. A Byrne Seminar meets 10 times during the semester, so that it will end well before finals. We had some overflow from one topic to the next, but roughly speaking the class went in the following order:

• Introduction: what is data?
• Potentials and perils of data science
• The importance of modeling
• Statistical considerations
• Machine learning and algorithms
• Data and society: ethics and privacy
• Data visualization
• Project presentations

I’ll talk a bit more on the blog about this class, what we covered, what readings/videos we ended up choosing, and how it went.
I think it would be fun to offer this course again, assuming our evaluations pass muster. But in the meantime, the class is still on, so it’s a bit hard to pass retrospective judgement.

Detection and Estimation: book recommendations?

It’s confirmed that I will be teaching Detection and Estimation next semester, so I figured I would use the blog to conjure up some book recommendations (or even debate, if I can be so hopeful). Some of the contenders:

• Steven M. Kay, Fundamentals of Statistical Signal Processing – Estimation Theory (Vol. 1), Prentice Hall, 1993.
• H. Vincent Poor, An Introduction to Signal Detection and Estimation, 2nd Edition, Springer, 1998.
• Harry L. Van Trees, Detection, Estimation, and Modulation Theory (in 4 parts), Wiley, 2001 (a reprint).
• M.D. Srinath, P.K. Rajasekaran, and R. Viswanathan, Introduction to Statistical Signal Processing with Applications, Prentice Hall, 1996.

Detection and estimation is a fundamental class in the ECE graduate curriculum, but these “standard” textbooks are around 20 years old, and I can’t help but think there might be a more “modern” take on the subject (no, I’m not volunteering). Venu Veeravalli's class doesn’t use a book, just notes. However, I think the students at Rutgers (majority MS students) would benefit from a textbook, at least as a grounding. Srinath et al. is what my colleague Narayan Mandyam uses. Kay is what I was leaning toward before (because it seems to be the most widely used), but Poor’s book is the one I read. I am putting up the Van Trees as a joke, mostly. I mean, it’s a great book, but I think a bit much for a textbook.

So what do the rest of you use? Also, if you are teaching this course next semester, perhaps we can share some ideas. I think the curriculum might be ripe for some shaking up. If not in core material, at least in the kinds of examples we use. For example, I’m certainly going to cover differential privacy as a connection to hypothesis testing.
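To make that last connection concrete, here is a toy sketch (my own, not from any of the books above) of the hypothesis-testing view of differential privacy: an adversary observing an ε-DP mechanism's output is running a binary hypothesis test between neighboring inputs, and ε constrains the achievable false-alarm/missed-detection tradeoff via P_MD + e^ε · P_FA ≥ 1. Randomized response on a single bit sits exactly on that boundary.

```python
import math

eps = 1.0

# Randomized response on one bit: report the true bit with
# probability e^eps / (1 + e^eps), the flipped bit otherwise.
p_truth = math.exp(eps) / (1.0 + math.exp(eps))

# The likelihood ratio test here reduces to "guess the reported bit",
# so both error probabilities equal the flip probability.
p_fa = 1.0 - p_truth   # deciding H1 ("bit was 1") when the bit was 0
p_md = 1.0 - p_truth   # deciding H0 ("bit was 0") when the bit was 1

# eps-DP forces p_md + exp(eps) * p_fa >= 1 for ANY test;
# randomized response meets this with equality.
print(p_md + math.exp(eps) * p_fa)  # approximately 1.0
```

Even a worked example this small ties the privacy parameter directly to the ROC-style quantities students already compute in a detection course.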
Teaching bleg: articles on “data” suitable for first-year undergraduates

My colleague Waheed Bajwa and I are teaching a Rutgers Byrne Seminar for first-year undergraduates this fall. The title of the course is Data: What is it Good For? (Absolutely Something), a reference which I am sure will be completely lost on the undergrads. The point of the course is to talk about “data” (what is it, exactly?), how it gets turned into “information,” and then perhaps even “knowledge,” with all of the pitfalls along the way. So it’s a good opportunity to talk about philosophy (e.g. epistemology), mathematics/statistics (e.g. undersampling, bias, analysis), engineering (e.g. storage, transmission), science (e.g. replication, retraction), and policy (e.g. privacy). It’s supposed to be a seminar class with lots of discussion, and the students can be expected to do a little reading outside of class. We have a full roster of 20 signed up, so managing the discussion might be a bit tricky, of course.

We’re in the process of collecting reading materials — magazine articles, book chapters, blog posts, etc. — for the students to read. We explicitly didn’t want it to be for “technical” students only. Do any readers of the blog have great articles suitable for first-year undergrads across all majors? As the class progresses I will post materials here, as well as some snapshot of the discussion. It’s my first time teaching a class of this type (or indeed any undergraduates at Rutgers), so I’m excited (and perhaps a bit nervous).

On a side note, Edwin Starr’s shirt is awesome and I want one.

Teaching technical (re-)writing

I think it would be great to have a more formal way of teaching technical writing to graduate students in engineering. It’s certainly not being taught at (most) undergraduate institutions, and the mistakes are so common across the examples that I’ve seen that there must be a way to formalize the process for students.
Since we tend to publish smaller things a lot earlier in our graduate careers, having a “checklist” approach to writing/editing could be very helpful to first-time authors. There are several coupled problems here:

• students often don’t have a clear line of thought before they write,
• they don’t think of who their audience is,
• they don’t know how to rewrite, or indeed how important it is.

Adding to all of this is that they don’t know how to read a paper. In particular, they don’t know what to be reading for in terms of content or form. This makes the experience of reading “related work” sections incredibly frustrating.

What I was thinking of was a class where students learn to write a (small) literature review on a topic of their choosing. The first part would be how to read papers and make connections between them. What is the point of a literature review, anyway? The first objective is to develop a more systematic way of reading and processing papers. I think everyone I know professionally, myself included, learned how to do this in an ad-hoc way. I believe that developing a formula would help improve my own literature surveying.

The second part of the course would be about rewriting (rather than writing). That is, instead of providing rules like “don’t use the passive voice so much,” we could focus on “how to revise your sentences to be more active.” I would also benefit from a systematic approach to this for my own writing.

I was thinking of a once-a-week, writing-seminar-style class. Has anyone seen a class like this in engineering programs? Are there tips/tricks from other fields/departments which do have such classes that could be useful? Even though it is “for social scientists,” Howard Becker’s book is a really great resource.
“Cascading Style Sheets are a cryptic language developed by the Freemasons to obscure the visual nature of reality”

Via Cynthia, here is a column by James Mickens about how horrible the web is right now:

Computer scientists often look at Web pages in the same way that my friend looked at farms. People think that Web browsers are elegant computation platforms, and Web pages are light, fluffy things that you can edit in Notepad as you trade ironic comments with your friends in the coffee shop. Nothing could be further from the truth. A modern Web page is a catastrophe. It’s like a scene from one of those apocalyptic medieval paintings that depicts what would happen if Galactus arrived: people are tumbling into fiery crevasses and lamenting various lamentable things and hanging from playground equipment that would not pass OSHA safety checks.

It’s a fun read, but also a sentiment that may echo with those who truly believe in “clean slate networking.” I remember going to a tutorial on LTE and having a vision of what 6G systems will look like. One thing that is not present, though, is the sense that the system is unstable, and that the introduction of another feature in communication systems will cause the house of cards to collapse. Mickens seems to think the web is nearly there.

The reason I thought of this is the recent fracas over the US ceding control of ICANN, and the sort of doomsdaying around that. From my perspective, network operators are sufficiently conservative that they can’t/won’t willy-nilly introduce new features that are only half-supported, as in the Web. The result is a (relatively) stable networking world that appears to detractors as somewhat Jurassic. I’d argue (with less hyperbole) that some of our curriculum ideas also suffer from the accretion of old ideas. When I took DSP oh-so-long ago (13 years, really?)
we learned all of this Direct Form II Transposed blah blah, which I’m sure was useful for DSP engineers at TI to know at some point, but has no place in a curriculum now. And yet I imagine there are many places that still teach it. If anyone reads this still, what are the dinosaurs in your curriculum?

A proposal for restructuring tenure

An Op-Ed in the NY Times (warning: paywall) suggests creating separate research and teaching tenure tracks and hiring people for one or the other. This is an interesting proposal, and while the author, Adam Grant, marshals empirical evidence showing that the two skills are largely uncorrelated, as well as research on designing incentives, it seems that the social and economic barriers to implementing such a scheme are quite high.

Firstly, the economic. Grant-funded research faculty bring in big bucks (sometimes more modest bucks for pen-and-paper types) to the university. The overheads (55% at Rutgers, I think) on those grants help keep the university afloat, especially at places which don’t have huge endowments. Research in technology areas can also generate patents, startups, and other vehicles that bring money to the university coffers. This is an incentive for the university to push the research agenda first. Grant funding may be drying up, but it’s still a big money maker.

On the social barriers, it’s simply true that as a society in the US we don’t value teaching very highly. Sure, we complain about the quality of education and its price and so on, but taxpayers and politicians are not willing to put their money where their mouth is. We see this in the low pay for K-12 teachers and the rise of the $5k-per-class adjunct at the university level. If a university finds that it’s doing well on research but poorly on teaching, the solution-on-the-cheap is to hire more adjuncts.

Of course, the proposal also represents a change, and institutionalized professionals hate change. For what it’s worth, I think it’s a good idea to have more tenure-track teaching positions. However, forcing a choice — research or teaching — is a terrible idea. I do like research, but part of the reason I want to be at a university is to engage with students through the classroom. I may not be the best teacher now, but I want to get better. A better, and more feasible, short-term solution would be to create more opportunities and support for teacher development within the university. This would strengthen the correlation between research and teaching success.

Toolkit revisited

I joined TTI Chicago almost a year ago, and it’s been an interesting time here. Since my background is a bit different from most of the other folks here, I have many moments of “academic cognitive dissonance” as it were — but more on that later. Madhur Tulsiani is going to offer a toolkit course in the spring focusing on mathematical tools for CS theory — I wanted to revisit a topic from a few years ago, namely what an EE-systems/theory “toolkit” would look like. I think a similar course / seminar would be really handy (even for self-study), but the topics we came up with before seem a little dated now. It seems like the topics fall under a few categories:

• advanced stochastic processes : stochastic approximation
• mathematical economics : game theory, auctions, mechanism design
• advanced probability : concentration of measure, random graphs
• optimization : stochastic control, dynamic programming, convex optimization
• mathematical statistics : asymptotic statistics, minimax theory

Roy’s observation that these topics are already covered in graduate syllabi is still apt. But I still think that knowing a smattering of these topics is important for general literacy and critical reading of papers. In reading a new paper, I first situate the techniques within the context of things I know about — if I have to absorb the author’s cursory description of the general method as well as its application to the problem at hand, I get bogged down in the former and find the latter mystifying.

Actually, I think what would be great is to make tutorials on the topics and gather them together. I know that people who make research tutorials spend a lot of time on them and there’s some reluctance to gather them together, but these topics are not bleeding edge and could be part of a course. It’s sort of like Connexions, but perhaps a little less wiki-like and more lecture-notes like. What would be the best way to do that?

As an aside, Madhur is also thinking of doing a more focused course later which would cover coding and information theory for (theoretical) computer scientists. I’ve thought a fair bit about such a course aimed at machine learning — one focusing a bit more on statistical issues like redundancy and Sanov’s theorem instead of Gaussian channels. But how could one do an information theory course without $\frac{1}{2} \log( 1 + \mathsf{SNR} )$?
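For readers outside EE, that formula is the capacity of the real additive white Gaussian noise channel in bits per channel use. A minimal sketch (the function name is mine) of what it computes:

```python
import math

def awgn_capacity_bits(snr: float) -> float:
    """Capacity of the real AWGN channel, C = (1/2) log2(1 + SNR),
    in bits per channel use; snr is a linear (not dB) power ratio."""
    return 0.5 * math.log2(1.0 + snr)

# A few points on the curve; at 0 dB (SNR = 1), C = 1/2 bit per use,
# and at high SNR capacity grows by ~1/2 bit per 3 dB of extra power.
for snr_db in (0, 10, 20):
    snr = 10.0 ** (snr_db / 10.0)
    print(f"{snr_db:2d} dB -> {awgn_capacity_bits(snr):.3f} bits/use")
```

The appeal for a course is exactly that the whole story fits in one line; the statistical-learning analogues (redundancy, Sanov) take rather more setup.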