Data: what is it good for? (Absolutely Something): the first few weeks

So Waheed Bajwa and I have been teaching this Byrne Seminar on “data science.” At Allerton some people asked me how it was going and what we were covering in the class. These seminars are meant to be more discussion-based. This is a bit tough for us in particular:

  • engineering classes are generally NOT discussion-based, neither in the US nor in Pakistan
  • it’s been more than a decade since we were undergraduates, let alone 18
  • the students in our class are fresh out of high school and also haven’t had discussion-based classes

My one experience in leading discussion was covering for a theater class approximately 10 years ago, but that was a junior-level elective as I recall, and the dynamics were quite a bit different. So getting a discussion going and getting all of the students to participate is, on top of being tough in general, particularly challenging for us. What has helped is that a number of the students in the class are pretty engaged with the ideas and material, and we do in the end get to collectively think about the technologies around us and the role that data plays a bit differently.

What I wanted to talk about in this post was what we’ve covered in the first few weeks. If we offer this class again it would be good to revisit some of the decisions we’ve made along the way, as this is as much a learning process for us as it is for them. A Byrne Seminar meets 10 times during the semester, so that it will end well before finals. We had some overflow from one topic to the next, but roughly speaking the class went in the following order:

  • Introduction: what is data?
  • Potentials and perils of data science
  • The importance of modeling
  • Statistical considerations
  • Machine learning and algorithms
  • Data and society: ethics and privacy
  • Data visualization
  • Project Presentations

I’ll talk a bit more on the blog about this class, what we covered, what readings/videos we ended up choosing, and how it went. I think it would be fun to offer this course again, assuming our evaluations pass muster. But in the meantime, the class is still on, so it’s a bit hard to pass retrospective judgement.

CFP: JSTSP Special Issue on Privacy

IEEE Signal Processing Society
IEEE Journal of Selected Topics in Signal Processing
Special Issue on Signal and Information Processing for Privacy

Aims and Scope: There has been a remarkable increase in the usage of communications and information technology over the past decade. Currently, in the backend and in the cloud, reside electronic repositories that contain an enormous amount of information and data associated with the world around us. These repositories include databases for data-mining, census, social networking, medical records, etc. It is easy to forecast that our society will become increasingly reliant on applications built upon these data repositories. Unfortunately, the rate of technological advancement associated with building applications that produce and use such data has significantly outpaced the development of mechanisms that ensure the privacy of such data and the systems that process it. As a society we are currently witnessing many privacy-related concerns that have resulted from these technologies—there are now grave concerns about our communications being wiretapped, about our SSL/TLS connections being compromised, about our personal data being shared with entities we have no relationship with, etc. The problems of information exchange, interaction, and access lend themselves to fundamental information processing abstractions and theoretical analysis. The tools of rate-distortion theory, distributed compression algorithms, distributed storage codes, machine learning for feature identification and suppression, and compressive sensing and sampling theory are fundamental and can be applied to precisely formulate and quantify the tradeoff between utility and privacy in a variety of domains. 
Thus, while rate-distortion theory and information-theoretic privacy can provide fundamental bounds on privacy leakage of distributed data systems, the information and signal processing techniques of compressive sensing, machine learning, and graphical models are the key ingredients necessary to achieve these performance limits in a variety of applications involving streaming data, distributed data storage (cloud), and interactive data applications across a number of platforms. This special issue seeks to provide a venue for ongoing research in information and signal processing for applications where privacy concerns are paramount.

Topics of Interest include (but are not limited to):

  • Signal processing for information-theoretic privacy
  • Signal processing techniques for access control with privacy guarantees in distributed storage systems
  • Distributed inference and estimation with privacy guarantees
  • Location privacy and obfuscation of mobile device positioning
  • Interplay of privacy and other information processing tasks
  • Formalized models for adversaries and threats in applications where consumer and producer privacy is a major concern
  • Techniques to achieve covert or stealthy communication in support of private communications
  • Competitive privacy and game theoretic formulations of privacy and obfuscation

Important Dates:
Manuscript submission due: October 1, 2014
First review completed: December 15, 2014
Revised manuscript due: February 1, 2015
Second review completed: March 15, 2015
Final manuscript due: May 1, 2015
Publication date: October 2015

Prospective authors should visit the JSTSP homepage for information on paper submission. Manuscripts should be submitted using Manuscript Central.

Linkage

I am traveling all over India at the moment so I’m not really able to write contentful posts. Here are even more links instead, sigh. Maybe later I’ll talk about log-Sobolev inequalities so I can be cool like Max.

Speaking of Max, he posted this hilarious bad lip reading version of Game of Thrones. Probably NSFW. I don’t even like the series but it’s pretty funny.

For those who are fans of Rejected, Don Hertzfeldt’s new film is available on Vimeo.

Those who were at Berkeley may remember seeing Ed Reed perform at the Cheeseboard. His album (which I helped fund via indiegogo) was named a Downbeat Editors’ Pick. It’s a great album.

In light of the Snowden leaks, some doubt has been cast on NIST’s crypto standards.

I’m super late to this, but I endorse Andrew’s endorsement of Sergio’s interview with Robert Fano in the IT Newsletter. Here’s just the article, if you want that.

Linkage

David McAllester, my department chair at TTI, has started a new blog.

I thought it was pretty well known that people are fairly unique by ZIP code, but Forbes has an article about it now (h/t Raj). Of course, stores can also ping a smartphone’s WiFi to get more accurate location information about your activity within the store: when you check out, they can tag the MAC address of your device to all the other information about you. Creeptastic!

Bradley Efron’s perspective on the impact of Bayes’ Theorem from Science (h/t Kevin).

Some discussion on what makes a popular philosophy book. I wonder what, if anything, transfers over to a popular mathematical book?

Some thoughts from Larry Laudan on the mathematization of the presumption of innocence.

CFP : GlobalSIP 2013 Deadline Extended to June 15

I’m on the program committee for the Cyber-Security and Privacy symposium, so I figured I would post this here to make more work for myself.

GlobalSIP 2013 – Call for Papers
IEEE Global Conference on Signal and Information Processing
December 3-5, 2013 | Austin, Texas, U.S.A.

GlobalSIP: IEEE Global Conference on Signal and Information Processing is a new flagship IEEE Signal Processing Society conference. The focus of this conference is on signal and information processing and up-and-coming signal processing themes.

GlobalSIP is composed of symposia on hot topics related to signal and information processing, selected based on responses to the call-for-symposia proposals.

The selected symposia are:

Paper submission will be online only through the GlobalSIP 2013 website. Papers should be in IEEE two-column format. The maximum length varies among the symposia; be sure to check each symposium’s information page for details. Authors of Signal Processing Letters papers will be given the opportunity to present their work at GlobalSIP 2013, subject to space availability and approval by the Technical Program Chairs of GlobalSIP 2013. The authors need to specify in which symposium they wish to present their paper. Please check the conference webpage for details.

Important Dates:
*New* Paper Submission Deadline – June 15, 2013
Review Results Announce – July 30, 2013
Camera-Ready Papers Due – September 7, 2013
*New* SPL request for presentation – September 7, 2013

Linkage : science edition

Learning from transcriptomes can be cheaper for organisms which have never been sequenced.

A fancy Nature article on mobility privacy, in case you weren’t convinced by other studies on mobility privacy.

Bad statistics in neuroscience. Color me unsurprised.

I bet faked results happen a lot in pharmaceutical trials, given the money involved. Perhaps we should jail people for faking data as a disincentive?

The Atheist shoe company did a study to see if the USPS was discriminating against them.

Readings

Endless Things [John Crowley] — Book four of the Aegypt Cycle, and the one most grounded in the present. The book moves more swiftly than the others, as if Crowley were racing to the end. Many of the concerns of the previous books, such as magic, history, and memory, are muted as the protagonist Pierce Moffett wends his way through his emotional and intellectual turmoil and into what in the end amounts to a kind of peace. Obviously only worth reading if you read the first three books.

Understanding Privacy [Daniel Solove] — A law professor’s take on what constitutes privacy. Solove wants to conceptualize privacy in terms of clusters of related ideas rather than take a single definition, and he tries to put a headier philosophical spin on it by invoking Wittgenstein. I found the book a bit overwritten but it does parse out the things we call privacy, especially in the longest chapter on the taxonomy of privacy. It’s not a very long book, but it has a number of good examples and also case law to show how muddled our legal definitions have become. He also makes a strong case for increased protections and shows how the law is blind to the effects of information aggregation, for example.

The Fall of the Stone City [Ismail Kadare] — An allegorical novel by a Man Booker prize winner chronicling the Nazi occupation and the communist takeover of Gjirokaster, an old Albanian city. It’s a dark absurdist comedy, partly in the vein of Kafka but with a bit of… Calvino almost. The tone of the book (probably a testament to the translator) has this almost academic detachment, gently mocking as it describes the ways in which the victors try to rewrite history.

Invisible Men [Becky Pettit] — A sobering look at how mass incarceration interacts with official statistics. Because most surveys are household-based, they do not count the increasingly large incarcerated population, thereby introducing a systematic racialized bias in the statistics used for public policy. In particular, Pettit shows how this bias leads to underestimation of racial inequity because the (mainly young black male) prisoners are “erased” in the official records.

The Rise of Ransom City [Felix Gilman] — A sequel to The Half-Made World, and a wondrously engrossing read it is too, filled with the clash of ideas, the corruption of corporations, the “borrowing” and evolution of ideas, and the ravages of industrialization. Also has a healthy dose of Mark Twain for good measure.