Postdoc at ASU/Harvard on ML and Privacy

A joint postdoc position at Arizona State University and Harvard University in the area of machine learning and privacy is available immediately. The successful candidate will be working with the research groups of Prof. Lalitha Sankar and Prof. Flavio du Pin Calmon.

Specific topics of focus are the interplay of machine learning and privacy with focus on both rigorous information-theoretical results as well as practical design aspects.

The appointment will be for a period of 12 months initially, with a possibility for renewal. We are looking for strong applicants with an excellent theoretical background and a proven capacity for high-quality research in form of a strong publication record. Knowledge of privacy literature is desirable.

Interested applicants should submit a current CV, a 1-page research statement and a list of three references. Candidates should contact us via email at lsankar@asu.edu and/or flavio@seas.harvard.edu.

Advertisements

Register soon for the 2017 North American School of Information Theory

The site for the 2017 North American School of Information Theory is now live and registration will begin next week. The IT Schools have been going strong for the last few years and are a great resource for students, especially new students, to get some exposure to information theory research beyond their own work and what they learned in class. Most schools do not have several people working on information theory. For students at such institutions the school provides a great way to meet other new researchers in the field.

IHP “Nexus” Workshop on Privacy and Security: Days 4-5

Wrapping this up, finally. Maybe conference blogging has to go by the wayside for me… my notes got a bit sketchier so I’ll just do a short rundown of the topics.

Days 4-5 were a series of “short” talks by Moni Naor, Kobbi Nissim, Lalitha Sankar, Sewoong Oh, Delaram Kahrobaie, Joerg Kliewer, Jon Ullman, and Sasho Nikolov on a rather eclectic mix of topics.

Moni’s talk was on secret sharing in an online setting — parties arrive one by one and the qualified sets (who can decode the secret) is revealed by all parties. The shares have to be generated online as well. Since the access structure is evolving, what kinds of systems can we support? As I understood it, the idea is to use something similar to threshold scheme and a “doubling trick”-like argument by dividing the users/parties into generations. It’s a bit out of area for me so I had a hard time keeping up with the connections to other problems. Kobbi talked about reconstruction attacks based on observing traffic from outsourced database systems. A user wants to get the records but the server shouldn’t be able to reconstruct: it knows how many records were returned from a query and knows if the same record was sent on subsequent queries — this is a sort of access pattern leakage. He presented attacks based on this information and also based on just knowing the volume (e.g. total size of response) from the queries.

Lalitha talked about mutual information privacy, which was quite a bit different than the differential privacy models from the CS side, but more in line with Ye Wang’s talk earlier in the week. Although she didn’t get to spend as much time on it, the work on interactive communication and privacy might have been interesting to folks earlier in the workshop studying communication complexity. In general, the connection between communication complexity problems and MPC, for example, are elusive to me (probably from lack of trying).

Sewoong talked about optimal mechanisms for differentially private composition — I had to miss his talk, unfortunately. Delaram talked about cryptosystems based on group theory and I had to try and check back in all the things I learned in 18.701/702 and the graduate algebra class I (mistakenly) took my first year of graduate school. I am not sure I could even do justice to it, but I took a lot of notes. Joerg talked about using polar codes to enable private function computation — initially privacy was measured by equivocation but towards the end he made a connection to differential privacy. Since most folks (myself included) are not experts on polar codes, he gave a rather nice tutorial (I thought) on polar coding. It being the last day of the workshop, the audience had unfortunately thinned out a bit.

Jon spoke about estimating marginal distributions for high-dimensional problems. There were some nice connections to composite hypothesis testing problems that came out of the discussion during the talk — the model seems a bit complex to get into based on my notes, but I think readers who are experts on hypothesis testing might want to check out his work. Sasho rounded off the workshop with a talk about the sensitivity polytope of linear queries on a statistical database and connections to Gaussian widths. The main result was on the sample complexity of answering the queries in time polynomial in the number of individuals, size of the universe, and size of the query set.

IHP “Nexus” Workshop on Privacy and Security: Day 2

Verrrrrry belated blogging on the rest of the workshop, more than a month later. Day 2 had 5 talks instead of the tutorial plus talks, and the topics were a bit more varied (this was partly because of scheduling issues that prevented us from being strictly thematic).

Amos Beimel started out with a talk on secret sharing, which had a very nice tutorial/introduction to the problem, including the connection between Reed-Solomon codes and Shamir’s t-out-of-n scheme. For professional (and perhaps personal) reasons I found myself wondering how much more the connection between secret sharing and coding theory was — after all, this was a workshop about communication between information theory and theoretical CS. Not being a coding theory expert myself, I could only speculate. What I didn’t know about was the more general secret sharing structures and the results of Ito-Saito-Nishizeki scheme (published in Globecom!). Amos also talked about monotone span programs, which were new to me, and how to prove lower bounds. He concluded with more recent work on the related distribution design problem: how can we construct a distribution on n variables given constraints that specify subsets which should have identical marginals and subsets which should have disjoint support? The results appeared in ICTS.

Ye Wang talked about his work on common information and how it appears in privacy and security problems from an information theoretic perspective. In particular he talked about secure sampling, multiparty computation, and data release problems. The MPC and sampling results were pretty technical in terms of notions of completeness of primitives (conditional distributions) and triviality (a way of categorizing sources). For the data release problem he focused on problems where a sanitizer has access to a pair (X,Y) where X is private and Y is “useful” — the goal is to produce a version of the data which reveals less about X (privacy) and more about Y (utility). Since they are correlated, there is a tension. The question he addressed is when having access to Y alone as as good as both X and Y.

Manoj, after giving his part of the tutorial (and covering for Vinod), gave his own talk on what he called “cryptographic complexity,” which is an analogy to computational complexity, but for multiparty functions. This was also a talk about definitions and reductions: if you can build a protocol for securely computing f(\cdot) using a protocol for g(\cdot), then f(\cdot) reduces to g(\cdot). A complete function is one for which everything reduces to it, and a trivial function reduces to everything. So with the concepts you can start to classify and partition out functions like characterizing all complete functions for 2 parties, or finding trivial functions under different security notions. He presented some weird facts, like an n bit XOR doesn’t reduce to an (n-1) bit XOR. It was a pretty interesting talk, and I learned quite a bit!

Elette Boyle gave a great talk on Oblivious RAM, a topic about which I was completely oblivious myself. The basic idea in oblivious RAM is (as I understood it) that an adversary can observe the accesses to a RAM and therefore infer what program is being executed (and the input). To obfuscate that, you introduce a bunch of spurious accesses. So if you have a program $\latex \Pi$ whose access pattern is fixed prior to execution, you can randomize the accesses and gain some security. The overhead is the ratio of the total accesses to the required accesses. After this introduction to the problem, she talked about lower bounds on the overhead (e.g. you need this much overhead) for a case where you have parallel processing. I admit that I didn’t quite understand the arguments, but the problem was pretty interesting.

Hoeteck Wee gave the last (but quite energetic) talk of the afternoon, on what he called “functional encryption.” The ideas is that Alice has (x,M) and Bob has y. They both send messages to a third party, Charlie. There is a 0-1 function (predicate) P(x,y) such that if P(x,y) = 1 then Charlie can decode the message M. Otherwise, they cannot. An example would be the predicate P(x,y) = \mathbf{1}(x = y). In this case, Alice can send h(x) \oplus M and Bob can send h(y) for some 2-wise independent hash function, and then Charlie can recover M if the hashes match. I think there is a question in this scheme about whether Charlie needs to know that they got the right message, but I guess I can read the paper for that. The kinds of questions they want to ask are what kinds of predicates have nice encoding schemes? What is the size of message that Alice and Bob have to send? He made a connection/reduction to a communication complexity problem to get a bound on the message sizes in terms of the communication complexity of computing the predicate P. It really was a very nice talk and pretty understandable even with my own limited background.

Bob Gallager on Shannon’s tips for research

One of the classes I enjoyed the most in undergrad was Bob Gallager’s digital communications class, 6.450. I was reminded of what an engaging lecturer he was yesterday when I attended the Bell Labs Shannon Celebration yesterday. Unfortunately, it being the last week of the semester, I could not attend today’s more technical talks. Gallager gave a nice concise summary of what he learned from Shannon about how to do good theory work:

  1. Simplify the problem
  2. Relate it to other problems
  3. Restate the problem in as many ways as possible
  4. Break the problem into pieces
  5. Avoid getting locked into thinking ruts
  6. Generalize

As he said, “it’s a process of doing research… each one [step] gives you a little insight.” It’s tempting, as a theorist, to claim that at the end of this process you’ve solved the “fundamental” problem, but Gallager admonished us to remember that the first step is to simplify, often dramatically. As Alfred North Whitehead said, we should “seek simplicity and distrust it.”

Multiple Postdoc Openings at USC

Prof. Urbashi Mitra is looking for multiple postdocs. Given that this is the time of year when the future looks murkiest, these are great opportunities!

I am seeking multiple post-doctoral researchers are sought with expertise in one or more areas: Communication Theory, (Statistical) Signal Processing, Controls, Information Theory, and Machine Learning. In particular, the following expertises are of interest: structured inference (sparse approximation, low rank matrix completion, tensor signal processing, graph signal processing); multi-terminal information theory, or information theory at the boundaries of control or signal processing; distributed control, consensus methods and partially observable Markov Decision Process modeling and algorithms; modern optimization methods; or biological communications, signal processing or information theory.

The successful applicants will be expected to perform innovative translational research, mentor PhD students, give oral presentations, write journal papers, and participate in grant writing and project management. There will be significant opportunities for research leadership and interaction with funding agencies.

Ideally, the successful applicants will start in Summer 2016.

Please have your interested graduate students apply using the following portal:

https://jobs.usc.edu/postings/63539

In addition to a cv and research statement, the applicants are requested to have three letters of reference uploaded to the system as well.

Jan Hein van Dierendonck’s Painting of Shannon

Jan Hein van Dierendonck, a science writer and illustrator/cartoonist from Leiden, recently contacted the IT Society about an oil painting he made of Claude Shannon. He has kindly given permission to post it here. It will be used by some of the Shannon Centenary events this year.

Claude Shannon, by Jan Hein van Dierendonck

Claude Shannon, by Jan Hein van Dierendonck

Claude Elwood Shannon (April 30, 1916 – February 24, 2001)

In the Forties a juggling Claude Elwood Shannon rides a unicycle down the endless hallways of Bell Labs, a telecommunications research laboratory south of New York. Perhaps this balancing act puts his brilliant mind in the right state to look at complex problems in an original way and to devise the formulas that initiate the Digital Era.

As a 21-year-old master’s degree student at the Massachusetts Institute of Technology, Shannon wrote his thesis demonstrating that electrical applications of Boolean algebra could construct and resolve any logical, numerical relationship. In 1948 this mathematician, electronic engineer, and cryptographer published a landmark paper that laid the foundation for information theory. From that moment on, information is something computable. Whether you are dealing with images, text or sound: convert everything into zeros and ones and remove all redundant information and noise. This has changed our world completely. Without Shannon’s Information Theory, your phone simply wasn’t smart.

Averse to fame, the professor in electronics preferred tinkering with his amazing magnetic mouse in a maze with memory and his mechanic juggling robots. He also refined his Juggling Theorem: the number of hands (H) multiplied by the total time a ball spends in the air (F) and is held in a hand (D) is in balance with the number of balls (N) multiplied by the total time a hand is empty (V) and holding a ball (D).

On April 30, 2016, he would have been a hundred.