IHP “Nexus” Workshop on Privacy and Security: Days 4-5

Wrapping this up, finally. Maybe conference blogging has to go by the wayside for me… my notes got a bit sketchier so I’ll just do a short rundown of the topics.

Days 4-5 were a series of “short” talks by Moni Naor, Kobbi Nissim, Lalitha Sankar, Sewoong Oh, Delaram Kahrobaei, Joerg Kliewer, Jon Ullman, and Sasho Nikolov on a rather eclectic mix of topics.

Moni’s talk was on secret sharing in an online setting: parties arrive one by one and the qualified sets (those who can decode the secret) are revealed as the parties arrive. The shares have to be generated online as well. Since the access structure is evolving, what kinds of systems can we support? As I understood it, the idea is to use something similar to a threshold scheme and a “doubling trick”-like argument by dividing the users/parties into generations. It’s a bit out of area for me so I had a hard time keeping up with the connections to other problems. Kobbi talked about reconstruction attacks based on observing traffic from outsourced database systems. A user wants to get the records but the server shouldn’t be able to reconstruct the database: it knows how many records were returned from a query and knows if the same record was sent on subsequent queries, which is a sort of access pattern leakage. He presented attacks based on this information and also based on just knowing the volume (e.g. total size of response) from the queries.

Lalitha talked about mutual information privacy, which was quite a bit different than the differential privacy models from the CS side, but more in line with Ye Wang’s talk earlier in the week. Although she didn’t get to spend as much time on it, the work on interactive communication and privacy might have been interesting to folks earlier in the workshop studying communication complexity. In general, the connections between communication complexity problems and MPC, for example, are elusive to me (probably from lack of trying).

Sewoong talked about optimal mechanisms for differentially private composition. I had to miss his talk, unfortunately. Delaram talked about cryptosystems based on group theory and I had to try to recall all the things I learned in 18.701/702 and the graduate algebra class I (mistakenly) took my first year of graduate school. I am not sure I could even do justice to it, but I took a lot of notes. Joerg talked about using polar codes to enable private function computation: initially privacy was measured by equivocation, but towards the end he made a connection to differential privacy. Since most folks (myself included) are not experts on polar codes, he gave a rather nice tutorial (I thought) on polar coding. It being the last day of the workshop, the audience had unfortunately thinned out a bit.

Jon spoke about estimating marginal distributions for high-dimensional problems. There were some nice connections to composite hypothesis testing problems that came out of the discussion during the talk — the model seems a bit complex to get into based on my notes, but I think readers who are experts on hypothesis testing might want to check out his work. Sasho rounded off the workshop with a talk about the sensitivity polytope of linear queries on a statistical database and connections to Gaussian widths. The main result was on the sample complexity of answering the queries in time polynomial in the number of individuals, size of the universe, and size of the query set.

CFP: IEEE T-SIPN Special Issue on Distributed Information Processing in Social Networks

IEEE Signal Processing Society
IEEE Transactions on Signal and Information Processing over Networks
Special Issue on Distributed Information Processing in Social Networks

Over the past few decades, online social networks such as Facebook and Twitter have significantly changed the way people communicate and share information with each other. The opinion and behavior of each individual are heavily influenced through interacting with others. These local interactions lead to many interesting collective phenomena such as herding, consensus, and rumor spreading. At the same time, there is always the danger of mob mentality of following crowds, celebrities, or gurus who might provide misleading or even malicious information. Many efforts have been devoted to investigating the collective behavior in the context of various network topologies and the robustness of social networks in the presence of malicious threats. On the other hand, activities in social networks (clicks, searches, transactions, posts, and tweets) generate a massive amount of decentralized data, which is not only big in size but also complex in terms of its structure. Processing these data requires significant advances in accurate mathematical modeling and computationally efficient algorithm design. Many modern technological systems such as wireless sensor and robot networks are virtually the same as social networks in the sense that the nodes in both networks carry disparate information and communicate with constraints. Thus, investigating social networks will bring insightful principles on the system and algorithmic designs of many engineering networks. An example of such is the implementation of consensus algorithms for coordination and control in robot networks. Additionally, more and more research projects nowadays are data-driven. Social networks are natural sources of massive and diverse big data, which present unique opportunities and challenges to further develop theoretical data processing toolsets and investigate novel applications. This special issue aims to focus on addressing distributed information (signal, data, etc.) processing problems in social networks and also invites submissions from all other related disciplines to present comprehensive and diverse perspectives. Topics of interest include, but are not limited to:

  • Dynamic social networks: time varying network topology, edge weights, etc.
  • Social learning, distributed decision-making, estimation, and filtering
  • Consensus and coordination in multi-agent networks
  • Modeling and inference for information diffusion and rumor spreading
  • Multi-layered social networks where social interactions take place at different scales or modalities
  • Resource allocation, optimization, and control in multi-agent networks
  • Modeling and strategic considerations for malicious behavior in networks
  • Social media computing and networking
  • Data mining, machine learning, and statistical inference frameworks and algorithms for handling big data from social networks
  • Data-driven applications: attribution models for marketing and advertising, trend prediction, recommendation systems, crowdsourcing, etc.
  • Other topics associated with social networks: graphical modeling, trust, privacy, engineering applications, etc.

Important Dates:

  • Manuscript submission due: September 15, 2016
  • First review completed: November 1, 2016
  • Revised manuscript due: December 15, 2016
  • Second review completed: February 1, 2017
  • Final manuscript due: March 15, 2017
  • Publication: June 1, 2017

Guest Editors:

tracks: response the sky’s convulsion

Old and new for a rainy day.

  1. If You Find Yourself Caught In Love - Belle and Sebastian
  2. Something About You - Lucius
  3. The Natural World - Cymbals
  4. Guts - Thao & The Get Down Stay Down
  5. New Old Friends - Jack & Jeffrey Lewis
  6. Patriarchaeth - Gwenno
  7. Bright Shiny Morning - Me’Shell NdegeOcello
  8. Power - Eskimeaux
  9. Clay - Manatee Commune (feat. Marina Price)
  10. Little Drop Of Poison - Tom Waits
  11. What Would I Do Without You - Ray Charles
  12. Should Have Known Better - Sufjan Stevens
  13. Survive It - Ghostpoet
  14. I Think I Need A New Heart - The Magnetic Fields

IHP “Nexus” Workshop on Privacy and Security: Day 3

I’m doggedly completing these notes because in a fit of ambition I actually started posts for each of the workshop days and now I feel like I need to finish them up. Day 3 was a day of differential privacy: Adam Smith, Cynthia Dwork, and Kamalika Chaudhuri.

Adam gave a tutorial on differential privacy that had a bit of a different flavor from tutorials I have seen before (and given). He started out by highlighting a taxonomy of potential attacks on released data to make a distinction between re-identification, reconstruction, membership, and correlation inferences before going into the definitions, composition theory, Bayesian interpretation, and so on. With the attacks, he focused a bit more on the reconstruction story. The algorithms view of things (as I get it) is to think of, say, an LP relaxation of a combinatorial problem: you solve the LP and round the solution to integers and prove that it’s either correct or close to correct. This has more connections to things we think about in information theory (e.g. compressed sensing) but the way of stating the problem was a bit different. He also described the Homer et al. attack on GWAS. The last part of his talk was on multiplicative weights and algorithms for learning distributions over the data domain, which I think got a bit hairy for the IT folks who hadn’t seen MW before. This made me wonder if these connections between mirror descent on the simplex, information projections, and other topics can be taught in a “first principles” way that doesn’t require you to have a lot of familiarity with one interpretation of the method before bridging to another.

Cynthia gave a talk on false discovery control and how to use differential privacy ideas in a version of the Benjamini-Hochberg BHq procedure for controlling the false discovery rate. A key primitive is the report noisy argmax procedure, which gives the index of the argmax but not its value (which would entail a further privacy loss). Since most people are not familiar with FDR control, she spent a lot of her talk on that and so the full details of the private version were deferred to the paper. I covered FDR in my detection and estimation class partly because of the extra attention it has received in the privacy workshops over the last few years.
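To make the report noisy argmax primitive concrete, here is a minimal sketch, not the procedure from the talk or the paper. It assumes Laplace noise and a conservative noise scale of `2 * sensitivity / epsilon`; the function names and parameters are illustrative.

```python
import random

def laplace(scale, rng=random):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is a Laplace(0, scale) random variable.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def report_noisy_argmax(scores, epsilon, sensitivity=1.0, rng=random):
    """Return only the index of the largest noisy score, never its value."""
    # Assumed (conservative) noise scale; tighter constants are possible
    # for special cases such as counting queries.
    scale = 2.0 * sensitivity / epsilon
    noisy = [s + laplace(scale, rng) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Releasing the noisy maximum value as well would be a second query against the data, which is the further privacy loss mentioned above.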

Kamalika’s talk was on a model for privacy when data may be correlated between individuals. This involves using the Pufferfish model for privacy in which there is an explicit class of probability distributions on the data and a set of explicit secrets which the algorithm wants to obfuscate: the differential privacy guarantee should hold for the output distribution of the mechanism conditioned on any valid data distribution and any pair of secrets. Since the class of data distributions is arbitrary, we can also consider joint distributions on individuals’ data; if the distribution class has some structure, then there might be a hope to efficiently produce an output of a function. She talked about using the $\ell_{\infty}$ Wasserstein distance to measure the sensitivity of a function, and that adding noise that scales with this sensitivity would guarantee privacy in the Pufferfish model. She then gave an example for Bayesian networks and Markov chains. As we discussed, it seems like for each dependence structure you need to come up with a sort of covering of the dependencies to add noise appropriately. This seems pretty challenging in general now, but maybe after a bit more work there will be a clearer “general” strategy to handle dependence along these lines.
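For reference, a sketch of the guarantee in symbols. The notation is assumed here, not taken from the talk: $\mathcal{Q}$ is the set of secret pairs, $\Theta$ the class of data distributions, $M$ the mechanism, $F$ the function being released, and $\mu_{i,\theta}$ the distribution of $F(X)$ given secret $s_i$ under $\theta$. The second line is how I understand the Wasserstein-based noise calibration.

```latex
\[
e^{-\epsilon} \;\le\;
\frac{\Pr\left( M(X) = w \mid s_i, \theta \right)}
     {\Pr\left( M(X) = w \mid s_j, \theta \right)}
\;\le\; e^{\epsilon}
\qquad \text{for all } w,\ \theta \in \Theta,\ (s_i, s_j) \in \mathcal{Q}
\text{ with } \Pr(s_i \mid \theta) > 0,\ \Pr(s_j \mid \theta) > 0,
\]
\[
W = \sup_{(s_i, s_j) \in \mathcal{Q},\ \theta \in \Theta}
    W_{\infty}\left( \mu_{i,\theta},\, \mu_{j,\theta} \right),
\qquad
M(X) = F(X) + \mathrm{Lap}\left( W / \epsilon \right).
\]
```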

Readings

Ancillary Sword (Ann Leckie) and Ancillary Mercy (Ann Leckie). These were the second two books in the Ancillary series, following the story of Justice of Toren, the last remnant of a ship AI, and her struggle to maintain order and do right by people. The last two felt a bit more feel-good than the first one, which had a more ambiguous arc, but I really enjoyed these books. Given how rough the semester was for me, it was nice to occasionally sink into a story.

Child of All Nations (Pramoedya Ananta Toer). A sequel to This Earth of Mankind and part of the Buru quartet, this novel follows the story of Minke, a young Indonesian man in the late 19th century who was educated in a Dutch-medium school on Java. Minke, now a graduate, runs up against colonialism in its many ugly forms, from outright theft to the moderate incremental anti-colonialists. In this book we can see him struggle towards an understanding of and connection to the cause of Natives on their own terms. He starts to see things from their eyes, in particular the struggle of tenant farmers. I’m looking forward to reading the last two books!

Colline (Jean Giono). I picked this up on an impulse at the NYPL (it was in the new books section) since I generally learn a lot from reading the NYRB series of reprints: they are things I wouldn’t have known about otherwise. This is Giono’s first novel. His experience in WWI (he was at Verdun) affected him deeply, and apparently many of his books deal with the relationship between humans and nature. This is about a small community in Provence which experiences a series of mysterious and terrifying events. Perhaps they have violated nature and are being punished, but perhaps they can root out the evil that curses them and kill it. Is that all that we humans do? Killing and scything and scarring nature? Recommended if you like sketchily narrated books with lots of pastoral mysticism.

The Day of The Owl (Leonardo Sciascia). This is one of the first novels about the Mafia. At the time (the early 60s) people were debating whether the Mafia really existed or not. The book follows the investigation of a murder by a zealous carabiniere, Inspector Bellodi. His investigation is hampered by intransigent witnesses and Sciascia keeps up a running commentary, from Bellodi’s subordinates’ internal monologue to faceless individuals discussing the progress of his investigation. The tone is slightly humorous, despite the body count. The preface really helped contextualize the novel. Without having grown up in Italy I don’t think I would have understood the relationship between Sicily and the rest of Italy, the omnipresence of and complete silence about the Mafia, and the politics of mid-20th century Italy. Recommended for those who like historical detective novels (plus it’s a quick read).

The Tijuana Book of the Dead (Luis Alberto Urrea). I haven’t read poetry in ages so this was a welcome change of pace for me. Urrea’s collection, as the title suggests, is about the borderlands. It’s also about his becoming a poet: one poem describes him copying out poems on a battered typewriter, producing a “second rate Morrison. / a $4.95 Bukowski. / a $1.98 Wakoski.” Unsurprisingly, I liked the overtly political ones, like Arizona Lamentation, which says “This was always Odin’s garden / A clean white place,” and “No Mexican was ever born / In our land,” lamenting that

We had something grand here
We had family values, we had clean sidewalks.

Then these strangers came. These mudmen.
They invaded our dream

And colored it.

There are also some striking love poems. I really enjoyed the dark humor in Urrea’s poems. Maybe this will start a summer of poetry for me.

IHP “Nexus” Workshop on Privacy and Security: Day 2

Verrrrrry belated blogging on the rest of the workshop, more than a month later. Day 2 had 5 talks instead of the tutorial plus talks, and the topics were a bit more varied (this was partly because of scheduling issues that prevented us from being strictly thematic).

Amos Beimel started out with a talk on secret sharing, which had a very nice tutorial/introduction to the problem, including the connection between Reed-Solomon codes and Shamir’s t-out-of-n scheme. For professional (and perhaps personal) reasons I found myself wondering how much deeper the connection between secret sharing and coding theory goes. After all, this was a workshop about communication between information theory and theoretical CS. Not being a coding theory expert myself, I could only speculate. What I didn’t know about was the more general secret sharing structures and the results on the Ito-Saito-Nishizeki scheme (published in Globecom!). Amos also talked about monotone span programs, which were new to me, and how to prove lower bounds. He concluded with more recent work on the related distribution design problem: how can we construct a distribution on n variables given constraints that specify subsets which should have identical marginals and subsets which should have disjoint support? The results appeared in ITCS.
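As a concrete reminder of the Shamir t-out-of-n scheme mentioned above, here is a minimal sketch. The shares are evaluations of a random degree-(t-1) polynomial over a prime field, which is where the Reed-Solomon connection comes from. The field size, function names, and the use of Python’s non-cryptographic `random` module are illustrative assumptions; a real implementation would use a secure randomness source.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an illustrative choice

def make_shares(secret, t, n, rng=random):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    if not (1 <= t <= n < PRIME):
        raise ValueError("need 1 <= t <= n < PRIME")
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange-interpolate the polynomial at 0 from t (or more) shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any t shares determine the polynomial and hence the secret, while t-1 shares are consistent with every possible constant term.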

Ye Wang talked about his work on common information and how it appears in privacy and security problems from an information theoretic perspective. In particular he talked about secure sampling, multiparty computation, and data release problems. The MPC and sampling results were pretty technical in terms of notions of completeness of primitives (conditional distributions) and triviality (a way of categorizing sources). For the data release problem he focused on problems where a sanitizer has access to a pair $(X,Y)$ where $X$ is private and $Y$ is “useful”: the goal is to produce a version of the data which reveals less about $X$ (privacy) and more about $Y$ (utility). Since they are correlated, there is a tension. The question he addressed is when having access to $Y$ alone is as good as having both $X$ and $Y$.

Manoj, after giving his part of the tutorial (and covering for Vinod), gave his own talk on what he called “cryptographic complexity,” which is an analogy to computational complexity, but for multiparty functions. This was also a talk about definitions and reductions: if you can build a protocol for securely computing $f(\cdot)$ using a protocol for $g(\cdot)$, then $f(\cdot)$ reduces to $g(\cdot)$. A complete function is one to which everything reduces, and a trivial function reduces to everything. So with these concepts you can start to classify and partition functions, like characterizing all complete functions for 2 parties, or finding trivial functions under different security notions. He presented some weird facts, like an $n$-bit XOR doesn’t reduce to an $(n-1)$-bit XOR. It was a pretty interesting talk, and I learned quite a bit!

Elette Boyle gave a great talk on Oblivious RAM, a topic about which I was completely oblivious myself. The basic idea in oblivious RAM is (as I understood it) that an adversary can observe the accesses to a RAM and therefore infer what program is being executed (and the input). To obfuscate that, you introduce a bunch of spurious accesses. So if you have a program $\Pi$ whose access pattern is fixed prior to execution, you can randomize the accesses and gain some security. The overhead is the ratio of the total accesses to the required accesses. After this introduction to the problem, she talked about lower bounds on the overhead (e.g. you need this much overhead) for a case where you have parallel processing. I admit that I didn’t quite understand the arguments, but the problem was pretty interesting.
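To illustrate spurious accesses and the overhead ratio, here is a toy sketch of the simplest possible oblivious memory: every logical access scans every cell, so the observed trace is the same no matter which address was wanted. This is a deliberately naive stand-in, not one of the randomized constructions from the talk, and it assumes cell contents are hidden from the observer (e.g. by encryption, which is not modeled). The class and method names are hypothetical.

```python
class LinearScanRAM:
    """Toy oblivious memory: each logical access touches every cell in order."""

    def __init__(self, size):
        self._cells = [0] * size
        self.trace = []        # physical addresses an observer would see
        self.logical_ops = 0   # number of accesses the program actually needed

    def access(self, op, addr, value=None):
        if not 0 <= addr < len(self._cells):
            raise IndexError("address out of range")
        self.logical_ops += 1
        result = None
        for i in range(len(self._cells)):
            self.trace.append(i)        # the observer sees this touch
            current = self._cells[i]    # read every cell...
            if i == addr:
                result = current
                if op == "write":
                    current = value
            self._cells[i] = current    # ...and write every cell back
        return result

    def overhead(self):
        # Ratio of total physical accesses to required logical accesses.
        return len(self.trace) / self.logical_ops
```

With n cells the overhead is exactly n, which is why the interesting question is how much smaller it can be made.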

Hoeteck Wee gave the last (but quite energetic) talk of the afternoon, on what he called “functional encryption.” The idea is that Alice has $(x,M)$ and Bob has $y$. They both send messages to a third party, Charlie. There is a 0-1 function (predicate) $P(x,y)$ such that if $P(x,y) = 1$ then Charlie can decode the message $M$. Otherwise, they cannot. An example would be the predicate $P(x,y) = \mathbf{1}(x = y)$. In this case, Alice can send $h(x) \oplus M$ and Bob can send $h(y)$ for some 2-wise independent hash function $h$, and then Charlie can recover $M$ if the hashes match. I think there is a question in this scheme about whether Charlie needs to know that they got the right message, but I guess I can read the paper for that. The kinds of questions they want to ask are: what kinds of predicates have nice encoding schemes? What is the size of message that Alice and Bob have to send? He made a connection/reduction to a communication complexity problem to get a bound on the message sizes in terms of the communication complexity of computing the predicate $P$. It really was a very nice talk and pretty understandable even with my own limited background.
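The equality-predicate example can be sketched in a few lines. This is a toy version under stated assumptions, not the scheme from the paper: Alice and Bob share the hash randomness (hidden from Charlie), the hash is the affine family $h(v) = av + b \bmod p$, and masking uses addition mod $p$ in place of the XOR in the text so that everything lives in one prime field. All names are hypothetical.

```python
import random

P = 2**61 - 1  # prime modulus; inputs and messages are integers in [0, P)

def shared_hash(rng=random):
    """Sample h(v) = a*v + b mod P, a pairwise-independent family over Z_P."""
    a, b = rng.randrange(P), rng.randrange(P)
    return lambda v: (a * v + b) % P

def alice_message(h, x, m):
    # Mask the message with h(x); addition mod P plays the role of XOR.
    return (h(x) + m) % P

def bob_message(h, y):
    return h(y)

def charlie_decode(from_alice, from_bob):
    # Equals m exactly when h(x) == h(y), in particular whenever x == y.
    return (from_alice - from_bob) % P
```

When x differs from y, pairwise independence means h(x) still looks uniform given h(y), so Charlie’s output is a uniformly random field element. As noted above, Charlie has no way in this sketch to tell whether the decoded value is the real message.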

Bob Gallager on Shannon’s tips for research

One of the classes I enjoyed the most in undergrad was Bob Gallager’s digital communications class, 6.450. I was reminded of what an engaging lecturer he was when I attended the Bell Labs Shannon Celebration yesterday. Unfortunately, it being the last week of the semester, I could not attend today’s more technical talks. Gallager gave a nice concise summary of what he learned from Shannon about how to do good theory work:

  1. Simplify the problem
  2. Relate it to other problems
  3. Restate the problem in as many ways as possible
  4. Break the problem into pieces
  5. Avoid getting locked into thinking ruts
  6. Generalize

As he said, “it’s a process of doing research… each one [step] gives you a little insight.” It’s tempting, as a theorist, to claim that at the end of this process you’ve solved the “fundamental” problem, but Gallager admonished us to remember that the first step is to simplify, often dramatically. As Alfred North Whitehead said, we should “seek simplicity and distrust it.”