# IHP “Nexus” Workshop on Privacy and Security: Day 1

The view from my office at IHP

I am attending the Nexus of Information and Computation Theories workshop at the Institut Henri Poincaré in Paris this week. It’s the last week of a 10 week program that brought together researchers from information theory and CS theory in workshops around various themes such as distributed computation, inference, lower bounds, inequalities, and security/privacy. The main organizers were Bobak Nazer, Aslan Tchamkerten, Anup Rao, and Mark Braverman. The last two weeks are on Privacy and Security: I helped organize these two weeks with Prakash Narayan, Salil Vadhan, Aaron Roth, and Vinod Vaikuntanathan.

Due to teaching and ICASSP, I missed last week, but am here for this week, for which the sub-topics are secure multiparty computation and differential privacy. I’ll try to blog about the workshop since I failed to blog at all about ITA, CISS, or ICASSP. The structure of the workshop was to have 4 tutorials (two per week) and then a set of hopefully related talks. The first week had tutorials on pseudorandomness and information theoretic secrecy.

The second week of the workshop kicked off with a tutorial from Yuval Ishai and Manoj Prabhakaran on secure multiparty computation (MPC). Yuval gave an abbreviated version/update of his tutorial from the Simons Institute (pt1/pt2) that set up the basic framework and language around MPC: $k$ parties with inputs $x_1, x_2, \ldots, x_k$ want to exchange messages to implement a functionality (evaluate a function) $f(x_1, x_2, \ldots, x_k)$ over secure point-to-point channels such that they successfully learn the output of the function but don’t learn anything additional about each other’s inputs. There is a landscape of definitions within this general framework: some parties could collude, behave dishonestly with respect to the protocol, and so on. The guarantees could be exact (in the real/ideal paradigm in which you compare the real system with a simulated system), statistical (the distribution in the real system is close in total variation distance to an ideal evaluation), or computational (some notion of indistinguishability). The framework became a bit clearer when he described a 2-party example with a “trusted dealer” who can give parties some correlated random bits and they could use those to randomly shift the truth table/evaluation of $f(x_1, x_2)$ to guarantee correctness and security.
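The trusted-dealer idea can be sketched in a few lines of Python. This is only a toy version under stated assumptions, not the construction from the tutorial: both inputs live in $\{0, \ldots, n-1\}$, $f$ outputs a single bit, both parties follow the protocol honestly, and the names `deal` and `evaluate` are made up for illustration. The dealer shifts the truth table by random offsets and splits it into two XOR shares, so each party only ever sees a uniformly shifted input from the other side.

```python
import secrets

def deal(f, n):
    """Hypothetical trusted dealer: randomly shift the truth table of f
    and split it into two XOR shares, one per party."""
    r = secrets.randbelow(n)  # Alice's secret shift
    s = secrets.randbelow(n)  # Bob's secret shift
    # Alice's share is a table of uniformly random bits.
    share_a = [[secrets.randbelow(2) for _ in range(n)] for _ in range(n)]
    # Bob's share makes the XOR of the two tables equal the shifted truth table.
    share_b = [[share_a[u][v] ^ f((u - r) % n, (v - s) % n)
                for v in range(n)] for u in range(n)]
    return (r, share_a), (s, share_b)

def evaluate(f, n, x, y):
    """Run the two-party protocol on inputs x (Alice) and y (Bob)."""
    (r, share_a), (s, share_b) = deal(f, n)
    u = (x + r) % n  # Alice -> Bob: x masked by r, uniform on {0..n-1}
    v = (y + s) % n  # Bob -> Alice: y masked by s, uniform on {0..n-1}
    out_a = share_a[u][v]  # Alice's output share
    out_b = share_b[u][v]  # Bob's output share
    return out_a ^ out_b   # exchanging the shares reveals only f(x, y)
```

For instance, `evaluate(lambda a, b: a & b, 2, 1, 1)` computes the AND of two bits. Correctness follows because the two table entries XOR to $f(u - r, v - s) = f(x, y)$.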

Manoj, on the other hand, talked about some notions of reductions between secure computations: given a protocol which evaluates $f$, can you simulate/compute $g$ using calls to $f$? How many do you need? This gives a notion of the complexity rate of one function in terms of another. For example, can Alice and Bob simulate a BEC using calls to an oblivious transfer (OT) protocol? What about vice versa? What about using a BSC? These problems seem sort of like toy channel problems (from an information theory perspective) but seem like fundamental building blocks when thinking about secure computation. As I discussed with Hoeteck Wee today, in information theory we often gain some intuition from continuous alphabets or large/general alphabet settings, whereas cryptography arguments/bounds come from considering circuit complexity: these are ideas that we don’t think about too much in IT since we don’t usually care about computational complexity/implementation.

Huijia (Rachel) Lin gave an introduction to zero-knowledge proofs and proof systems: a verifier wants to know if a statement $X$ is true and can ask queries to a prover $P$ which has some evidence $w$ that it wants to keep secret. For example, the statement might be “the number $y$ is a perfect square” and the evidence might be an $\alpha$ such that $y = \alpha^2 \mod n$. The prover doesn’t want to reveal $w = \alpha$, but instead should convince the verifier that such an $\alpha$ exists. She gave a protocol for this before turning to a more complicated statement like proving that a graph has a Hamiltonian cycle. She then talked about using commitment schemes, at which point I sort of lost the thread of things since I’m not as familiar with these cryptography constructions. I probably should have asked more questions, so it was my loss.
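One standard protocol for the perfect-square statement is the textbook interactive proof for quadratic residuosity; whether this matches the exact protocol from the talk is an assumption. In each round the prover commits to a random square, the verifier flips a challenge bit, and the prover answers with a value that reveals nothing about $\alpha$ on its own. The function names below are hypothetical and the toy modulus is far too small for real use.

```python
import secrets
from math import gcd

def random_unit(n):
    """Sample r in [2, n-1] that is coprime to n."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if gcd(r, n) == 1:
            return r

def zk_round(n, y, alpha):
    """One round: the verifier accepts iff z^2 = a * y^b (mod n)."""
    r = random_unit(n)
    a = (r * r) % n                 # prover's commitment: a random square
    b = secrets.randbelow(2)        # verifier's challenge bit
    z = (r * pow(alpha, b, n)) % n  # response: r if b == 0, r * alpha if b == 1
    return (z * z) % n == (a * pow(y, b, n)) % n

def run_protocol(n, y, alpha, rounds=20):
    """Repeat the round; a prover without a valid root fails every b == 1 round."""
    return all(zk_round(n, y, alpha) for _ in range(rounds))
```

With $n = 77$, $\alpha = 10$, and $y = 10^2 \bmod 77 = 23$, an honest prover passes every round. A prover running the same code with a wrong root is caught whenever the challenge bit is 1, so the chance of surviving $t$ rounds is $2^{-t}$.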

Daniel Wichs discussed two problems he called “multi-key” and “spooky” fully-homomorphic encryption (FHE). The idea in multi-key FHE is that you have $N$ users who encrypt values $\{ x_i : i \in [N] \}$ with their public key and upload them to a server. Someone with access to the server wants to be able to decode only a function $f(x_1, x_2, \ldots, x_N)$ using the combined private keys of all the users. In “spooky” FHE, you have $N$ decoders, each with one of the private keys, but they want to decode values $\{y_i : i \in [N]\}$ which are functions of all of the encoded data. A simple example of this is when $y_1 \oplus y_2 = x_1 \wedge x_2$: that is, the XOR of the outputs is equal to the AND of the inputs. This generalizes to the XOR of multiple outputs being some function of the inputs, something he called additive function sharing. He then presented schemes for these two problems based on the “learning with errors” construction from Gentry, Sahai, and Waters, which I would apparently have to read to really understand the scheme. It’s some sort of linear algebra thing over $\mathbb{Z}_q$. Perhaps there are some connections to linear block codes or network coding to be exploited here.
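To make the target of additive function sharing concrete, here is a sketch of the ideal input/output relation only. It says nothing about how the FHE scheme achieves this without a party that sees all the inputs; the function name and the use of single-bit outputs are assumptions for illustration.

```python
import secrets
from functools import reduce
from operator import xor

def additive_function_shares(f, inputs):
    """Return one bit per party such that the XOR of all bits equals
    f(*inputs). Any proper subset of the bits is uniformly random."""
    n = len(inputs)
    shares = [secrets.randbelow(2) for _ in range(n - 1)]
    # The last share is chosen so that the XOR of all shares equals f(*inputs).
    shares.append(reduce(xor, shares, f(*inputs)))
    return shares
```

With `f = lambda a, b: a & b` this reproduces the $y_1 \oplus y_2 = x_1 \wedge x_2$ example: each $y_i$ looks like a fair coin by itself, and only the XOR carries the AND.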

# Salim El Rouayheb’s Shannon Channel: Pulkit Grover at 1300 EST

Salim El Rouayheb has started an exciting new initiative inspired by the TCS+ series. TCS+ is a seminar series on theoretical computer science (plus more) given over Google Hangout so that people across the world can attend the talk (and even ask questions). Nobody has to travel anywhere. Salim’s version is for information theory and he’s calling it Shannon’s Channel. If you’re interested in getting announcements you can sign up for the mailing list.

Salim told me about this at Allerton and I meant to plug it here on the blog earlier but then the semester plus excessive travel ate me. He just sent a reminder yesterday that the inimitable Pulkit Grover will be giving a seminar today (Monday) at 1 PM:

Error-correction and suppression in communication and computing: a tradeoff between information and energy dissipation

Abstract: Information naturally tends to dissipate. This dissipation can be slowed down, but this requires increased energy dissipation. Shannon’s capacity theorem can be interpreted as the first word in this information-energy dissipation tradeoff, but it barely scratches the surface. I will begin with a survey of recent results on minimal energy dissipation for reliable information communication. I will discuss how incorporating energy dissipated in transmitter/receiver circuitry as well as in transmission leads to radically different fundamental limits on information-energy interactions than those obtained by Shannon. I’ll also talk about practical applications in short distance wired and wireless communications.

These techniques can also be applied to obtain fundamental limits to information-energy dissipation for reliable computation using unreliable/noisy components (first considered in [von Neumann ’56]). Recent work on strong data-processing inequality points out the fundamental difficulty in noisy computing: information-dissipation across multiple computation steps. We ask the question: what is the minimum energy-dissipation needed to keep information intact (reliability constant) as the computation proceeds? I’ll describe our novel ENCODED strategy (ENcoded COmputation with DEcoders EmbeddeD) for linear computations on noisy substrates, that outperforms uncoded/repetition-based strategies and keeps error-probability bounded below a constant. The key insight is that for computing in noisy environments, repeated error-suppression (that dissipates energy) is essential to keep information from dissipating. Application to emerging devices and circuit design techniques will also be discussed.

Finally, I’ll talk about a high-density noninvasive biopotential sensing problem, which is closely related to the problem of compressing a Markov source distributedly. Here, energy constraints limit the number of sensors. I’ll discuss how a novel “hierarchical” architecture that contains error-accumulation turns out to have a substantially improved energy-information dissipation tradeoff than simply “compressing innovations” (a strategy known to be suboptimal from a work of Kim and Berger).

The Hangout link is here and the talk will be on YouTube afterwards.

Unfortunately, I have to teach during that time, otherwise I would totally be there, virtually.

# Mathematical Tools of Information-Theoretic Security Workshop: Days 2-3

I took sketchier notes as the workshop progressed, partly due to the ICASSP deadline, but also because jet lag started to hit me. The second day was a half day, which started with Zhenjie Zhang giving a tutorial on differential privacy from a databases/data mining perspective and my talk on more machine learning aspects. In between us was a talk by Ben Smyth on building automatic verification for security protocols. Basically you write the protocol as a program and then the ProVerif verifier will go and try to break your protocol. As an example, it can automatically find/generate a man-in-the-middle attack if one exists. I thought it was pretty neat, especially after having recently talked to someone about automatic proof systems. It’s based on something called the applied pi calculus, which I did not understand at all, but hey, I learned something new, which was great. The last two talks of the day were by Lalitha Sankar and Mari Kobayashi. Lalitha talked about mutual information based measures of privacy leakage in an interactive communication setting that is the information-theoretic analogue of communication complexity models in CS. Mari talked about the broadcast channel with state feedback. This is trying to find secure analogues of these opportunistic multicast settings where you need to also generate a secret key.

The last day was on quantum! I learned a lot and took few notes, unfortunately. Andreas Winter gave a tutorial on quantum (the slides for most talks are online and his are as well) and Ciara Morgan discussed the challenges in proving a strong converse for the capacity of quantum channels. Damian Markham talked about secret sharing in quantum systems. Masahito Hayashi gave a very densely-packed talk surveying a large number of results based on secure randomness extraction and hash functions using Rényi information measures. I think privacy amplification is really interesting but I think I need a tutorial on it before I can really get the research results. The last non-overview talk I have notes on was by David Elkouss (apologies to the remaining speakers): this was a really interesting presentation on how to decide which of two channels is better from a quantum communication sense. The slides are a little enigmatic, but the papers are online.

Shlomo Shamai made it to the last day of the workshop (the intersection with High Holidays was unfortunate). He talked about the layered secrecy view of the broadcast channel: rather than thinking only of the secret message as carrying information, one can think of certain layers (cf. superposition coding) as being secured based on the channel to the non-legitimate receiver. For example, in a degraded broadcast channel, the strong receiver’s message can sometimes be thought of as secret from the weak receiver. This leads to a raft of models and setups based on who wants to keep what secret from whom, shedding some light on standard superposition, rate splitting, binning, and embedding constructions. The talk was largely based on a paper in the current issue of the Proceedings of the IEEE.

All in all, this was a really great workshop, and the organizers were very generous in the organization.

# Mathematical Tools of Information-Theoretic Security Workshop: Day 1

It’s been a while since I have conference-blogged but I wanted to set aside a little time for it. Before going to Allerton I went to a lovely workshop in Paris on the Mathematical Tools of Information-Theoretic Security thanks to a very kind invitation from Vincent Tan and Matthieu Bloch. This was a 2.5 day workshop covering a rather wide variety of topics, which was good for me since I learned quite a bit. I gave a talk on differential privacy and machine learning with a little more of a push on the mathematical aspects that might be interesting from an information-theory perspective. Paris was appropriately lovely, and it was great to see familiar and new faces there. Now that I am at Rutgers I should note especially our three distinguished alumnae, Şennur Ulukuş, Aylin Yener, and Lalitha Sankar.

# ISIT 2015 : statistics and learning

The advantage of flying to Hong Kong from the US is that the jet lag was such that I was actually more or less awake in the mornings. I didn’t take such great notes during the plenaries, but they were rather enjoyable, and I hope that the video will be uploaded to the ITSOC website soon.

There were several talks on entropy estimation in various settings that I did not take great notes on, to wit:

• OPTIMAL ENTROPY ESTIMATION ON LARGE ALPHABETS VIA BEST POLYNOMIAL APPROXIMATION (Yihong Wu, Pengkun Yang, University Of Illinois, United States)
• DOES DIRICHLET PRIOR SMOOTHING SOLVE THE SHANNON ENTROPY ESTIMATION PROBLEM? (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)
• ADAPTIVE ESTIMATION OF SHANNON ENTROPY (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)

I would highly recommend taking a look for those who are interested in this problem. In particular, it looks like we’re getting towards more efficient entropy estimators in difficult settings (online, large alphabet), which is pretty exciting.

QUICKEST LINEAR SEARCH OVER CORRELATED SEQUENCES
Javad Heydari, Ali Tajer, Rensselaer Polytechnic Institute, United States
This talk was about hypothesis testing where the observer can control the samples being taken by traversing a graph. We have an $n$-node graph (cf. a graphical model) representing the joint distribution on $n$ variables. The data generated is i.i.d. across time according to either $F_0$ or $F_1$. At each time you get to observe the data from only one node of the graph. You can either observe the same node as before, explore by observing a different node, or make a decision about whether the data came from $F_0$ or $F_1$. By adopting some costs for different actions you can form a dynamic programming solution for the search strategy but it’s pretty heavy computationally. It turns out the optimal rule for switching has a two-threshold structure and can be quite a bit different than independent observations when the correlations are structured appropriately.

MISMATCHED ESTIMATION IN LARGE LINEAR SYSTEMS
Yanting Ma, Dror Baron, North Carolina State University, United States; Ahmad Beirami, Duke University, United States
The mismatch studied in this paper is a mismatch in the prior distribution for a sparse observation problem $y = Ax + \sigma_z z$, where $x \sim P$ (say a Bernoulli-Gaussian prior). The question is what happens when we do estimation assuming a different prior $Q$. The main result of the paper is an analysis of the excess MSE using a decoupling principle. Since I don’t really know anything about the replica method (except the name “replica method”), I had a little bit of a hard time following the talk as a non-expert, but thankfully there were a number of pictures and examples to help me follow along.

SEARCHING FOR MULTIPLE TARGETS WITH MEASUREMENT DEPENDENT NOISE
Yonatan Kaspi, University of California, San Diego, United States; Ofer Shayevitz, Tel-Aviv University, Israel; Tara Javidi, University of California, San Diego, United States
This was another search paper, but this time we have, say, $K$ targets $W_1, W_2, \ldots, W_K$ uniformly distributed in the unit interval, and what we can do is query at each time $n$ a set $S_n \subseteq [0,1]$ and get a response $Y_n = X_n \oplus Z_n$ where $X_n = \mathbf{1}( \exists W_k \in S_n )$ and $Z_n \sim \mathrm{Bern}( \mu(S_n) + b )$ where $\mu$ is the Lebesgue measure. So basically you can query a set and you get a noisy indicator of whether you hit any targets, where the noise depends on the size of the set you query. At some point $\tau$ you stop and guess the target locations. You are $(\epsilon,\delta)$ successful if the probability that you are within $\delta$ of each target is at least $1 - \epsilon$. The targeting rate is the limit of $\log(1/\delta) / \mathbb{E}[\tau]$ as $\epsilon,\delta \to 0$ (I’m being fast and loose here). Clearly there are some connections to group testing and communication with feedback, etc. They show there is a significant gap between the adaptive and nonadaptive rate here, so you can find more targets if you can adapt your queries on the fly. However, since rate is defined for a fixed number of targets, we could ask how the gap varies with $K$. They show it shrinks.
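The query model is easy to simulate. The sketch below assumes the query set $S_n$ is given as a list of disjoint closed intervals, and it clips the noise probability $\mu(S_n) + b$ at 1; both choices, and the function name `query`, are assumptions for illustration rather than details from the paper.

```python
import random

def query(targets, intervals, b, rng=random):
    """Return Y = X xor Z for a query set given as disjoint (lo, hi) intervals.

    X indicates whether any target lies in the set; Z is Bernoulli noise
    whose parameter grows with the Lebesgue measure of the set."""
    hit = any(lo <= w <= hi for w in targets for lo, hi in intervals)
    size = sum(hi - lo for lo, hi in intervals)  # Lebesgue measure of the set
    p = min(1.0, size + b)                       # assumed clipping at 1
    noise = rng.random() < p                     # Z ~ Bern(p)
    return int(hit) ^ int(noise)
```

Small query sets give clean answers but cover little of the interval, while large sets cover more but get noisier; that tension is what an adaptive strategy can exploit.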

ON MODEL MISSPECIFICATION AND KL SEPARATION FOR GAUSSIAN GRAPHICAL MODELS
Varun Jog, University of California, Berkeley, United States; Po-Ling Loh, University of Pennsylvania, United States
The graphical model for jointly Gaussian variables has no edge between nodes $i$ and $j$ if the corresponding entry $(\Sigma^{-1})_{ij} = 0$ in the inverse covariance matrix. They show a relationship between the KL divergence of two distributions and their corresponding graphs. The divergence is lower bounded by a constant if they differ in a single edge — this indicates that estimating the edge structure is important when estimating the distribution.
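For reference, the KL divergence between two Gaussians has a closed form in terms of the precision matrices, which is what ties the divergence to the edge structure. Assuming zero means (an assumption made here for simplicity) and writing $\Theta_i = \Sigma_i^{-1}$ for $n$-dimensional distributions:

$$D\big( \mathcal{N}(0, \Sigma_0) \,\|\, \mathcal{N}(0, \Sigma_1) \big) = \frac{1}{2} \left[ \mathrm{tr}\big( \Theta_1 \Theta_0^{-1} \big) - n - \log \det\big( \Theta_1 \Theta_0^{-1} \big) \right]$$

The expression is zero exactly when $\Theta_0 = \Theta_1$, so a missing or extra edge, which changes an entry of $\Theta$ from zero to nonzero, forces it to be positive. How large it must be is the content of the paper’s bound.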

CONVERSES FOR DISTRIBUTED ESTIMATION VIA STRONG DATA PROCESSING INEQUALITIES
Aolin Xu, Maxim Raginsky, University of Illinois at Urbana–Champaign, United States
Max gave a nice talk on the problem of minimizing an expected loss $\mathbb{E}[ \ell(W, \hat{W}) ]$ of a $d$-dimensional parameter $W$ which is observed noisily by separate encoders. Think of a CEO-style problem where there is a conditional distribution $P_{X|W}$ such that the observation at each node is a $d \times n$ matrix whose columns are i.i.d. and where the $j$-th row is i.i.d. according to $P_{X|W_j}$. Each sensor gets independent observations from the same model and can compress its observations to $b$ bits and send them over independent channels to an estimator (so no MAC here). The main result is a lower bound on the expected loss as a function of the number of bits $b$ and the mutual information between $W$ and the final estimate $\hat{W}$. The key is to use the strong data processing inequality to handle the mutual information: the constants that make up the ratio between the mutual informations are important. I’m sure Max will blog more about the result so I’ll leave a full explanation to him (see what I did there?)
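For readers who have not seen it, the generic form of a strong data processing inequality is as follows; this is the standard statement, and the constants in the paper may be defined differently. For a Markov chain $U \to X \to Y$ with a fixed channel $P_{Y|X}$,

$$I(U; Y) \le \eta(P_{Y|X}) \, I(U; X), \qquad \eta(P_{Y|X}) = \sup_{U \to X \to Y} \frac{I(U; Y)}{I(U; X)} \le 1,$$

where the supremum is over all joint distributions of $(U, X)$ with $I(U; X) > 0$. The ordinary data processing inequality is the case $\eta = 1$. For a binary symmetric channel with crossover probability $p$ the constant is $\eta = (1 - 2p)^2$, which is strictly less than 1 whenever $0 < p < 1$.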

More on Shannon theory etc. later!

# 2015 North American School of Information Theory

The 2015 North American School of Information Theory (NASIT) will be held on August 10-13, 2015, at the University of California, San Diego in La Jolla. If you or your colleagues have students who might be interested in this event, we would be grateful if you could forward this email to them and encourage their participation. The application deadline is Sunday, June 7. As in the past schools, we again have a great set of lecturers this year:

We are pleased to announce that Paul Siegel will be the Padovani Lecturer of the IEEE Information Theory Society and will give his lecture at the School. The Padovani Lecture is sponsored by a generous gift of Roberto Padovani.

# Signal boost: Postdoc positions at Tel Aviv University

Two postdoctoral research positions are now available in the Department of Electrical Engineering – Systems at Tel Aviv University, Israel, in the fields of information theory and interactive communications. Starting immediately for up to two years. Funded by the European Research Council (ERC).

We offer two postdoctoral fellowships for researchers in the broad area of information theory, with special emphasis on interactive communications. Specific topics of interest include single-user and multiuser communications with noisy feedback, iterative-refinement coding for two-way channels, interactive coding and its relations to dynamical systems and stochastic control, resource-limited interactive communications, distributed function computation, and combinatorial aspects of multiuser interactive communications. The research will be conducted in close collaboration with Dr. Ofer Shayevitz and his group, and is funded by a grant from the European Research Council (ERC).

The positions are available immediately and for a period of up to two years. Applicants should hold a PhD in either electrical engineering, computer science, or mathematics, and are expected to have a strong background in information theory or closely related fields. Remuneration is highly competitive and commensurate with skills and track record. To apply, please send your CV along with a short statement of research interests to Dr. Ofer Shayevitz at ofersha@eng.tau.ac.il.

# CFP: 2015 Information Theory Workshop (ITW), Jeju Island

I am on the TPC for ITW 2015 in Jeju Island, South Korea.

The 2015 IEEE Information Theory Workshop will take place in Jeju Island, Korea, from October 11 to October 15, 2015. Jeju Island is the largest island in Korea and is located in the Pacific Ocean just off the south-western tip of the Korean peninsula. Jeju Island is a volcanic island with a mountainous terrain, a dramatic rugged coastline and spectacular watershed courses. The Island has a unique culture as well as natural beauty. It is a living folk village, with approximately 540,000 people. As a result of its isolated location and romantic tropical image, Jeju Island has become a favorite retreat with honeymooners and tourists. The tour programs of the conference will also provide participants with the opportunity to feel and enjoy some of the island’s fascinating attractions.

Special topics of emphasis include:

• Big data
• Coding theory
• Communication theory
• Computational biology
• Interactive communication
• Machine learning
• Network information theory
• Privacy and security
• Signal processing

# ISIT Deadline Extended to Monday

Apparently not everyone got this email, so here it is. I promise this blog will not become PSA-central.

Dear ISIT-2015-Submission Reviewers:

In an effort to ensure that each paper has an appropriate number of reviews, the deadline for the submission of all reviews has been extended to March 2nd. If you have not already done so, please submit your review by March 2nd as we are working to a very tight deadline.

(a) all submissions are eligible to be considered for presentation in a semi-plenary session — Please ensure that your review provides an answer to Question 11
(b) in the case of a submission that is eligible for the 2015 IEEE Jack Keil Wolf ISIT Student Paper Award, the evaluation form contains a box at the top containing the text:
Notice: This paper is to be considered for the 2015 IEEE Jack Keil Wolf ISIT Student Paper Award, even if the manuscript itself does not contain a statement to that effect.
– Please ensure that your review provides an answer to Question 12 if this is the case.

Thanks very much for helping out with the review process for ISIT, your inputs are of critical importance in ensuring that the high standards of an ISIT conference are maintained. We know that reviewing a paper takes much effort and we are grateful for all the time you have put in!

With regards,

Pierre, Suhas and Vijay
(TPC Co-Chairs, ISIT 2015)

# ITA 2015: quick takes

Better late than never, I suppose. A few weeks ago I escaped the cold of New Jersey to my old haunts of San Diego. Although La Jolla was always a bit fancy for my taste, it’s hard to beat a conference which boasts views like this:

A view from the sessions at ITA 2015

I’ll just recap a few of the talks that I remember from my notes — I didn’t really take notes during the plenaries so I don’t have much to say about them. Mostly this was due to laziness, but finding the time to blog has been challenging in this last year, so I think I have to pick my battles. Here’s a smattering consisting of

$\{ \mathrm{talks\ attended} \} \cap \{ \mathrm{talks\ with\ understandable\ notes} \}$

(Information theory)
Emina Soljanin talked about designing codes that are good for fast access to the data in distributed storage. Initial work focused on how to repair codes under disk failures. She looked at how easy it is to retrieve the information afterwards to guarantee some QoS for the storage system. Adam Kalai talked about designing compression schemes that work for an “audience” of decoders. The decoders have different priors on the set of elements/messages so the idea is to design an encoder that works for this ensemble of decoders. I kind of missed the first part of the talk so I wasn’t quite sure how this relates to classical work in mismatched decoding as done in the information theory world. Gireeja Ranade gave a great talk about defining notions of capacity/rate needed to control a system which has multiplicative uncertainty. That is, $x[n+1] = x[n] + B[n] u[n]$ where $B[n]$ has the uncertainty. She gave a couple of different notions of capacity, relating to the ratio $| x[n]/x[0] |$: either the expected value of the square or the log, appropriately normalized. She used a “deterministic model” to give an explanation of how control in this setting is kind of like controlling the number of significant bits in the state: uncertainty increases this and you need a certain “amount” of control to cancel that growth.
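As a rough illustration of why the multiplicative uncertainty matters (a one-step calculation under stated assumptions, not the derivation from the talk), suppose $B[n]$ is i.i.d. with mean $\mu \neq 0$ and variance $\sigma^2$, independent of the state, and the controller uses a linear rule $u[n] = -k x[n]$. Then $x[n+1] = (1 - k B[n]) x[n]$ and

$$\mathbb{E}\big[ x[n+1]^2 \mid x[n] \big] = \big( 1 - 2k\mu + k^2 (\mu^2 + \sigma^2) \big) \, x[n]^2 .$$

Minimizing over $k$ gives $k^\star = \mu / (\mu^2 + \sigma^2)$ and a per-step contraction factor of $\sigma^2 / (\mu^2 + \sigma^2)$. With no uncertainty ($\sigma^2 = 0$) the state can be zeroed in one step; as $\sigma^2$ grows relative to $\mu^2$ the factor approaches 1 and each control action buys less.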

(Learning and statistics)
I learned about active regression approaches from Sivan Sabato that provably work better than passive learning. The idea there is to use a partition of the X space and then do piecewise constant approximations to a weight function that they use in a rejection sampler. The rejection sampler (which I thought of as sort of doing importance sampling to make sure they cover the space) helps limit the number of labels requested by the algorithm. Somehow I had never met Raj Rao Nadakuditi until now, and I wish I had gotten a chance to talk to him further. He gave a nice talk on robust PCA, and in particular how outliers “break” regular PCA. He proposed a combination of shrinkage and truncation to help make PCA a bit more stable/robust. Laura Balzano talked about “estimating subspace projections from incomplete data.” She proposed an iterative algorithm for doing estimation on the Grassmann manifold that can do subspace tracking. Constantine Caramanis talked about a convex formulation for mixed regression that gives a guaranteed solution, along with minimax sample complexity bounds showing that it is basically optimal. Yingbin Liang talked about testing approaches for understanding if there is an “anomalous structure” in a sequence of data. Basically for a sequence $Y_1, Y_2, \ldots, Y_n$, the null hypothesis is that they are all i.i.d. $\sim p$ and the (composite) alternative is that there is an interval of indices which are $\sim q$ instead. She proposed an RKHS-based discrepancy measure and a threshold test on this measure. Pradeep Ravikumar talked about a “simple” estimator that was a “fix” for ordinary least squares with some soft thresholding. He showed consistency for linear regression in several senses, competitive with LASSO in some settings. Pretty neat, all said, although he also claimed that least squares was “something you all know from high school”; I went to a pretty good high school, and I don’t think we did least squares!
Sanmi Koyejo talked about a Bayesian decision theory approach to variable selection that involved minimizing some KL-divergence. Unfortunately, the resulting optimization ended up being NP-hard (for reasons I can’t remember) and so they use a greedy algorithm that seems to work pretty well.

(Privacy)
Cynthia Dwork gave a tutorial on differential privacy with an emphasis on the recent work involving false discovery rate. In addition to her plenary there were several talks on differential privacy and other privacy measures. Kunal Talwar talked about their improved analysis of the SuLQ method for differentially private PCA. Unfortunately there were two privacy sessions in parallel so I hopped over to see John Duchi talk about definitions of privacy and how definitions based on testing are equivalent to differential privacy. The testing framework makes it easier to prove minimax bounds, though, so it may be a more useful view at times. Nadia Fawaz talked about privacy for time-series data such as smart meter data. She defined different types of attacks in this setting and showed that they correspond to mutual information or directed mutual information, as well as empirical results on a real data set. Raef Bassily studied an estimation problem in the streaming setting where you want to get a histogram of the most frequent items in the stream. They reduce the problem to one of finding a “unique heavy hitter” and develop a protocol that looks sort of like a code for the MAC: they encode bits into a real vector, add noise, and then add those up over the reals. It’s accepted to STOC 2015 and he said the preprint will be up soon.