# What’s new is old in ethics and conduct

(h/t to Stark Draper, Elza Erkip, Allie Fletcher, Tara Javidi, and Tsachy Weissman for sources)

The IEEE Information Theory Society Board of Governors voted to approve the following statement to be included on official society events and on the website:

IEEE members are committed to the highest standards of integrity, responsible behavior, and ethical and professional conduct. The IEEE Information Theory Society reaffirms its commitment to an environment free of discrimination and harassment as stated in the IEEE Code of Conduct, IEEE Code of Ethics, and IEEE Nondiscrimination Policy. In particular, as stated in the IEEE Code of Ethics and Code of Conduct, members of the society will not engage in harassment of any kind, including sexual harassment, or bullying behavior, nor discriminate against any person because of characteristics protected by law. In addition, society members will not retaliate against any IEEE member, employee or other person who reports an act of misconduct, or who reports any violation of the IEEE Code of Ethics or Code of Conduct.

I guess the lawyers had to have a go at it, but this is essentially repeating that the IEEE already had rules, and so here we’re reminding you about the rules. This statement is saying “the new rules are the old rules.” We probably need more explicit new rules, however. In particular, many conferences have more detailed codes of conduct (NeurohackWeek, RSA, Usenix, APEC) that spell out how the principles espoused in the text above are implemented. Often, these conferences have formal reporting procedures/policies and sanctions for violations; many IEEE conferences do not. The NSF is now requiring reporting on PIs who are “found to have committed sexual harassment,” so incidents at conferences where the traveler is presenting NSF-sponsored work should also be reported, it seems.

While the ACM’s rules suggest making reporting procedures, perhaps a template (borrowed from another academic community?) could just become part of the standard operating procedure for running an IEEE conference. Just have a member of the organizing committee in charge, similar to having a local arrangements chair, publicity chair, etc. However, given the power dynamics of academic communities, perhaps people would feel more comfortable reporting incidents to someone outside the community.

Relatedly, the Society also approved creating an Ad Hoc Committee on Diversity and Inclusion (I’m not on it), whose members have already done a ton of work on this and will find other ways to make the ITSOC (even) more open and welcoming.

# Hello from the IPAM Workshop on Privacy for Biomedical Data

I just arrived in LA for the IPAM Workshop on Algorithmic Challenges in Protecting Privacy for Biomedical Data. I co-organized this workshop with Cynthia Dwork, James Zou, and Sriram Sankararaman and it is (conveniently) before the semester starts and (inconveniently) overlapping with the MIT Mystery Hunt. The workshop has a really diverse set of speakers, so to get everyone on the same page and anchor the discussion, we have 5 tutorial speakers and a few sessions of shorter talks. The hope is that these tutorials (which are on the first two days of the workshop) will give people some “common language” to discuss research problems.

The other big change we made to the standard workshop schedule was to put in time for “breakout groups” to have smaller discussions focused on identifying the key fundamental problems that need to be addressed when thinking about privacy and biomedical data. Because of the diversity of viewpoints among participants, it seems a tall order to generate new research collaborations out of attending talks and going to lunch. But if we can, as a group, identify what the mathematical problems are (and maybe even why they are hard), this can help identify the areas of common interest.

I think of these as falling into a few different categories.

• Questions about demarcation. Can we formalize (mathematically) the privacy objective in different types of data sets/computations? Can we use these to categorize different types of problems?
• Metrics. How do we formulate the privacy-utility tradeoffs for different problems? What is the right measure of performance? What (if anything) do we lose in guaranteeing privacy?
• Possibility/impossibility. Algorithms which can guarantee privacy and utility are great, but on the flip side we should try to identify when privacy might be impossible to guarantee. This would have implications for higher-level questions about system architectures and policy.
• Domain-specific questions. In some cases all of the setup is established: we want to compute function F on dataset D under differential privacy and the question is to find algorithms with optimal utility for fixed privacy loss or vice versa. Still, identifying those questions and writing them down would be a great outcome.
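To make the last bullet concrete, here is a minimal sketch of the simplest instance of "compute function F on dataset D under differential privacy": a counting query released with Laplace noise. Adding or removing one record changes a count by at most 1, so Laplace noise with scale 1/ε gives ε-differential privacy. The function name `private_count` and the toy data are made up for illustration; nothing here comes from the workshop.

```python
import random


def private_count(data, predicate, epsilon, rng):
    """Release how many rows satisfy `predicate`, with epsilon-DP Laplace noise.

    Hypothetical helper for illustration. A count has sensitivity 1, so the
    noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for row in data if predicate(row))
    # The difference of two i.i.d. Exponential(rate=epsilon) draws is
    # distributed as Laplace(0, 1 / epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [34, 51, 47, 29, 62, 45, 38, 70]  # made-up toy data
    print(private_count(ages, lambda a: a >= 45, epsilon=0.5, rng=rng))
```

The utility question in the bullet is then visible directly: the error of the released count has standard deviation √2/ε, so halving the privacy loss doubles the typical error.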

In addition to all of this, there is a student poster session, a welcome reception, and lunches. It’s going to be a packed 3 days, and although I will miss the very end of it, I am excited to learn a lot from the participants.

# Some thoughts on paper awards at conferences

We (really Mohsen and Zahra) had a paper nominated for a student paper award at CAMSAP last year, but since both student authors are from Iran, their single-entry student visas prevented them from going to the conference. The award terms require that the student author present the work (in a poster session) and the conference organizers were kind enough to allow Mohsen to present his poster via Skype. It’s hardly an ideal communication channel, given how loud poster sessions are. Although the award went to a different paper, the experience brought up two questions that are not new but don’t get a lot of discussion.

How should paper awards deal with visa issues? This is not an issue specific to students from Iran, although the US State Department’s visa issuance for Iranian students is stupidly restrictive. Students from Iran are essentially precluded from attending any non-US conference unless they want to roll the dice again and wait for another visa at home. Other countries may also deny visas to students for various reasons. Requiring students to be present at the conference is discriminatory, since the award should be based on the work. Disqualifying a student for an award because of bullshit political/bureaucratic nonsense that is totally out of their control just reinforces that bullshit.

Why are best papers judged by their presentation? I have never been a judge for a paper award and I am sure that judges try to be as fair as they can. However, the award is for the paper and not its performance. I agree that scholarly communication through oral presentation is a valuable skill, but if the award is going to be determined by who gives the best show at the conference, they should retitle these to “best student paper and presentation award” or something like that. Maybe it should instead be based on video presentations to allow remote participation. If you are going to call it a paper award, then it should be based on the written work.

I don’t want this to seem like a case of sour grapes. Not all student paper awards work this way, but it seems to be the trend in IEEE-ish venues. The visa issue has hurt a lot of researchers I know; they miss out on opportunities to get their name/face known, chances to meet and network with people, and the experience of being exposed to a ton of ideas in a short amount of time. Back when I had time to do conference blogging, it was a way for me to process the wide array of new things that I saw. For newer researchers (i.e. students) this is really important. Making paper awards based on presentations hits these students doubly: they can neither attend the conference nor receive recognition for their work.

# IPAM Workshop on Algorithmic Challenges in Protecting Privacy for Biomedical Data

IPAM is hosting a workshop on “Algorithmic Challenges in Protecting Privacy for Biomedical Data” which will be held at IPAM from January 10-12, 2018.

The workshop will be attended by many junior as well as senior researchers with diverse backgrounds. We want to encourage students or postdoctoral scholars who might be interested to apply and/or register for this workshop.

I think it will be quite interesting and has the potential to spark a lot of interesting conversations around what we can and cannot do about privacy for medical data in general and genomic data in particular.

# DIMACS Workshop on Distributed Optimization, Information Processing, and Learning

My colleague Waheed Bajwa, Alejandro Ribeiro, and Alekh Agarwal are organizing a Workshop on Distributed Optimization, Information Processing, and Learning from August 21 to August 23, 2017 at Rutgers DIMACS. The purpose of this workshop is to bring together researchers from the fields of machine learning, signal processing, and optimization for cross-pollination of ideas related to the problems of distributed optimization, information processing, and learning. All in all, we are expecting to have 20 to 26 invited talks from leading researchers working in these areas as well as around 20 contributed posters in the workshop.

Registration is open from now until August 14 — hope to see some of you there!

# Mathematical Tools of Information-Theoretic Security Workshop: Day 1

It’s been a while since I have conference-blogged but I wanted to set aside a little time for it. Before going to Allerton I went to a lovely workshop in Paris on the Mathematical Tools of Information-Theoretic Security thanks to a very kind invitation from Vincent Tan and Matthieu Bloch. This was a 2.5 day workshop covering a rather wide variety of topics, which was good for me since I learned quite a bit. I gave a talk on differential privacy and machine learning with a little more of a push on the mathematical aspects that might be interesting from an information-theory perspective. Paris was appropriately lovely, and it was great to see familiar and new faces there. Now that I am at Rutgers I should note especially our three distinguished alumnae, Şennur Ulukuş, Aylin Yener, and Lalitha Sankar.

# ISIT 2015 : statistics and learning

The advantage of flying to Hong Kong from the US is that the jet lag was such that I was actually more or less awake in the mornings. I didn’t take such great notes during the plenaries, but they were rather enjoyable, and I hope that the video will be uploaded to the ITSOC website soon.

There were several talks on entropy estimation in various settings that I did not take great notes on, to wit:

• OPTIMAL ENTROPY ESTIMATION ON LARGE ALPHABETS VIA BEST POLYNOMIAL APPROXIMATION (Yihong Wu, Pengkun Yang, University Of Illinois, United States)
• DOES DIRICHLET PRIOR SMOOTHING SOLVE THE SHANNON ENTROPY ESTIMATION PROBLEM? (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)
• ADAPTIVE ESTIMATION OF SHANNON ENTROPY (Yanjun Han, Tsinghua University, China; Jiantao Jiao, Tsachy Weissman, Stanford University, United States)

I would highly recommend taking a look for those who are interested in this problem. In particular, it looks like we’re getting towards more efficient entropy estimators in difficult settings (online, large alphabet), which is pretty exciting.

QUICKEST LINEAR SEARCH OVER CORRELATED SEQUENCES
Javad Heydari, Ali Tajer, Rensselaer Polytechnic Institute, United States
This talk was about hypothesis testing where the observer can control the samples being taken by traversing a graph. We have an $n$-node graph (cf. a graphical model) representing the joint distribution on $n$ variables. The data generated is i.i.d. across time according to either $F_0$ or $F_1$. At each time you get to observe the data from only one node of the graph. You can either observe the same node as before, explore by observing a different node, or make a decision about whether the data came from $F_0$ or $F_1$. By adopting some costs for different actions you can form a dynamic programming solution for the search strategy, but it’s pretty heavy computationally. It turns out the optimal rule for switching has a two-threshold structure and can be quite a bit different from the rule for independent observations when the correlations are structured appropriately.
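As a much simpler point of reference (this is not the paper's graph-traversal rule), the "keep observing or decide" step at a single node looks like Wald's classical sequential probability ratio test: accumulate the log-likelihood ratio and stop when it crosses one of two thresholds. The sketch below assumes a Gaussian mean-shift model for $F_0$ and $F_1$; the function names and that model are my own assumptions for illustration.

```python
import math


def gaussian_log_lr(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """log f1(x) / f0(x) for N(mu1, sigma^2) versus N(mu0, sigma^2) (assumed model)."""
    return ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)


def sprt(samples, log_lr, alpha, beta):
    """Wald's SPRT with the usual approximate thresholds.

    alpha: target probability of deciding F1 when F0 is true.
    beta:  target probability of deciding F0 when F1 is true.
    Returns (decision, number of samples used).
    """
    upper = math.log((1 - beta) / alpha)  # decide F1 at or above this
    lower = math.log(beta / (1 - alpha))  # decide F0 at or below this
    llr = 0.0
    used = 0
    for x in samples:
        used += 1
        llr += log_lr(x)
        if llr >= upper:
            return ("F1", used)
        if llr <= lower:
            return ("F0", used)
    return ("undecided", used)
```

In the talk's setting there is a third action, switching to another node, and the correlations between nodes change what each observation tells you, which is where the dynamic program comes in.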

MISMATCHED ESTIMATION IN LARGE LINEAR SYSTEMS
Yanting Ma, Dror Baron, North Carolina State University, United States; Ahmad Beirami, Duke University, United States
The mismatch studied in this paper is a mismatch in the prior distribution for a sparse observation problem $y = Ax + \sigma_z z$, where $x \sim P$ (say a Bernoulli-Gaussian prior). The question is what happens when we do estimation assuming a different prior $Q$. The main result of the paper is an analysis of the excess MSE using a decoupling principle. Since I don’t really know anything about the replica method (except the name “replica method”), I had a little bit of a hard time following the talk as a non-expert, but thankfully there were a number of pictures and examples to help me follow along.

SEARCHING FOR MULTIPLE TARGETS WITH MEASUREMENT DEPENDENT NOISE
Yonatan Kaspi, University of California, San Diego, United States; Ofer Shayevitz, Tel-Aviv University, Israel; Tara Javidi, University of California, San Diego, United States
This was another search paper, but this time we have, say, $K$ targets $W_1, W_2, \ldots, W_K$ uniformly distributed in the unit interval, and what we can do is query at each time $n$ a set $S_n \subseteq [0,1]$ and get a response $Y_n = X_n \oplus Z_n$ where $X_n = \mathbf{1}( \exists W_k \in S_n )$ and $Z_n \sim \mathrm{Bern}( \mu(S_n) + b )$ where $\mu$ is the Lebesgue measure. So basically you can query a set and you get a noisy indicator of whether you hit any targets, where the noise depends on the size of the set you query. At some point $\tau$ you stop and guess the target locations. You are $(\epsilon,\delta)$ successful if the probability that you fail to be within $\delta$ of every target is less than $\epsilon$. The targeting rate is the limit of $\log(1/\delta) / \mathbb{E}[\tau]$ as $\epsilon,\delta \to 0$ (I’m being fast and loose here). Clearly there are some connections to group testing and communication with feedback, etc. They show there is a significant gap between the adaptive and nonadaptive rates here, so you do better if you can adapt your queries on the fly. However, since the rate is defined for a fixed number of targets, we could ask how the gap varies with $K$. They show it shrinks.
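The query model as described above is easy to simulate. Here is a minimal sketch in which the query set $S_n$ is represented as a list of disjoint closed intervals; that representation and the function names are assumptions for illustration, not anything from the paper.

```python
import random


def noise_probability(intervals, b):
    """Flip probability mu(S) + b, where S is a list of disjoint (lo, hi) intervals in [0, 1]."""
    p = sum(hi - lo for lo, hi in intervals) + b
    if not 0.0 <= p <= 1.0:
        raise ValueError("mu(S) + b must lie in [0, 1]")
    return p


def query(targets, intervals, b, rng):
    """Return Y = X xor Z for one query of the set S.

    X = 1 if any target falls in S; Z ~ Bernoulli(mu(S) + b).
    """
    x = int(any(lo <= w <= hi for w in targets for lo, hi in intervals))
    z = int(rng.random() < noise_probability(intervals, b))
    return x ^ z


if __name__ == "__main__":
    rng = random.Random(0)
    targets = [rng.random() for _ in range(3)]  # K = 3 uniform targets
    print(query(targets, [(0.0, 0.125)], b=0.05, rng=rng))
```

The tension is visible right away: a large query set is more likely to contain a target but also gives a noisier answer, so a searcher has a reason to shrink its queries as it localizes the targets.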

ON MODEL MISSPECIFICATION AND KL SEPARATION FOR GAUSSIAN GRAPHICAL MODELS
Varun Jog, University of California, Berkeley, United States; Po-Ling Loh, University of Pennsylvania, United States
The graphical model for jointly Gaussian variables has no edge between nodes $i$ and $j$ if the corresponding entry $(\Sigma^{-1})_{ij} = 0$ in the inverse covariance matrix. They show a relationship between the KL divergence of two distributions and their corresponding graphs. The divergence is lower bounded by a constant if they differ in a single edge — this indicates that estimating the edge structure is important when estimating the distribution.
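For reference, the KL divergence between two zero-mean $d$-dimensional Gaussians has a standard closed form in terms of their precision (inverse covariance) matrices, which is what ties the divergence to the edge structure. The notation $\Theta_0, \Theta_1$ for the two precision matrices is mine, and the zero-mean assumption is for simplicity; this is the textbook identity, not the paper's lower bound.

```latex
D\big( \mathcal{N}(0, \Theta_0^{-1}) \,\big\|\, \mathcal{N}(0, \Theta_1^{-1}) \big)
  = \frac{1}{2} \left[ \operatorname{tr}\!\big( \Theta_1 \Theta_0^{-1} \big) - d
      - \log\det\!\big( \Theta_1 \Theta_0^{-1} \big) \right]
```

The expression is zero exactly when $\Theta_0 = \Theta_1$, so two models whose graphs differ in an edge (one has $(\Theta)_{ij} = 0$ and the other does not) always have positive divergence.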

CONVERSES FOR DISTRIBUTED ESTIMATION VIA STRONG DATA PROCESSING INEQUALITIES
Aolin Xu, Maxim Raginsky, University of Illinois at Urbana–Champaign, United States
Max gave a nice talk on the problem of minimizing an expected loss $\mathbb{E}[ \ell(W, \hat{W}) ]$ of a $d$-dimensional parameter $W$ which is observed noisily by separate encoders. Think of a CEO-style problem where there is a conditional distribution $P_{X|W}$ such that the observation at each node is a $d \times n$ matrix whose columns are i.i.d. and where the $j$-th row is i.i.d. according to $P_{X|W_j}$. Each sensor gets independent observations from the same model, compresses them to $b$ bits, and sends the result over an independent channel to an estimator (so no MAC here). The main result is a lower bound on the expected loss as a function of the number of bits $b$ and the mutual information between $W$ and the final estimate $\hat{W}$. The key is to use the strong data processing inequality to handle the mutual information: the constants that make up the ratio between the mutual informations are important. I’m sure Max will blog more about the result so I’ll leave a full explanation to him (see what I did there?)
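For readers who have not seen it, the generic form of a strong data processing inequality is below. The symbols $U$, $Y$, and $\eta$ are my notation for a generic Markov chain $U \to X \to Y$; how the paper instantiates them for the sensors and the estimator is not something I wrote down.

```latex
I(U;Y) \le \eta(P_X, P_{Y|X}) \, I(U;X),
\qquad
\eta(P_X, P_{Y|X}) = \sup_{U \,:\, U \to X \to Y} \frac{I(U;Y)}{I(U;X)} \le 1
```

The ordinary data processing inequality is the case $\eta = 1$. When $\eta < 1$ the channel strictly contracts information, and that contraction coefficient is the kind of constant that ends up in the bound.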

More on Shannon theory etc. later!