LabTV Profiles Are Up!

And now, a little pre-ITA self-promotion. As I wrote earlier, LabTV interviewed me and a subset of the students in the lab last semester (it was opt-in). This opportunity came out of my small part in a large-scale collaboration organized by the Mind Research Network (PI: Vince Calhoun) to implement distributed and differentially private algorithms in a system that enables collaborative neuroscience research. Our lab profiles are now up! They interviewed me, graduate students Hafiz Imtiaz, Sijie Xiong, and Liyang Xie, and an undergraduate student, Kevin Sun. In watching, I found that I learned a few new things about my students…

Signal boost: IBM Social Good Fellowship for data science

This announcement came via Kush Varshney. IBM is launching a new fellowship program. This came out of his work on DataKind and Saška Mojsilović’s work on Ebola. It’s open to students and postdocs!

I am pleased to let you know that Saška Mojsilović and I are launching a new fellowship program at IBM Research related to data science for social good. We are offering both 3-month summer fellowships for PhD students and full-year fellowships for postdocs. The fellowship webpage and link to apply may be found here.

Fellows will come work with research staff members at our Yorktown Heights laboratory to complete projects in partnership with NGOs, social enterprises, government agencies, or other mission-driven organizations that have large social impact. We are currently in the process of scoping projects across various areas, such as health, sustainability, poverty, hunger, equality, and disaster management. The program is intended to allow students to develop their technical skills and produce publishable work while making a positive impact on the world.

I request that you spread the word to students in your respective departments and the broader community.

Call for Papers: T-SIPN Special Issue on Inference and Learning Over Networks

IEEE Signal Processing Society
IEEE Transactions on Signal and Information Processing over Networks
Special Issue on Inference and Learning Over Networks

Networks are everywhere. They surround us at different levels and scales, whether we are dealing with communications networks, power grids, biological colonies, social networks, sensor networks, or distributed Big Data repositories. It is therefore not hard to appreciate the ongoing and steady progression of network science, a prolific research field spreading across many theoretical as well as applied domains.

Regardless of the particular context, the very essence of a network resides in the interaction among its individual constituents, and Nature itself offers beautiful paradigms thereof. Many biological networks and animal groups owe their sophistication to fairly structured patterns of cooperation, which are vital to their successful operation. While each individual agent is not capable of sophisticated behavior on its own, the combined interplay among simpler units and the distributed processing of dispersed pieces of information enable the agents to solve complex tasks and dramatically enhance their performance. Self-organization, cooperation, and adaptation emerge as the essential, combined attributes of a network tasked with distributed information processing, optimization, and inference.

Such a network is conveniently described as an ensemble of spatially dispersed (possibly moving) agents, linked together through a (possibly time-varying) connection topology. The agents are allowed to interact locally and to perform in-network processing in order to accomplish the assigned inferential task. Correspondingly, several problems, such as network intrusion, community detection, and disease outbreak inference, can be conveniently described by signals on graphs, where the graph typically accounts for the topology of the underlying space and we obtain multivariate observations associated with nodes/edges of the graph. The goal in these problems is to identify/infer/learn patterns of interest, including anomalies, outliers, and the existence of latent communities.

Unveiling the fundamental principles that govern distributed inference and learning over networks has been the common scope across a variety of disciplines, such as signal processing, machine learning, optimization, control, statistics, physics, economics, biology, and the computer and social sciences. In the realm of signal processing, many new challenges have emerged that stimulate research efforts toward delivering the theories and algorithms necessary to (a) design networks with sophisticated inferential and learning abilities; (b) promote truly distributed implementations, endowed with the real-time adaptation abilities needed to face the dynamic scenarios in which real-world networks operate; and (c) discover and disclose significant relationships possibly hidden in the data collected from across networked systems and entities. This call for papers therefore encourages submissions from a broad range of experts who study such fundamental questions, including but not limited to:

  • Adaptation and learning over networks.
  • Consensus strategies; diffusion strategies.
  • Distributed detection, estimation and filtering over networks.
  • Distributed dictionary learning.
  • Distributed game-theoretic learning.
  • Distributed machine learning; online learning.
  • Distributed optimization; stochastic approximation.
  • Distributed proximal techniques, sub-gradient techniques.
  • Learning over graphs; network tomography.
  • Multi-agent coordination and processing over networks.
  • Signal processing for biological, economic, and social networks.
  • Signal processing over graphs.

Prospective authors should visit http://www.signalprocessingsociety.org/publications/periodicals/tsipn/ for information on paper submission. Manuscripts should be submitted via Manuscript Central at http://mc.manuscriptcentral.com/tsipn-ieee.

Important Dates:

  • Manuscript submission: February 1, 2016
  • First review completed: April 1, 2016
  • Revised manuscript due: May 15, 2016
  • Second review completed: July 15, 2016
  • Final manuscript due: September 15, 2016
  • Publication: December 1, 2016

Guest Editors:


Randomized response, differential privacy, and the elusive biased coin

In giving talks to broader audiences about differential privacy, I’ve learned quickly (thanks to watching talks by other experts) that discussing randomized response first is an easy way to explain the kind of “plausible deniability” guarantee that differentially private algorithms give to individuals. In randomized response, the setup is that of local privacy: the simplest model is that a population of n individuals with data x_1, x_2, \ldots, x_n \in \{0,1\} representing some sensitive quantity are to be surveyed by an untrusted statistician. Concretely, suppose that the individual bits represent whether the person is a drug user or not. The statistician/surveyor wants to know the fraction p = \frac{1}{n} \sum x_i of users in the population. However, individuals don’t trust the surveyor. What to do?

The surveyor can give the individuals a biased coin that comes up heads with probability q < 1/2. The individual flips the coin in private. If it comes up heads, they lie and report y_i = 1 - x_i. If it comes up tails, they tell the truth y_i = x_i. The surveyor doesn’t see the outcome of the coin, but can compute the average of the \{y_i\}. What is the expected value of this average?

\mathbb{E}\left[ \frac{1}{n} \sum_{i=1}^{n} y_i \right] = \frac{1}{n} \sum_{i=1}^{n} \left( q (1 - x_i) + (1 - q) x_i \right) = q + (1 - 2q) p.

So we can invert this to solve for p: if we have a reported average \bar{y} = \frac{1}{n} \sum y_i then estimate p by

\hat{p} = \frac{\bar{y} - q}{ 1 - 2 q }.
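To make the protocol and the estimator concrete, here is a minimal simulation sketch in Python/numpy (the values of n, p, and q below are made up purely for illustration):

    # A minimal sketch of randomized response (n, p, q are hypothetical values).
    import numpy as np

    rng = np.random.default_rng(0)

    n, p, q = 10_000, 0.3, 0.25       # population size, true fraction, lying probability
    x = rng.random(n) < p             # true sensitive bits x_i
    lie = rng.random(n) < q           # each person's private coin (heads = lie)
    y = np.where(lie, 1 - x, x)       # reported bits y_i

    y_bar = y.mean()
    p_hat = (y_bar - q) / (1 - 2 * q) # invert E[y_bar] = q + (1 - 2q) p

    print(f"true p = {p:.3f}, estimate p_hat = {p_hat:.3f}")

With n this large, the estimate should land close to the true fraction, even though the surveyor never sees any individual's true bit.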

What does this have to do with differential privacy? Each individual got to potentially lie about their drug habits. So if we look at the hypothesis test for a surveyor trying to figure out if someone is a user from their response, we get the likelihood ratio

\frac{ \mathbb{P}( y_i = 1 | x_i = 1 ) }{ \mathbb{P}( y_i = 1 | x_i = 0 ) } = \frac{1 - q}{q}

If we set \epsilon = \log \frac{1 - q}{q}, we can see that the protocol guarantees \epsilon-differential privacy. This gives a possibly friendlier interpretation of \epsilon in terms of the “lying probability” q. We can plot this:

[Figure: Epsilon versus lying probability]
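To get a feel for the numbers behind this plot, here is a small Python sketch (the values of q are arbitrary) tabulating \epsilon = \log \frac{1-q}{q} for a few lying probabilities; note how a small \epsilon forces q toward 1/2:

    # Tabulate epsilon = log((1-q)/q) for a few illustrative lying probabilities q.
    import numpy as np

    for q in [0.4, 0.25, 0.1, 0.01]:
        eps = np.log((1 - q) / q)
        print(f"q = {q:4.2f}  ->  epsilon = {eps:.2f}")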

This is a bit pessimistic — it says that to guarantee a reasonable “lying probability” we need \epsilon \ll 1, which turns out to be quite difficult to achieve in practice. Why so pessimistic? The differential privacy threat model is pretty pessimistic — it’s your plausible deniability given that everyone else in the data set has revealed their data to the surveyor “in the clear.” This is the fundamental tension in thinking about the practical implications of differential privacy — we don’t want to make conditional guarantees (“as long as everyone else is secret too”), but the price of an unconditional guarantee can be high in the worst case.

So how does randomized response work in practice? It seems we would need a biased coin. Maybe one could custom-order them from Alibaba? It turns out the answer is: not really. Gelman and Nolan have an article about getting students to try to evaluate the bias of a coin — the physics of flipping would seem to dictate that coins are basically fair. You can load dice, but not coins. I recommend reading through the article — it sounds like a fun activity, even for graduate students. Maybe I’ll try it in my Detection and Estimation course next semester.

Despite the widespread prevalence of “flipping a biased coin” as a construction in probability, randomized algorithms, and information theory, a surprisingly large number of people I have met are completely unaware of the unicorn-like nature of biased coins in the real world. I guess we really are in an ivory tower, eh?

Rutgers ECE GAANN Fellowships for Graduate Students

In case there are any potential grad school applicants to Rutgers who read this blog: we were recently awarded a GAANN award to help fund some graduate fellowships for US citizens or permanent residents interested in bioelectrical engineering (somewhat broadly construed). Application review will start soon, so if you’re interested in this opportunity, read on.

The Rutgers ECE Department is proud to announce the Graduate Assistance in Areas of National Need (GAANN) Fellowship. The GAANN Fellowship program provides need-based financial support to Ph.D. students pursuing a degree in areas related to bioelectrical engineering at the Department of Electrical and Computer Engineering, Rutgers University. Each GAANN Fellow receives a stipend to cover the Fellow’s financial need. A typical stipend is $34,000 per year for up to 5 years, subject to satisfactory performance. ECE is pleased to announce 5 GAANN Fellowships. Minority students, women and other underrepresented groups are particularly encouraged to apply.

Applicants must:

  • Be U.S. citizens or permanent residents
  • Have a GPA of 3.5/4.0 or higher
  • Plan to pursue a Ph.D. degree in Electrical and Computer Engineering at Rutgers University
  • Have financial need
  • Demonstrate excellent academic performance
  • Submit an application and supporting documents

Deadline: To apply, please email the application and supporting documents to Arletta Hoscilowicz AS SOON AS POSSIBLE.

Effective early anti-plagiarism interventions for (mostly international) Masters students

My department at Rutgers, like many engineering departments across the country, has a somewhat sizable Master’s program, mostly because it “makes money” for the department [1]. The vast majority of the students in the program are international students, many of whom have English as a second or third language, and whose undergraduate instruction was not necessarily in English. As a consequence, they face considerable challenges in writing in general, and academic writing in particular. Faced with the prospect of writing an introduction to a project report and wanting to sound impressive or sophisticated, many seem tempted into copying sentences or even paragraphs from references without citation. This is, of course, plagiarism, and what distresses me and many colleagues is that the students often don’t understand what they did wrong or how to write appropriately in an academic setting. Is this because most non-American universities don’t teach about referencing, citation, and plagiarism? I hesitate to lay the blame elsewhere — it’s hard (initially) to write formally in a foreign language. However, the students I have met say things like “oh, I thought you didn’t need to reference tutorials,” so there is definitely an element of ill-preparedness. Adding to this of course is that students are stressed, find it expedient, and hope that nobody will notice.

Most undergrad programs in the US have some sort of composition requirement, and at least at my high school, we learned basic MLA citation rules in senior-year English. However, without assuming this background/prerequisite, what can we do? My colleague Waheed Bajwa was asking whether there are additional resources out there to help students learn about plagiarism before they turn in their assignments. Of course we put links to resources in syllabi, but as we all know, students tend not to read the syllabus, especially what seem like administrative and legalistic things. Academic misconduct is serious and can result in expulsion, but unless you’re a vindictive type, the goal shouldn’t be a “one strike and you’re out” policy. I’ve heard someone else suggest that students sign a contract at the beginning of the semester so they are forced to read it. Then, if they are given an automatic F for the class, you can point to the policy. That also seems like dodging the underlying issue, pedagogically speaking.

Another strategy I have tried is to have students turn in a draft of a final project, which I then run through TurnItIn [2] or manually search for copied sentences. I then issue a stern/threatening warning with links to information about plagiarism. Waheed does the same thing, but this is pretty time-intensive and also means that some students get the attention and some don’t. Students who are here for a Master’s lack some incentives to do the right thing the first time — if this is the last semester of their program and suddenly this whole plagiarism thing rears its head in their last class, they may be tempted to just fix the issues raised in the draft and move on without really internalizing the ethics. I’m not saying students are unethical. However, part of engineering/academics, especially at the graduate level, is teaching the ethics around citation and attribution. I pointed out to one student that copying from sources without attribution is stealing, and that kind of behavior could get them fired at a company, especially if it violates a law. They seemed surprised by this metaphor. That’s just an anecdote, but I find it telling.

The major issues I see are that:

  • Undergrad-focused models for plagiarism education do not seem to address the issue of ESL writers or the particulars of scientific/engineering writing.
  • Educating short-term graduate students (M.S.) about plagiarism in classes alone results in uneven learning and outcomes.

What we (and I think most programs) really need is an earlier and better educational intervention that helps address the particulars of these programs. I was Googling around for possible solutions and came across a paper by Gunnarsson, Kulesza, and Pettersson on “Teaching International Students How to Avoid Plagiarism: Librarians and Faculty in Collaboration”:

This paper presents how a plagiarism component has been integrated in a Research Methodology course for Engineering Master students at Blekinge Institute of Technology, Sweden. The plagiarism issue was approached from an educational perspective, rather than a punitive. The course director and librarians developed this part of the course in close collaboration. One part of the course is dedicated to how to cite, paraphrase and reference, while another part stresses the legal and ethical aspects of research. Currently, the majority of the students are international, which means there are intercultural and language aspects to consider. In order to evaluate our approach to teaching about plagiarism, we conducted a survey. The results of the survey indicate a need for education on how to cite and reference properly in order to avoid plagiarism, a result which is also supported by students’ assignment results. Some suggestions are given for future development of the course.

This seems to be exactly the kind of thing we need. The premises of the paper match what we experience in the US: reasons for plagiarism are complex, and most students plagiarize “unintentionally” in the sense that the balance between ethics and expediency is fraught. One issue the authors raise is that “views of the concept of plagiarism… may vary greatly among students from one country,” so we must be “cautious about making assumptions based on students’ cultural background.” When I’ve talked to professional colleagues (in my field and in other technical fields) I often hear statements like “students from country X don’t understand plagiarism” — we have to be careful about generalizations!

The key aspect of the above intervention is partnering with librarians, who are the experts in teaching these concepts, as part of a research methods course. Many humanities programs offer field-specific research methods courses that provide important training for academic work; we could do the same in engineering, but it would require more effort and resources. For those readers interested in the ESL issues, there are a lot of studies in the references that describe the multifaceted aspects of plagiarism, especially among international students. A major component of the authors’ proposed intervention is the Refero tutorial, a web course that students take as part of the class. We can’t just delegate plagiarism education to a web tutorial, but we have to start somewhere. Another resource I found was this large collection of tutorials collected by Macie Hall from Johns Hopkins, but these are focused more on US undergraduates.

Does your institution have a good anti-plagiarism orientation unit? Does it work? When and how do you provide this orientation?

[1] There is much ink to be spilled debating this claim.
[2] I have many mixed feelings about the ethics of TurnItIn, especially after discussions with others.

Salim El Rouayheb’s Shannon Channel: Pulkit Grover at 1300 EST

Salim El Rouayheb has started an exciting new initiative inspired by the TCS+ series. TCS+ is a seminar series on theoretical computer science (plus more) given over Google Hangout so that people across the world can attend the talk (and even ask questions). Nobody has to travel anywhere. Salim’s version is for information theory and he’s calling it Shannon’s Channel. If you’re interested in getting announcements, you can sign up for the mailing list.

Salim told me about this at Allerton and I meant to plug it here on the blog earlier but then the semester plus excessive travel ate me. He just sent a reminder yesterday that the inimitable Pulkit Grover will be giving a seminar today (Monday) at 1 PM:

Error-correction and suppression in communication and computing: a tradeoff between information and energy dissipation

Abstract: Information naturally tends to dissipate. This dissipation can be slowed down, but this requires increased energy dissipation. Shannon’s capacity theorem can be interpreted as the first word in this information-energy dissipation tradeoff, but it barely scratches the surface. I will begin with a survey of recent results on minimal energy dissipation for reliable information communication. I will discuss how incorporating energy dissipated in transmitter/receiver circuitry as well as in transmission leads to radically different fundamental limits on information-energy interactions than those obtained by Shannon. I’ll also talk about practical applications in short distance wired and wireless communications.

These techniques can also be applied to obtain fundamental limits to information-energy dissipation for reliable computation using unreliable/noisy components (first considered in [von Neumann ’56]). Recent work on strong data-processing inequality points out the fundamental difficulty in noisy computing: information-dissipation across multiple computation steps. We ask the question: what is the minimum energy-dissipation needed to keep information intact (reliability constant) as the computation proceeds? I’ll describe our novel ENCODED strategy (ENcoded COmputation with DEcoders EmbeddeD) for linear computations on noisy substrates, that outperforms uncoded/repetition-based strategies and keeps error-probability bounded below a constant. The key insight is that for computing in noisy environments, repeated error-suppression (that dissipates energy) is essential to keep information from dissipating. Application to emerging devices and circuit design techniques will also be discussed.

Finally, I’ll talk about a high-density noninvasive biopotential sensing problem, which is closely related to the problem of compressing a Markov source distributedly. Here, energy constraints limit the number of sensors. I’ll discuss how a novel “hierarchical” architecture that contains error-accumulation turns out to have a substantially improved energy-information dissipation tradeoff than simply “compressing innovations” (a strategy known to be suboptimal from a work of Kim and Berger).

The Hangout link is here and the talk will be on YouTube afterwards.

Unfortunately, I have to teach during that time, otherwise I would totally be there, virtually.