Rutgers ECE is hiring!

Faculty Search, Department of Electrical and Computer Engineering, Rutgers University.

The Department of Electrical and Computer Engineering at Rutgers University anticipates multiple faculty openings in the following areas: (i) high-performance distributed computing, including cloud computing and data-intensive computing; (ii) electronics, advanced sensors, and renewable energy, including solar cells and detectors (bio, optical, RF); and (iii) bioelectrical engineering.

We are interested in candidates who can combine expertise in these areas with cyber-security, software engineering, devices, embedded systems, signal processing, and/or communications. In addition, we particularly welcome candidates who can contribute to broader application initiatives such as biomedical and health sciences, smart cities, or sustainable energy.

Outstanding applicants in all areas and at all ranks are encouraged to apply. Suitable candidates may be eligible to be considered for Henry Rutgers University Professorships in Big Data as part of a University Initiative.

Excellent facilities are available for collaborative research opportunities with various university centers such as the Wireless Information Network Laboratory (WINLAB), Microelectronics Research Laboratory (MERL), Institute for Advanced Materials, Devices and Nanotechnology (IAMDN), Center for Advanced Infrastructure and Transportation (CAIT), Rutgers Energy Institute (REI), and the Center for Integrative Proteomics Research, as well as with local industry.

A Ph.D. in a related field is required. Responsibilities include teaching undergraduate and graduate courses and establishing independent research programs. Qualified candidates should submit a CV, statements on teaching and research, and contact information for three references to this website. The review process will start immediately. For full consideration, applications must be received by January 15, 2015.

Questions may be directed to:

Athina P. Petropulu
Professor and Chair
Department of Electrical and Computer Engineering
Rutgers University
athinap@rutgers.edu

EEO/AA Policy:
Rutgers is an Equal Opportunity / Affirmative Action Employer. Rutgers is also an ADVANCE institution, one of a limited number of universities in receipt of NSF funds in support of our commitment to increase diversity and the participation and advancement of women in the STEM disciplines.

Embargoes and the process of science

Last week I attended the National Academies Keck Futures Initiative (NAKFI) Conference on Collective Behavior, which was really a huge amount of fun. I learned a ton of science (and that I basically know nothing about science — or rather, there is soooo much science to know), and had some very interesting discussion about… stuff. Why am I so cagey? Because the details of discussions at the conference are officially embargoed until the report is issued by the National Academies in spring.

This embargo concept is not entirely new to me, but coming as I do from a tribe that tries to post things on arXiv as fast as possible, the idea that one should keep mum for a few months feels a bit strange. It makes a lot of sense — people presented posters on work in progress or partial results that they were still working on, and without an embargo there is a potential danger of getting scooped, which could inhibit the free and open sharing of ideas. I certainly felt more comfortable talking about (possibly half-baked) future research ideas, although that was primarily because I didn’t think the ecologist I was conversing with would care as much about stochastic gradient methods.

Embargoes seem to be the norm in Science because of… Science… and Nature… and PNAS. If you have a high-profile article to appear in one of those fancy journals, they want the credit for having chosen it and for being the venue in which it appeared. Slapping up your preprint on arXiv is not on, since it bursts the balloon (although Nature says “[n]either conference presentations nor posting on recognized preprint servers constitute prior publication”). This is newsworthy science, and there’s a relationship between the press, the academic press, and the research community that has been discussed at length.

I came across a blog called Embargo Watch that tracks how reporters and media outlets breach the embargoes imposed by publishers. Indeed, a look at various embargo policies (even PLoS has one!) shows that the embargo thing is really about controlling the news media’s description of the article prior to publication. There’s been a longstanding (un?)healthy debate about the value of embargoes. Personally, I’d prefer to see someone who studies communication and science studies (like Marisa) do a more critical evaluation of the role of embargoes in enforcing particular constructions and interpretations of the scientific process, the role of power and control, and how researchers propagate and resist the tensions inherent in publishing in high-impact journals.

Regardless, I am following the embargo and keeping quiet while trying to process everything I learned last week. I guess I am glad the arXiv is there for me — it’s a little more my speed. Actually, it may be a bit too speedy, but it works for now. I think people working in engineering, computer science, and mathematics might find the notion of an embargo somewhat puzzling, as I did. Does this concept even make sense in those fields?

Tracks: Nothing Gets Done

  1. I Thought About You (The Four Freshmen)
  2. Tenere (Bombino)
  3. We Only Come Out At Night (Sugar Stems)
  4. Never Work For Free (Tennis)
  5. Crueler Kind (San Fermin)
  6. One By One (Deep Sea Diver)
  7. Everything (FM Belfast)
  8. I Wish You (CAPSULE)
  9. The Natural World (CYMBALS)
  10. One Second Of Love (Nite Jewel)
  11. Diamond Days (HABITATS)
  12. Turn It Around (Lucius)
  13. Bucolismo (Garota Suecas)
  14. Mercy Mercy Me (Marvin Gaye)
  15. Dark Comedy Morning Show (Open Mike Eagle feat. Toy Light)
  16. A Ho A Hand (FAMY)
  17. I’ll See You In My Dreams (Django Reinhardt Trio)

An exercise in careful misreading

A recent article was passed along to me:

Jane Bambauer, Krishnamurty Muralidhar, and Rathindra Sarathy
Fool’s Gold: An Illustrated Critique of Differential Privacy
Vanderbilt Journal of Entertainment and Technology Law 16(4):701-755.

The article is aimed at the legal community, which has seen in differential privacy a potential technological solution for data privacy issues. The goal of the article is to throw some cold water on some law scholars’ embrace of differential privacy as a solution concept. I’m not a one-method-fixes-all kind of person, but this article is sort of relentlessly negative about differential privacy based solely on a single mechanism: output perturbation. The authors appear to be laboring under the impression that this is really the only way to provide differential privacy, which is itself “an assumption that contorts the rest of [their] analysis,” the very charge they level at one proponent of differential privacy.

In the example with which they open the paper, they claim that “even knowing the distribution of noise that is randomly added to each cell, the internist has no hope of interpreting the response. The true values could be almost anything.” While technically true, it’s quite misleading. Indeed, by knowing the distribution, one can derive bounds on the accuracy of the answer — this is, contra the authors’ claims, the “tension between utility and privacy” that differential privacy researchers do “toil” with. They manage to explain the statistics fairly reasonably in the middle of the paper but ignore that in the introduction and conclusion in favor of some acerbic bons mots. Now, perhaps to them, privacy should be an almost-sure guarantee. There is a critique in that: differential privacy can only make probabilistic guarantees, and if your legal standard is stricter than that, then it’s probably not a good way to go. But the misleading rhetoric employed here is meant to stir emotions rather than sway the intellect.
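
To make the point concrete, here is a minimal sketch (mine, not the article’s) of output perturbation via the Laplace mechanism. Because the noise distribution is public, the analyst can attach an explicit confidence interval to each noisy answer:

    import numpy as np

    rng = np.random.default_rng(0)

    def laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.1):
        # Output perturbation: add Laplace(sensitivity / epsilon) noise
        # to a count query with the given sensitivity.
        return true_count + rng.laplace(scale=sensitivity / epsilon)

    # A Laplace(b) variable exceeds t in absolute value with probability
    # exp(-t / b), so the "uninterpretable" noisy response in fact comes
    # with an explicit accuracy guarantee.
    epsilon, sensitivity, confidence = 0.1, 1.0, 0.95
    b = sensitivity / epsilon
    t = -b * np.log(1 - confidence)  # P(|noise| <= t) = confidence

    noisy = laplace_mechanism(true_count=42, epsilon=epsilon)
    print(f"noisy answer {noisy:.1f}; true count in "
          f"[{noisy - t:.1f}, {noisy + t:.1f}] with probability {confidence}")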

The language in the article is quite florid: “Differential privacy has been rocking the computer science world for over ten years and is fast becoming a crossover hit among privacy scholars and policymakers.” I suppose this sort of prose may be what constitutes scholarly writing in law, but it lacks the measured tones that one might want in more objective criticism. Perhaps they read academic writing in science and engineering in an equally emotional register. They use some strong language to conclude “differential privacy is either not practicable or not novel.” I find such blanket statements both puzzling and vacuous. If you set up a straw-man of what differential privacy is, I suppose you can derive such dichotomies, but is that the best argument one can make?

One thing that comes out of this reading is that most people don’t really appreciate how technology progresses from academic research to practical solutions. Perhaps some legal scholars have overstated the case for differential privacy based on the state of the technology now. But who’s to say how things will look a few years down the line? We’ll have better algorithms, different database structures, and different data sharing mechanisms and platforms. Perhaps differential privacy is not ready for prime time, although Google seems to disagree. The authors’ main point (hidden in the breathless indignation) is that it’s probably not the best solution for every data sharing problem, a statement with which I can completely agree.

In their effort to discredit differential privacy, the authors ignore both the way in which scientific and academic research works and the contemporary work that seeks to address the very problems they raise: context-awareness via propose-test-release, methods for setting \epsilon in practical scenarios, and dealing with multiple disclosures via stronger composition rules. They further ignore real technical hurdles in realizing “pure” differential privacy in favor of “illustrations” with the goal of painting proponents of differential privacy as ideologues and hucksters. Of course context and judgement are important in designing query mechanisms and privacy-preserving analysis systems. Furthermore, in many cases microdata have to be released for legal reasons. I think few people believe that differential privacy is a panacea, but it at least provides a real quantifiable approach to thinking about these privacy problems that one can build theories and algorithms around. The key is to figure out how to make those work on real data, and there’s a lot more research to be done on that front.
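
On the multiple-disclosures point specifically, here is a rough numerical sketch (my own; the second formula is the advanced composition bound of Dwork, Rothblum, and Vadhan) of how the privacy guarantee degrades across repeated queries, and how stronger composition rules improve on naive accounting:

    import numpy as np

    def basic_composition(eps, k):
        # Basic composition: k mechanisms, each eps-DP, are jointly (k * eps)-DP.
        return k * eps

    def advanced_composition(eps, k, delta_prime=1e-6):
        # Advanced composition: k (eps, 0)-DP mechanisms are jointly
        # (eps', delta_prime)-DP, with eps' growing roughly like
        # sqrt(k) * eps instead of k * eps.
        return eps * np.sqrt(2 * k * np.log(1 / delta_prime)) + k * eps * (np.exp(eps) - 1)

    eps, k = 0.1, 100
    print(f"basic composition:    eps' = {basic_composition(eps, k):.2f}")
    print(f"advanced composition: eps' = {advanced_composition(eps, k):.2f}")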

Fenchel duality, entropy, and the log partition function

[Update: As Max points out in the comments, this is really a specialized version of the Donsker-Varadhan formula, also mentioned by Mokshay in a comment here. I think the difficulty with concepts like these is that they are true for deeper reasons than the ones given when you learn them — this is a special case that requires undergraduate probability and calculus, basically.]

One of my collaborators said to me recently that it’s well known that the “negative entropy is the Fenchel dual of the log-partition function.” Now, I knew what these words meant, but it somehow was not a fact that I had learned elsewhere, and furthermore, a sentence like that is frustratingly terse. If you already know what it means, then it’s a nice shorthand, but for those trying to figure it out, it’s impenetrable jargon. I tried running it past a few people here who are generally knowledgeable but are not graphical model experts, and they too were unfamiliar with it. While this is just a simple fact about conjugate duality, I think it doesn’t really show up in information theory classes unless the instructor talks a bit more about exponential family distributions, maximum entropy distributions, and other related concepts. Bert Huang has a post on Jensen’s inequality as a justification.

We have a distribution in the exponential family:

p(x; \theta) = \exp( \langle \theta, \phi(x) \rangle - A(\theta) )

As a side note, I find that the exponential family is often not covered in systems EE courses. Given how important it is in statistics, I think it should be a bit more of a central concept — I’m definitely going to try to work it into the detection and estimation course.

For the purposes of this post I’m going to assume x takes values in a discrete alphabet \mathcal{X} (say, n-bit strings). The function \phi(x) is a vector of statistics calculated from x, and \theta is a vector of parameters. The function A(\theta) is the log partition function:

A(\theta) = \log\left( \sum_{x} \exp( \langle \theta, \phi(x) \rangle ) \right)

where the partition function is

Z(\theta) = \sum_{x} \exp( \langle \theta, \phi(x) \rangle )
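
For a concrete example, take \mathcal{X} = \{0,1\} with \phi(x) = x. Then

Z(\theta) = 1 + e^{\theta}, \qquad A(\theta) = \log(1 + e^{\theta}),

and p(x; \theta) is a Bernoulli distribution with mean \mathbb{E}[\phi(x)] = e^{\theta}/(1 + e^{\theta}).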

The entropy of the distribution is easy to calculate:

H(p) = \mathbb{E}[ - \log p(x; \theta) ] = A(\theta) - \langle \theta, \mathbb{E}[\phi(x)] \rangle

The Fenchel dual of a function f(\theta) is the function

g(\psi) = \sup_{\theta} \left\{ \langle \psi, \theta \rangle - f(\theta) \right\}.

So what’s the Fenchel dual of the log partition function? We have to take the gradient:

\nabla_{\theta} \left( \langle \psi, \theta \rangle - A(\theta) \right) = \psi - \frac{1}{Z(\theta)} \sum_{x} \exp( \langle \theta, \phi(x) \rangle ) \phi(x) = \psi - \mathbb{E}[ \phi(x) ]

Setting this equal to zero, we see that at the optimum \theta^* we have \psi = \mathbb{E}[\phi(x)], so in particular:

\langle \psi, \theta^* \rangle = \langle \mathbb{E}[ \phi(x) ], \theta^* \rangle

And the dual function is:

g(\psi) = \langle \mathbb{E}[ \phi(x) ], \theta^* \rangle - A(\theta^*) = - H(p)

The standard approach seems to go in the other direction, computing the dual of the negative entropy, but that seems more confusing to me (perhaps inspiring Bert’s post above). Since the log partition function and the negative entropy are both convex (and closed), biconjugation gives the other direction for free, so it seems easier to prove the duality in just one direction.
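
As a sanity check, here is a small numerical sketch in Python (my own, using the Bernoulli example from above): it computes the Fenchel dual of A by brute-force grid search over \theta and compares it to the negative entropy.

    import numpy as np

    def A(theta):
        # Log partition function for the Bernoulli example:
        # p(x; theta) = exp(theta * x - A(theta)), x in {0, 1}, phi(x) = x.
        return np.log1p(np.exp(theta))

    def fenchel_dual_of_A(psi, thetas=np.linspace(-30, 30, 200001)):
        # g(psi) = sup_theta { psi * theta - A(theta) }, via grid search.
        return np.max(psi * thetas - A(thetas))

    def neg_entropy(psi):
        # Negative entropy of a Bernoulli(psi) distribution, in nats.
        return psi * np.log(psi) + (1 - psi) * np.log(1 - psi)

    for psi in [0.1, 0.25, 0.5, 0.9]:
        print(f"psi = {psi}: dual = {fenchel_dual_of_A(psi):+.6f}, "
              f"-H = {neg_entropy(psi):+.6f}")

The two columns agree up to the grid resolution, and the maximizing \theta^* = \log(\psi/(1 - \psi)) is exactly the one satisfying the stationarity condition \psi = \mathbb{E}[\phi(x)].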

Data: what is it good for? (Absolutely Something): the first few weeks

So Waheed Bajwa and I have been teaching this Byrne Seminar on “data science.” At Allerton some people asked me how it was going and what we were covering in the class. These seminars are meant to be more discussion-based. This is a bit tough for us in particular:

  • engineering classes are generally NOT discussion-based, in either the US or Pakistan
  • it’s been more than a decade since we were undergraduates, let alone 18 years old
  • the students in our class are fresh out of high school and also haven’t had discussion-based classes

My one experience in leading discussion was covering for a theater class approximately 10 years ago, but that was a junior-level elective as I recall, and the dynamics were quite a bit different. So getting a discussion going and getting all of the students to participate is, on top of being tough in general, particularly challenging for us. What has helped is that a number of the students in the class are pretty engaged with the ideas and material, so in the end we do get to think collectively, and a bit differently, about the technologies around us and the role that data plays.

What I wanted to talk about in this post was what we’ve covered in the first few weeks. If we offer this class again it would be good to revisit some of the decisions we’ve made along the way, as this is as much a learning process for us as it is for them. A Byrne Seminar meets 10 times during the semester, so it ends well before finals. We had some overflow from one topic to the next, but roughly speaking the class went in the following order:

  • Introduction: what is data?
  • Potentials and perils of data science
  • The importance of modeling
  • Statistical considerations
  • Machine learning and algorithms
  • Data and society: ethics and privacy
  • Data visualization
  • Project Presentations

I’ll talk a bit more on the blog about this class, what we covered, what readings/videos we ended up choosing, and how it went. I think it would be fun to offer this course again, assuming our evaluations pass muster. But in the meantime, the class is still on, so it’s a bit hard to pass retrospective judgement.

Allerton 2014: hypercontracting interference channels while noisily feeding back what you detected on the graph

Before the expiration window passes, here are a few more short takes from Allerton… for some talks I couldn’t take notes because I didn’t get a seat or I missed half the talk shuttling between sessions.

The Gaussian Channel with Noisy Feedback: Near-Capacity Performance Via Simple Interaction
Assaf Ben-Yishai, Ofer Shayevitz
This was a really nice talk by Ofer on trying to get practical codes for AWGN channels with noisy feedback by using the intuition given by the Schalkwijk-Kailath scheme plus some tricks using the mod operation. This is reminiscent of lattices (which may be an interesting future direction). The SK scheme has a problem with noise accumulation, which they deal with using these mod operations, and they can get to errors around 10^(-6) with around 19 rounds, or blocklength 19, at reasonable SNRs. Blocklength is misleading here since there is feedback after every symbol. The other catch is that the feedback link must have much higher SNR than the forward link, but this is true in applications such as sensing, where the receiver may be plugged into the wall while the transmitter may be on a swallowable medical monitoring device.

Point-To-Point Codes for Interference Channels: A Journey Toward High Performance at Low Complexity
Young-Han Kim
Continuing with my UCSD bias, I also wanted to mention Young-Han’s talk, which was on using COTS (commercial, off-the-shelf) coding schemes on the interference channel (in particular, the two-user IC). He talked about rate-splitting approaches and block Markov schemes. Much of this work is with Lele Wang, who may be graduating soon…

Signal Detection on Graphs
Venkatesh Saligrama
This was a hypothesis testing problem where the observations come from nodes on a graph. Under the null they are Gaussian noise, and under the alternative there is a connected subgraph with an elevated mean. How should we do detection in this scenario? This is a compound hypothesis testing problem because there are (too) many possible connected subgraphs to consider. He gets around this by looking at a convex program parameterized by a measure of the size/shape of the connected component. This is where my notes get messy, though, so you might want to look at the paper if it sounds interesting to you…

Hypercontractivity in Hamming Space
Yury Polyanskiy
I’ve written about hypercontractivity before, and Yury talked about his paper on arXiv, which is about functions on the binary hypercube. This talk felt more like a tour of results on hypercontractivity and less like a “here is my new result” talk, which I actually appreciated because I felt it tied together ideas well and made me realize how strange the hypercontractivity parameter of an operator is.
I’ve hypercontractivity before, and Yury talked about his paper on ArXiV, which is about functions on the binary hypercube. This talk felt more like a tour of results on hypercontractivity and less like a “here is my new result” talk, which I actually appreciated because I felt it tied together ideas well and made me realize how strange the hypercontractivity parameter of an operator is.