# Call for Papers: T-SIPN Special Issue on Distributed Information Processing in Social Networks

IEEE Signal Processing Society
IEEE Transactions on Signal and Information Processing over Networks
Special Issue on Distributed Information Processing in Social Networks

Over the past few decades, online social networks such as Facebook and Twitter have significantly changed the way people communicate and share information with each other. The opinion and behavior of each individual are heavily influenced through interacting with others. These local interactions lead to many interesting collective phenomena such as herding, consensus, and rumor spreading. At the same time, there is always the danger of mob mentality of following crowds, celebrities, or gurus who might provide misleading or even malicious information. Many efforts have been devoted to investigating the collective behavior in the context of various network topologies and the robustness of social networks in the presence of malicious threats.

On the other hand, activities in social networks (clicks, searches, transactions, posts, and tweets) generate a massive amount of decentralized data, which is not only big in size but also complex in terms of its structure. Processing these data requires significant advances in accurate mathematical modeling and computationally efficient algorithm design.

Many modern technological systems such as wireless sensor and robot networks are virtually the same as social networks in the sense that the nodes in both networks carry disparate information and communicate with constraints. Thus, investigating social networks will bring insightful principles on the system and algorithmic designs of many engineering networks. An example of such is the implementation of consensus algorithms for coordination and control in robot networks.

Additionally, more and more research projects nowadays are data-driven. Social networks are natural sources of massive and diverse big data, which present unique opportunities and challenges to further develop theoretical data processing toolsets and investigate novel applications.

This special issue aims to focus on addressing distributed information (signal, data, etc.) processing problems in social networks and also invites submissions from all other related disciplines to present comprehensive and diverse perspectives. Topics of interest include, but are not limited to:

• Dynamic social networks: time varying network topology, edge weights, etc.
• Social learning, distributed decision-making, estimation, and filtering
• Consensus and coordination in multi-agent networks
• Modeling and inference for information diffusion and rumor spreading
• Multi-layered social networks where social interactions take place at different scales or modalities
• Resource allocation, optimization, and control in multi-agent networks
• Modeling and strategic considerations for malicious behavior in networks
• Social media computing and networking
• Data mining, machine learning, and statistical inference frameworks and algorithms for handling big data from social networks
• Data-driven applications: attribution models for marketing and advertising, trend prediction, recommendation systems, crowdsourcing, etc.
• Other topics associated with social networks: graphical modeling, trust, privacy, engineering applications, etc.

Important Dates:

Manuscript submission due: September 15, 2016
First review completed: November 1, 2016
Revised manuscript due: December 15, 2016
Second review completed: February 1, 2017
Final manuscript due: March 15, 2017
Publication: June 1, 2017

Guest Editors:

Zhenliang Zhang, Qualcomm Corporate R&D (zhenlian@qti.qualcomm.com)
Wee Peng Tay, Nanyang Technological University (wptay@ntu.edu.sg)
Moez Draief, Imperial College London (m.draief@imperial.ac.uk)
Xiaodong Wang, Columbia University (xw2008@columbia.edu)
Edwin K. P. Chong, Colorado State University (edwin.chong@colostate.edu)
Alfred O. Hero III, University of Michigan (hero@eecs.umich.edu)

# Mathematical Tools of Information-Theoretic Security Workshop: Days 2-3

I took sketchier notes as the workshop progressed, partly due to the ICASSP deadline, but also because jet lag started to hit me. The second day was a half day, which started with Zhenjie Zhang giving a tutorial on differential privacy from a databases/data mining perspective and my talk on more machine learning aspects. In between us was a talk by Ben Smyth on building automatic verification for security protocols. Basically you write the protocol as a program and then the ProVerif verifier will go and try to break your protocol. As an example, it can automatically find/generate a man-in-the-middle attack if one exists. I thought it was pretty neat, especially after having recently talked to someone about automatic proof systems. It’s based on something called the applied pi calculus, which I did not understand at all, but hey, I learned something new, which was great. The last two talks of the day were by Lalitha Sankar and Mari Kobayashi. Lalitha talked about mutual information based measures of privacy leakage in an interactive communication setting that is the information-theoretic analogue of communication complexity models in CS. Mari talked about the broadcast channel with state feedback. This is trying to find secure analogues of these opportunistic multicast settings where you need to also generate a secret key.

The last day was on quantum! I learned a lot and took few notes, unfortunately. Andreas Winter gave a tutorial on quantum (the slides for most talks are online and his are as well) and Ciara Morgan discussed the challenges in proving a strong converse for the capacity of quantum channels. Damian Markham talked about secret sharing in quantum systems. Masahito Hayashi gave a very densely-packed talk surveying a large number of results based on secure randomness extraction and hash functions using Rényi information measures. I think privacy amplification is really interesting but I think I need a tutorial on it before I can really get the research results. The last non-overview talk I have notes on was by David Elkouss (apologies to the remaining speakers): this was a really interesting presentation on how to decide which of two channels is better from a quantum communication sense. The slides are a little enigmatic, but the papers are online.

Shlomo Shamai made it to the last day of the workshop (the intersection with High Holidays was unfortunate). He talked about the layered secrecy view of the broadcast channel: rather than thinking only of the secret message as carrying information, one can think of certain layers (cf. superposition coding) as being secured based on the channel to the non-legitimate receiver. For example, in a degraded broadcast channel, the strong receiver’s message can sometimes be thought of as secret from the weak receiver. This leads to a raft of models and setups based on who wants to keep what secret from whom, shedding some light on standard superposition, rate splitting, binning, and embedding constructions. The talk was largely based on a paper in the current issue of the Proceedings of the IEEE.

All in all, this was a really great workshop, and the organizers were very generous in the organization.

# Postdoc in privacy and security at Imperial College London

Denis Gündüz is looking for a postdoctoral researcher in the areas of privacy and security in cyber-physical systems, particularly for smart metering applications in smart grids. The position is in the Intelligent Systems and Networks Group within the Electrical and Electronic Engineering Department of Imperial College London.

Previous research experience and a strong track record in information theory, signal processing, and/or optimisation theory are required. This position will be supported through an international project, and will provide an excellent opportunity to work within an interdisciplinary team spanning top European institutions: Imperial College London, KTH, ETHZ and INRIA.
The position is available immediately for one year, with the potential to be extended for another year depending on the candidate’s performance.

Contact Dr. Gündüz directly if interested.

# Re-identification from microbiomes

A (now not-so-recent) paper by Homer et al. made a splash by showing that one could take a DNA sample from a person and detect whether they were part of the Human Genome Project (HGP) based on looking at the SNP variations from that individual together with the reported allele variations in the HGP data. More recently, a paper in PNAS by Franzosa et al. showed reidentification of individuals in the Human Microbiome Project.

Color me unsurprised. Given the richness of the data, from a purely informational point of view it seems pretty clear that people should be identifiable. As with many machine learning problems, however, the secret is in the feature encoding. Many approaches to comparing metagenomes, especially for bacterial ecologies, try to assess the variability in the population of bacteria, perhaps through mapping them to known strains. As mentioned in the Methods section, “reads were additionally mapped to a database of 649 microbial reference genomes using the Burrows-Wheeler aligner.” However, in addition to these mapping statistics, they used a few other more complicated features to help gain some additional robustness in their identification procedure.

Somehow being able to be identified by your microbiome seems less scary than being able to be identified by your genome, perhaps because we have a sense that genes are more “determining” than microbiomes. After all, you could get a fecal transplant and change your gut flora significantly. Is it the same as burning off your fingerprints? Probably not. But perhaps in the future, perpetrators of certain campus shenanigans may be easier to catch.

# Signal boost: Postdoc in Privacy at Penn State

Sofya Raskhodnikova and Adam Smith are looking to fill a postdoc position at Penn State for a multi-year project on privacy, streaming and learning.

Qualifications: Ph.D., with expertise in the theoretical foundations of at least one of the research areas (algorithms, machine learning and statistics, data privacy). Willingness to work on a cross-disciplinary project.

Duration and compensation: At least one year, renewable. Start date is negotiable, though we slightly prefer candidates starting fall 2015. Salary is competitive.

Applicants should email a CV, short research statement and list of references directly to the project leaders ({asmith,sofya}@cse.psu.edu) with “postdoc” in the subject line.

Location: The university is located in the beautiful college town of State College in the center of Pennsylvania. The State College area has 130,000 inhabitants and offers a wide variety of cultural and outdoor recreational activities. The university offers outstanding events from collegiate sporting events to fine arts productions. Many major population centers on the east coast (New York, Philadelphia, Pittsburgh, Washington D.C., Baltimore) are only a few hours’ drive away and convenient air services to several major hubs are operated by three major airlines out of State College.

Penn State is an equal opportunity employer. We encourage applications from underrepresented minorities.

# ITA 2015: quick takes

Better late than never, I suppose. A few weeks ago I escaped the cold of New Jersey to my old haunts of San Diego. Although La Jolla was always a bit fancy for my taste, it’s hard to beat a conference which boasts views like this:

A view from the sessions at ITA 2015

I’ll just recap a few of the talks that I remember from my notes — I didn’t really take notes during the plenaries so I don’t have much to say about them. Mostly this was due to laziness, but finding the time to blog has been challenging in this last year, so I think I have to pick my battles. Here’s a smattering consisting of

$\{ \mathrm{talks\ attended} \} \cap \{ \mathrm{talks\ with\ understandable\ notes} \}$

(Information theory)
Emina Soljanin talked about designing codes that are good for fast access to the data in distributed storage. Initial work focused on how to repair codes under disk failures. She looked at how easy it is to retrieve the information afterwards to guarantee some QoS for the storage system. Adam Kalai talked about designing compression schemes that work for an “audience” of decoders. The decoders have different priors on the set of elements/messages so the idea is to design an encoder that works for this ensemble of decoders. I kind of missed the first part of the talk so I wasn’t quite sure how this relates to classical work in mismatched decoding as done in the information theory world. Gireeja Ranade gave a great talk about defining notions of capacity/rate needed to control a system which has multiplicative uncertainty. That is, $x[n+1] = x[n] + B[n] u[n]$ where $B[n]$ has the uncertainty. She gave a couple of different notions of capacity, relating to the ratio $| x[n]/x[0] |$: either the expected value of the square or the log, appropriately normalized. She used a “deterministic model” to give an explanation of how control in this setting is kind of like controlling the number of significant bits in the state: uncertainty increases this and you need a certain “amount” of control to cancel that growth.
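To get a feel for the two growth notions in the multiplicative-uncertainty model, here is a minimal sketch of my own, not anything from the talk. I am assuming a Gaussian $B[n] \sim N(\mu, \sigma^2)$ and a memoryless linear control law $u[n] = -k\, x[n]$, so that $x[n+1] = (1 - k B[n])\, x[n]$. The function names and the choice of normalization are hypothetical.

```python
import math
import random


def second_moment_factor(k, mu, sigma):
    """Per-step growth of E[x^2] under u[n] = -k x[n] (assumed control law).

    With B ~ N(mu, sigma^2), E[(1 - k B)^2] = 1 - 2 k mu + k^2 (mu^2 + sigma^2).
    """
    return 1.0 - 2.0 * k * mu + k * k * (mu * mu + sigma * sigma)


def best_linear_gain(mu, sigma):
    """Gain k minimizing the second-moment factor above."""
    return mu / (mu * mu + sigma * sigma)


def simulate_log_rate(k, mu, sigma, steps=2000, seed=0):
    """Monte Carlo estimate of (1/n) log |x[n] / x[0]| along one trajectory."""
    rng = random.Random(seed)
    log_ratio = 0.0
    for _ in range(steps):
        b = rng.gauss(mu, sigma)              # draw the uncertain gain B[n]
        log_ratio += math.log(abs(1.0 - k * b))
    return log_ratio / steps


if __name__ == "__main__":
    mu, sigma = 1.0, 1.0
    k = best_linear_gain(mu, sigma)
    print("gain:", k)
    print("second-moment factor per step:", second_moment_factor(k, mu, sigma))
    print("empirical log growth rate:", simulate_log_rate(k, mu, sigma))
```

Under these assumptions the best linear gain shrinks the second moment by a factor of $\sigma^2/(\mu^2+\sigma^2)$ per step, so more uncertainty in $B[n]$ means slower decay, which matches the "significant bits" intuition above.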

(Learning and statistics)
I learned about active regression approaches from Sivan Sabato that provably work better than passive learning. The idea there is to use a partition of the X space and then do piecewise constant approximations to a weight function that they use in a rejection sampler. The rejection sampler (which I thought of as sort of doing importance sampling to make sure they cover the space) helps limit the number of labels requested by the algorithm. Somehow I had never met Raj Rao Nadakuditi until now, and I wish I had gotten a chance to talk to him further. He gave a nice talk on robust PCA, and in particular how outliers “break” regular PCA. He proposed a combination of shrinkage and truncation to help make PCA a bit more stable/robust. Laura Balzano talked about “estimating subspace projections from incomplete data.” She proposed an iterative algorithm for doing estimation on the Grassmann manifold that can do subspace tracking. Constantine Caramanis talked about a convex formulation for mixed regression that gives a guaranteed solution, along with minimax sample complexity bounds showing that it is basically optimal. Yingbin Liang talked about testing approaches for understanding if there is an “anomalous structure” in a sequence of data. Basically for a sequence $Y_1, Y_2, \ldots, Y_n$, the null hypothesis is that they are all i.i.d. $\sim p$ and the (composite) alternative is that there is an interval of indices which are $\sim q$ instead. She proposed an RKHS-based discrepancy measure and a threshold test on this measure. Pradeep Ravikumar talked about a “simple” estimator that was a “fix” for ordinary least squares with some soft thresholding. He showed consistency for linear regression in several senses, competitive with LASSO in some settings. Pretty neat, all said, although he also claimed that least squares was “something you all know from high school”; I went to a pretty good high school, and I don’t think we did least squares! Sanmi Koyejo talked about a Bayesian decision theory approach to variable selection that involved minimizing some KL-divergence. Unfortunately, the resulting optimization ended up being NP-hard (for reasons I can’t remember) and so they use a greedy algorithm that seems to work pretty well.
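The anomalous-interval test can be sketched roughly as follows. This is my own toy version, not the procedure from the talk: I am assuming a Gaussian kernel, the biased (V-statistic) estimate of the squared maximum mean discrepancy as the RKHS-based measure, and a scan over intervals of one fixed length. All names and the threshold value are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on the real line; the kernel choice is an assumption."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def scan_for_anomalous_interval(seq, window, bandwidth=1.0):
    """Compare each length-`window` interval against the rest of the sequence.

    Returns (start_index, score) for the interval with the largest discrepancy.
    """
    best_start, best_score = 0, float("-inf")
    for start in range(len(seq) - window + 1):
        inside = seq[start:start + window]
        outside = seq[:start] + seq[start + window:]
        score = mmd_squared(inside, outside, bandwidth)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    # Toy data: a block of shifted values planted in an otherwise constant sequence.
    seq = [0.0] * 25 + [5.0] * 10 + [0.0] * 25
    start, score = scan_for_anomalous_interval(seq, window=10)
    threshold = 0.5  # hypothetical; in practice this would be calibrated under the null
    print("most anomalous interval starts at", start, "with score", score)
    print("reject null:", score > threshold)
```

A real test would have to handle unknown interval lengths and pick the threshold to control the false alarm rate; the sketch only shows the scan-and-threshold structure.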

(Privacy)
Cynthia Dwork gave a tutorial on differential privacy with an emphasis on the recent work involving false discovery rate. In addition to her plenary there were several talks on differential privacy and other privacy measures. Kunal Talwar talked about their improved analysis of the SuLQ method for differentially private PCA. Unfortunately there were two privacy sessions in parallel so I hopped over to see John Duchi talk about definitions of privacy and how definitions based on testing are equivalent to differential privacy. The testing framework makes it easier to prove minimax bounds, though, so it may be a more useful view at times. Nadia Fawaz talked about privacy for time-series data such as smart meter data. She defined different types of attacks in this setting and showed that they correspond to mutual information or directed mutual information, as well as empirical results on a real data set. Raef Bassily studied an estimation problem in the streaming setting where you want to get a histogram of the most frequent items in the stream. They reduce the problem to one of finding a “unique heavy hitter” and develop a protocol that looks sort of like a code for the MAC: they encode bits into a real vector, add noise, and then add those up over the reals. It’s accepted to STOC 2015 and he said the preprint will be up soon.
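The "encode, add noise, sum over the reals" pattern can be illustrated with a toy example. To be clear, this is not the protocol from the paper: it is a minimal sketch in which each user holds a single bit, encodes it as $\pm 1$, and adds Laplace noise with scale $2/\epsilon$ before reporting (the two encodings differ by 2, so this is the standard Laplace mechanism applied locally). The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def randomize_bit(bit, epsilon, rng):
    """Encode a bit as +1/-1 and add Laplace noise of scale 2/epsilon.

    The two possible encodings differ by 2, so this is the Laplace mechanism
    with sensitivity 2 applied to one user's value.
    """
    signal = 1.0 if bit else -1.0
    return signal + laplace_noise(2.0 / epsilon, rng)


def estimate_fraction_of_ones(reports):
    """Average the noisy reports over the reals and map back to [0, 1]."""
    mean_signal = sum(reports) / len(reports)
    return (mean_signal + 1.0) / 2.0


if __name__ == "__main__":
    rng = random.Random(1)
    bits = [1] * 14000 + [0] * 6000          # toy population, 70% ones
    reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in bits]
    print("estimated fraction of ones:", estimate_fraction_of_ones(reports))
```

The individual reports are nearly useless on their own, but the noise averages out in the sum, which is the part that reminded me of a code for the MAC.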

Posting a hodgepodge of links after a rather wonderful time hiking and camping, solving puzzles, and the semester starting altogether too soon for my taste.

[Trigger warning] More details on Walter Lewin’s actions.

Hanna Wallach’s talk at the NIPS Workshop on fairness.

Reframing Science’s Diversity Challenge by trying to move beyond the pipeline metaphor.

An essay by Daniel Solove on privacy (I’d recommend reading his books too but this is shorter). He takes on the “nothing to hide” argument against privacy.

I don’t like IPAs that much, but this lawsuit about lettering seems like a big deal for the craft beer movement.

I’ve always been a little skeptical of Humans of New York, but never was sure why. I think this critique has something to it. Not sure I fully agree but it does capture some of my discomfort.

Judith Butler gave a nice interview where she talks a bit about why “All Lives Matter,” while true, is not an appropriate rhetorical strategy: “If we jump too quickly to the universal formulation, ‘all lives matter,’ then we miss the fact that black people have not yet been included in the idea of ‘all lives.’ That said, it is true that all lives matter (we can then debate about when life begins or ends). But to make that universal formulation concrete, to make that into a living formulation, one that truly extends to all people, we have to foreground those lives that are not mattering now, to mark that exclusion, and militate against it.”

A nice essay on morality and progress with respect to Silicon Valley. Techno-utopianism running amok leads to bad results: “Silicon Valley’s amorality problem arises from the implicit and explicit narrative of progress companies use for marketing and that people use to find meaning in their work. By accepting this narrative of progress uncritically, imagining that technological change equals historic human betterment, many in Silicon Valley excuse themselves from moral reflection.”