# ISIT 2010 : Abbas El Gamal and Te Sun Han

I seem to have gotten all behind on wrapping up the ISIT blogging, so the remainder may be more compressed takes on things. This is not in the compressed sensing world, where the signals are sparse and my comments are meant to reconstruct, but more like lossy compression where $D \to \sigma^2$ (for the Gaussian case).

Abbas El Gamal gave a very nice plenary on “Coding for Noisy Networks” in which he really brought together a lot of different eras and streams of work on network information theory and tried to tie them together in a conceptual framework. There was a nice mix of older and newer results. The thing I liked best about it was that he was very optimistic about making progress in understanding how to communicate in networks from an information-theory perspective, which counteracts the sentiment that I heard that “well, it’s just too messy.”

Te Sun Han gave the Shannon Lecture, of course, and he used his time to give a tutorial on the information spectrum method. I had tried to read the book earlier, and honestly found it a little impenetrable (or rather, I wasn’t sure what I was supposed to use from it). The talk was more like reading the papers — concisely stated, but with a clear line of intuition. I know some people are not a big fan of Shannon Lectures as tutorials, but I think there is also a case to be made that most people are unfamiliar with the information spectrum method. A nice example he gave was to show when the output of an optimal source coder looks “completely random.” Maybe this has been done already, but is there a connection between existing theories of pseudorandomness and the information spectrum method?

My thesis work was about a particular adversarial model from Shannon theory (the arbitrarily varying channel), but I am more broadly interested in how adversarial assumptions can be used to capture uncertainty in modeling or add an element of robustness to designs. I’m not wedded to the AVC, so I like going to see other works which try to address these issues. Among the talks I saw at ISIT, these three captured different aspects of these models.

POLYTOPE CODES AGAINST ADVERSARIES IN NETWORKS
(Oliver Kosut; Cornell University, Lang Tong; Cornell University, David Tse; University of California at Berkeley)
This talk used as an example the “cockroach network,” named thusly because it “does not look like a butterfly.” That line got a lot of laughs. In this model a subset of nodes in the network are controlled by an adversary, who can introduce errors. They show that by using a certain structured code called a polytope code, together with some checking and filtering by internal nodes, the adversaries can be detected/avoided. For some networks (planar) they can show that their code achieves the cut-set bound. The key to the codes working is making any adversarial action result in certain tuples of random variables becoming atypical and hence detectable.

JAMMING GAMES FOR POWER CONTROLLED MEDIUM ACCESS WITH DYNAMIC TRAFFIC
(Yalin Sagduyu; Intelligent Automation Inc./University of Maryland, Randall Berry; Northwestern University, Anthony Ephremides; University of Maryland)
This uses adversarial modeling (in the form of game theory) to model how users in a multiaccess setting contend for resources in the presence of jammers. In the AVC context, La and Anantharam proved some early results on these models. Here the extension was that the jammer(s) do not know whether or not the transmitter will be sending anything in a given slot (hence dynamic traffic). In the case with a single transmitter and a single jammer, the model is that the transmitter has to minimize its energy subject to an average rate constraint (packets are coming in at a certain rate to the transmitter and have to passed along). The jammer has a power constraint and wants the transmitter to maximize its energy. It turns out the that if the jammer doesn’t know the queue state of the transmitter, then it has to be on all the time. They have more complicated extensions to multi-transmitter scenarios.

ON THE DETERMINISTIC CODE CAPACITY REGION OF AN ARBITRARILY VARYING MULTIPLE-ACCESS CHANNEL UNDER LIST DECODING
(Sirin Nitinawarat; University of Maryland)
Sirin talked in the session I was in. For point-to-point AVCs under average error, there is a finite list size $L$ called the symmetrizability of the channel such that for list-decoding with list sizes $\le L$ the capacity is 0 and for larger list sizes the capacity is the randomized coding capacity of the AVC. This work extended this to the multiaccess setting, which is a particularly tricky beast since the capacity region itself is nonconvex. He showed that there is a $U$ such that the capacity region has empty interior if the list size is $\le U$ and that for list sizes $\ge (U+1)^2$ you get back the randomized coding capacity. What is open is whether the list sizes for the two users can be different, so that this gap could be closed, but that problem seems pretty hard to tackle.

(Bikash Kumar Dey; Indian Institute of Technology, Bombay, Sidharth Jaggi; Chinese University of Hong Kong, Michael Langberg; Open University of Israel, Anand Sarwate; University of California, San Diego)
I guess I’ll be shameless and blog about my own paper — we looked at an adversarial setting where the adversary gets to see the transmitted symbols after a delay. We assumed that the delay grew linearly with the blocklength, so that the transmitter gets to act at time $i$ based on $x_1, x_2, \ldots, x_{i - \Delta n}$, where $n$ is the blocklength and $\Delta$ is the delay parameter. Suppose everything is binary and the adversary gets to flip a total of $pn$ bits of the codeword but has to see it causally with delay $\Delta n$. If $\Delta = 1$ the adversary sees nothing and the average-error capacity is $1 - h(p)$ bits. If $\Delta = 0$ the adversary can do a lot worse and the capacity is $1 - 4p$ bits. What we show is that for any $\Delta > 0$ we go back to $1 - h(p)$ bits (and there was much rejoicing). The key is allowing the encoder to randomize, which effectively prevents the adversary from learning anything. Luckily, the decoder can still figure out what is going on (as long as there is some additional slack in the rate), even though it does not share common randomness with the encoder.

# ISIT 2010 : databases

I attended one talk on databases and unfortunately missed the second one, so the writeup I have is based on the paper…

IDENTIFICATION AND LOSSY RECONSTRUCTION IN NOISY DATABASES
(Ertem Tuncel; University of California, Riverside and Deniz Gunduz; Centre Tecnologic de Telecomunicacions de Catalunya)
This was the talk I did attend. The goal was to information-theoritize the notion of compressing databases. You get $M$ i.i.d. individuals with characteristics distributed according to $P_X$ and features $Y$ conditioned on the individuals according to $P_{Y | X}$. The identification problem is this: given a noisy feature vector $Z$ generated from a particular individual $X(i)$ retrieve the index of the user corresponding to $Z$. This paper looked at the case where you want to recover a distorted version of $Y(i)$, so they are trading off identification and distortion. The solution is to code Wyner-Zivly, and the result recover the existing results on the identification problem alone as well as the Wyner-Ziv problem.

A THEORY OF UTILITY AND PRIVACY OF DATA SOURCES
Lalitha Sankar; Princeton University, S. Raj Rajagopalan; HP Labs, H. Vincent Poor; Princeton University
This talk I missed, which is all the more sad because I work on this stuff. But I did get to see a version of it at ITA and also at the IPAM workshop on privacy (as a poster). This is again an attempt to information-theoretize the problem of data disclosure: given a database $d$ of $n$ individuals, release a noisy version of this database. In doing so, they assume that each individual has $K$ features and the $n$ rows are generated iid according to some distribution $P$. So, as they put in the paper, “the primary contribution is demonstrating the equivalence between the database privacy problem and a source coding problem with additional privacy constraints.” The notion of utility is captured by a distortion measure: the encoder maps each database to a string of bits $W$. A user with side information $Z^n$ can attempt to reconstruct the database using the a lossy source code decoder. The notion of privacy by an equivocation constraint:
$\frac{1}{n} H( X^n | W, Z^n) \ge E - \epsilon$
They then set about finding the best utility-privacy (= distortion-equivocation) tradeoff. This is a different model than differential privacy, and rests on a number of information-theoretic assumptions that may be relaxed in the future. For example, assuming individuals are i.i.d., that distributions are known, that the side information sequence is known, and so on would give pause to the differential privacy crowd. Much like the epsilon in differential privacy, the equivocation has a meaning but its syntactic effect is hard to parse. I guess this is why syntactic definitions of privacy like k-anonymity are popular — it’s easy to see what they mean “on the ground.”

# Secrecy and Security in Distributed Storage

Secure Distributive Storage of Decentralized Source Data: Can Interaction Help?

Salim El Rouayheb, Vinod Prabhakaran and Kannan Ramchandran

Salim presented this work on creating a distributed storage system that also has secrecy. As far as I understood, the model is that there are $k$ distributed sources and they want to create a good erasure code in a network: a data collector who accesses any $k$ out of $n$ storage nodes should be able to recover the data. In addition, one storage node is an eavesdropper who gets access to everything communicated to her. The question is to create a distributed code that also has secrecy (i.e. the eavesdropper learns nothing other than what is contained in the nodes she controls) but minimize the communication in the network.

In the model each storage node has a private key and these keys can be used to provide the desired secrecy. The nodes could use interactive protocols of exchanging keys etc, but the main result of the paper is that this does not reduce the required communication compared to a one-way dissemination scheme.

Security in Distributed Storage Systems by Communicating a Logarithmic Number of Bits

by Theodoros K. Dikaliotis, myself and Tracey Ho,

Ted presented our work on security (error correction) for distributed storage systems. We are looking into the problem of finding if a storage node in an $(n,k)$ system is storing incorrect equations. These could be originating from bit-flipping errors in the storage medium or (more likely) if a malicious node was injecting errors during a previous repair in the storage system.

The model is that we assume a trusted verifier that communicates to the nodes and uses the built in redundancy of the code to find where the errors are located. The first observation is in most distributed storage systems the same code is used multiple times to reduce the overhead from storing the coefficients of the code. Instead of sending all the packets, all the storage nodes can project onto the same vector $r$ and only send the projections. The error detection works since the projected data follows the same $(n,k)$ code which can tolerate up to half the minimum distance adversarial errors (under optimal decoding). The problem is that the vector $r$ is very long, around the same order as the stored data. One way around this is to create a pseudorandom $r'$ using a small seed $s$, and project on that. All the storage nodes need to share the small seed $s$ and if we use pseudorandom generators that fool linear functions we can bound the probability of failure of a computationally unbounded adversary who places the errors. This relies  on the pseudorandom generators of Naor & Naor (but for large finite fields).

This trick applies more generally: Assume you can prove some bad event that involves a linear function of  an $n$ length i.i.d. random vector $r$ happens with very small probability $p_{bad}$.

Then instead of using the random vector, you can use a pseudo-random vector $r'$ that is generated from only $\log n$ random bits and still have a bound on the bad event being less than $p_{bad}+ \epsilon$ where $\epsilon$ is the total variation bound obtained from the pseudo-random generator.

On Secure Distributed Data Storage Under Repair Dynamics

by Sameer Pawar, Salim El Rouayheb, Kannan Ramchandran

Sameer presented his work the repair problem under secrecy constraints. The problem is to construct a distributed storage system that stays secure when repairs are taking place. The model is that an eavesdropper can observe the data arriving at $\ell$ nodes in an $(n,k)$ distributed storage system. The static version of this problem (no repairs) is equivalent to the erasure-erasure wiretap-II channel studied in this paper by Arunkumar and Mclaughlin while this paper generalizes this problem under the natural repair dynamics. As far as I understood, there were two results. One was a general bound on the secrecy capacity for any distributed storage system. There is also a surprising separation result:

For distributed storage operating at the minimum bandwidth point (MBR) secrecy and repair can be separated without loss of optimality. The authors show that the combination of using an MDS code to achieve secrecy (by ‘mixing’ the data with random noise keys) and exact Minimum bandwidth regenerating codes that were recently invented in this paper by Rashmi et al. for repair, achieves the secrecy capacity.

Interestingly, the same separation is no longer optimal for other points of the storage-repair tradeoff and as far as I understood the secrecy capacity remains open.

# Codes for Distributed Storage: The repair problem

The first two talks in this session

Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage by K. V. Rashmi , Nihar B. Shah , P. Vijay Kumar , Kannan Ramchandran

and

A Flexible Class of Regenerating Codes for Distributed Storage by Nihar B. Shah, K. V. Rashmi, P. Vijay Kumar

where given by Vijay Kumar.  To summarize the results in these two papers (and also

Exact-Repair MDS Codes for Distributed Storage Using Interference Alignment

by Changho Suh and Kannan Ramchandran

that appeared in another session), it is useful to briefly explain the current landscape of the repair problem:

(for more information see the Distributed storage wiki and this survey paper which however do not contain these new results yet.)

Assume a file to be stored, of size $B$ (sometimes also found as $\mathcal{M}$ in the literature) one can use an $(n,k)$ MDS code for storage with maximum reliability: just cut the file into $k$ packets of size $B/k$, encode into $n$ pieces (of the same size $\alpha=B/k$ ) and store each packet into a storage node. The reliability achieved is that any $k$ storage nodes contain enough information to recover the file, which is the best possible. So we can tolerate any $n-k$ failures without losing data. But most of the time we only have one failure. We then want to repair this failure by creating a new encoded packet by communicating as little information as possible from $d$  surviving storage nodes. There are two cases:

1. If we ask that the new packet still forms a good code (i.e. any $k$ out of the new $n$ nodes suffice to recover), the problem is called functional repair and is completely solved and equivalent to multicasting.  The minimum repair bandwidth can be determined by a cut-set bound and we will refer to the achievable region as the cut-set region.

2. If we ask that the new packet formed is exactly identical to the lost one: This problem is called exact repair and is very difficult. It is equivalent to a multicasting network coding problem with intermediate nodes having requests for subsets and therefore prior tools from network coding do not suffice. The repair problem seems close to locally decodable codes but this connection has not been investigated yet.

The amount of information communicated from $d$ surviving nodes is called the repair bandwidth $\gamma=d\beta$. Another issue is that one can reduce this repair bandwidth if the storage per node $\alpha$ is allowed to increase and there is a tradeoff involved.

The two most important points of this tradeoff are called Minimum Storage point (MSR)  (where $\alpha=B/k$ corresponds to MDS codes) and Minimum Bandwidth point (MBR) (where $\alpha$ is larger but the repair bandwidth is minimized).

Clearly exact repair is a strictly harder than functional repair and the cut-set region is still a valid outer bound. A substantial amount of recent work has been focused on understanding the communication required for exact repair. All the results here establish that the cut-set region is achievable for exact repair for the parameters discussed.

Vijay presented a storyline that concluded to the settings which were known before.

This paper had completely solved the problem for the Minimum Bandwidth (MBR) point, for any $(n,k)$, but fixing $d=n-1$.

The first paper (Explicit and Optimal Exact-Regenerating Codes for the Minimum-Bandwidth Point in Distributed Storage) in this session established that the MBR cut-set bound point can be achieved with exact repair for any $d$, corresponding to an arbitrary number $d\leq n-k$ of failed nodes. The second result in this paper is that all the other interior points of the tradeoff curve (except the points of minimum repair and minimum storage) cannot be achieved with linear codes for exact repair. ( Of course this does not preclude the possibility that they can be approached infinitely close by making the code packets smaller and smaller).

In the second paper A Flexible Class of Regenerating Codes for Distributed Storage, the authors generalize the requirements of the distributed storage system: the initial requirement was that any data collector who connects to any $k$ storage nodes (and receives all their stored data ) obtains enough information to recover the file. Here the authors want to enable data collectors who have limited communication with each storage node to reconstruct the file. The only requirement is that the total amount of information communicated is sufficient. This is, of course, a harder requirement on the storage code and the authors characterize the repair bandwidth for functional repair under this new requirement of flexible recovery. The technical part involves graph theoretically bounding the flow for these flexible information flow graphs. The question of exact repair of these flexible regenerating codes naturally arises and it seems to me that previous exact regenerating code constructions do not immediately generalize.

Finally, in another session was the closely related paper

Exact-Repair MDS Codes for Distributed Storage Using Interference Alignment

by Changho Suh and Kannan Ramchandran

Changho presented two new results: both are for the minimum storage point (MSR) and hence correspond to results for MDS codes.

One is that for the low-rate regime $k/n \leq 1/2$ if the number of nodes helping for the repair is sufficiently large $d \geq 2k-1$,  the MSR cut-set point can be achived for exact repair. For this regime the paper presents explicit code constructions that use a small finite fields and build on this paper which established similar results but only when repairing systematic nodes (and not nodes storing parities).

The second result of this paper closes the open problem of MSR exact repair for the high-rate $k/n\geq 1/2$ regime. The paper uses the symbol extension technique introduced in the Cadambe-Jafar paper Interference Alignment and the Degrees of Freedom for the K User Interference Channel to MDS codes that asymptotically approach the cut-set bound.

This paper appeared simultaneously with

Distributed Data Storage with Minimum Storage Regenerating Codes – Exact and Functional Repair are Asymptotically Equally Efficient, by Viveck R. Cadambe, Syed A. Jafar, Hamed Maleki,

Which independently obtain very similar results using the same technique. Note that the symbol extension approach requires vector linear network coding (sub-packetization of each stored packet) and to approach the cut-set bound an enormous amount of subpacketization (and field size) seem to be required. I think it is correct to interpret these results as existence proofs of high-rate regenerating codes approaching the cut-set bound for exact repair, rather than explicit constructions.

I think that the applicability of interference alignment to problems that seem to have nothing to do with wireless channels is quite remarkable. Another related case is the paper

Network Coding for Multiple Unicasts: An Interference Alignment Approach

by A. Das, S. Vishwanath, S. Jafar, A. Markopoulou.

I will blog my notes on this paper (hopefully) soon. Are the slides available anywhere? I took pictures of slides from other talks but my camera run out of battery.

# ISIT 2010: Bethe Entropy and the Edge Zeta Function

Connecting the Bethe Entropy and the Edge Zeta Function of a Cycle Code
by Pascal Vontobel

Pascal talked about a connection of the Bethe entropy to the edge zeta function of a cycle code, building on previous work by Koetter, Li Vontobel and Walker. Starting with the first quantity, as is well known, Yedidia, Freeman and Weiss established that sum-product (if it converges) can only converge to fixed points of the Bethe free energy. The Bethe entropy is the non-linear term added in the ‘average energy’ (optimized by the LP decoder) and is therefore a fundamental quantity to understand sum-product and its connections to LP relaxations. A cycle code is a linear code where all the variables have degree two (and can be represented as edges).

Now what is the second quantity in this connection, the edge zeta function of a cycle code? As far as I understood (and this is probably inaccurate), for each edge (=variable of the code) define an indeterminate quantity $U_{e}$.

For a primitive cycle $c_1$ (without allowing backtracking) of this graph, consisting of edges $e_1, e_2, \cdots$, define a monomial $\gamma(c_1)$ that is the product of terms $\gamma(c_1)= U_{e1}U_{e2}\cdots$ for all the edges of the cycle.

The edge zeta function is a product over all the cycles of these monomials inverted:  $\prod_{\forall c_i} \frac{1}{1-\gamma(c_i)}$.

The edge function can therefore be evaluated at every point of the fundamental polytope by setting the $U_{e}$ appropriately. Previous work by Koetter, Li Vontobel and Walker established that each monomial in the Taylor expansion of this function corresponds to an (unscaled) pseudo-codeword of the cycle code. In particular the powers appearing for each edge (=variable of the code) correspond to re-scaled pseudo-codeword coordinates. As far as I understood, this paper shows that the coefficients of the monomials associated with a given pseudo-codeword have a meaning: their asymptotic growth rate equals to the directional derivative of the (induced) Bethe entropy in the direction of that pseudo-codeword.

I was wondering on the connections to the self-avoiding walks of Dror Weitz [Counting independent sets up to the tree threshold] (also Local approximate inference algorithms)  and the connections of the random walks to the weights of the Arora,Daskalakis, Steurer Message Passing algorithms and improved LP decoding paper.

# ISIT 2010 : gossip and consensus

THE MISSING PIECE SYNDROME IN PEER-TO-PEER COMMUNICATION (Bruce Hajek, Ji Zhu; University of Illinois at Urbana Champaign)
This paper proposes a model for peer-to-peer content distribution in a Bit-Torrent-like setup where there is a seed node and everybody wants to get K pieces of a file held by the seed. Users arrive according to a Poisson process and peers randomly collect and transfer (instantaneously) one piece. The paper provides a stability analysis for this system based on queueing. It’s a cool model, and the talk had some rather amusing moments for those who were there…

WEIGHTED GOSSIP: DISTRIBUTED AVERAGING USING NON-DOUBLY STOCHASTIC MATRICES (Florence Bénézit; Ecole Normale Supérieure-INRIA, Vincent Blondel; UC Louvain, Patrick Thiran; Ecole polytechnique fédérale de Lausanne, John Tsitsiklis; Massachusetts Institute of Technology, Martin Vetterli; Ecole polytechnique fédérale de Lausanne)
Florence presented convergence results for an algorithm based on one-way path averaging. Inspired by the Push-Sum protocol of Kempe et al., she described a one-way method in which a node “gives away” a fraction of its estimate and pushes it along a random direction in the network. The receiving node takes some of the value and passes the rest along — It’s kind of like passing a plate of food around a table while keeping a little (or a lot) for yourself. It’s a cool algorithm, and it works really well in experiments. However, the rate of convergence is still an open question — it seems related to the convergence of non-homogeneous Markov chains.

TIGHT BOUNDS FOR ALGEBRAIC GOSSIP ON GRAPHS (Michael Borokhovich, Chen Avin, Zvi Lotker; Ben Gurion University of the Negev)
This paper was more discrete in nature. There are $n$ nodes in a network and each has a value in a finite field. They pass linear combinations of their symbols around. The goal for every node to learn all the information, or equivalently to gather a full-rank set of equations. Nodes can communicate according to a graph structure — they presented upper and lower bounds of $n d_{\max}$ where $d_{\max}$ is the maximum degree in the graph. They also showed the barbell graph is very very slow.

DISTRIBUTED CONSENSUS WITH FINITE MESSAGING (Debashis Dash, Ashutosh Sabharwal; Rice University)
This was on distributed vertex coloring in which each node gets to know something about the colors in its local neighborhood. This is a bit tough (which they prove), but the authors allow themselves a little slack in that they want to minimize the number of defects (nodes with an adjacent node of the same color), rather than make it $0$. A number of algorithms were presented, many of them based on an initial random assignment followed by a refinement step using the local information.

A NEAR-OPTIMAL ALGORITHM FOR NETWORK-CONSTRAINED AVERAGING WITH NOISY LINKS (Nima Noorshams, Martin J. Wainwright; University of California, Berkeley)
This paper was essentially about packing routes in a “gossip along the way” paradigm — if a node wakes up and starts a path (say horizontally), it can also send a message vertically to trigger path-averaging along parallel paths. This gives a two-phase algorithm and the number of rounds ends up looking like the diameter of the graph. However, the number of one-hop messages scales in the same way. Thus the gain is through parallelization.

# ISIT 2010 : some talks on statistics and probability

For many of the talks I attended I didn’t take notes — partly this is because I didn’t feel expert enough to note things down correctly, and partly because I am

RÉNYI DIVERGENCE AND MAJORIZATION (Tim van Erven; Centrum Wiskunde & Informatica, Peter Harremoës; Copenhagen Business College)
Peter gave a talk on some properties of the Rényi entropy

$D_{\alpha}(P \| Q) = \frac{1}{\alpha - 1} \log \int p^{\alpha} q^{1 - \alpha} d \mu$

where $\mu$ is a dominating measure, and a related geometric quantity, the Lorenz diagram. This is a set of all points in $\mathbb{R}^2$ which are equal to $(\int f dP, \int f dQ)$ for some unit measure function $f$ on [0,1]. It turns out that subset relationships of Lorenz diagrams are the same as majorization relationships between the measures. This is related to something called Markov ordering from a recent paper by Gorban, Gorban, and Judge. This was one of those nice talks which pointed me towards some results and tools that are new to me but that I could not fully understand during the talk.

MINIMAX LOWER BOUNDS VIA F-DIVERGENCES (Adityanand Guntuboyina; Yale University)
This talk was on minimax lower bounds on parameter estimation which can be calculated in terms of generalized $f$ divergences, which are of the form

$D_f(P \| Q) = \int f \left( \frac{dP}{dQ} \right) dQ$

Suppose $T$ is an estimator for some parameter $\theta$. We want universal (over $T$) lower bounds on $\mathbb{P}(T(X^n) \ne \theta)$. This gives a generalization of Fano’s inequality:

$\mathbb{P}(T(X^n) \ne \theta) \ge \inf_Q \sum_{\theta} D_f(P_{\theta} \| Q)$

This has a very geometric feel to it but I had a bit of a hard time following all of the material since I didn’t know the related literature very well (much of it was given in relation to Yang and Barron’s 1999 paper).

MUTUAL INFORMATION SADDLE POINTS IN CHANNELS OF EXPONENTIAL FAMILY TYPE (Todd Coleman; University of Illinois, Maxim Raginsky; Duke University)
In some cases where we have a class of channels and a class of channel inputs, there is a saddle point. The main example is that Gaussian noise is the worst for a given power constraint and Gaussian inputs are the best. Another example is the “bits through queues” This paper gave a more general class of channels of “exponential family type” and gave a more general condition under which you get a saddle point for the mutual information. The channel models are related to Gibbs variational principle, and the arguments had a kind of “free energy” interpretation. Ah, statistical physics rears its head again.

INFORMATION-THEORETIC BOUNDS ON MODEL SELECTION FOR GAUSSIAN MARKOV RANDOM FIELDS (Wei Wang, Martin J. Wainwright, Kannan Ramchandran; University of California, Berkeley)
This was on two problems: inferring edges in a graphical model from iid samples from that model, and inverse covariance estimation. They are similar, but not quite the same. The goal was to prove necessary conditions for doing this; these necessary conditions match the corresponding achievable rates from polytime algorithms. The main result was that you need $n = \Omega( d^2 \log p)$ samples, where $p$ is the number of variables, and $d$ is the max degree in the graphical model. The proof approach was through modeling graph selection as channel coding in which the channel outputs samples from a graphical model chosen by the encoder. If the decoder can identify the model then the problem is solvable. This is analogous to the talk Sriram Vishwanath gave today on matrix completion.

# ISIT 2010 : Anthony Ephremides

“The first phase [of life] you believe in Santa Claus, the second you don’t believe in Santa Clause, and in the third you become Santa Claus.” — Tony Ephremides

Prof. Anthony Ephremides gave the third plenary at ISIT, which included an interim report on the consummation between information theory and network coding and many connections between opera and research. He did a great job I think in explaining the difference between throughput region, stability region, and capacity region (under bursty vs. non-bursty use). These are in increasing order of inclusion. Some interesting tidbits (some new, some not):

• He had a complaint about the way networking handles fading by simply saying the success probability for a packet being received is just $\mathbb{P}( SNR > \gamma)$.
• Under contention, the capacity region may not be convex, unlike in information theory where you can do time sharing.
• For wireless network coding it is important to connect the MAC protocol and scheduling issues as well as change the notion of cut capacities. That is, you shouldn’t replace edges with hyper-edges, because that’s not a good model.
• The information rate is the rate from the payload plus the information in the idleness and the information from the identity of the transmitter. That is, you get data from the actual packet, when the packet was sent, and who is sending the packet.
• Extending many analyses to more than 2 users has not been done.
• Can bursty traffic capacity be larger than non-bursty? The NNN (*) community says that would be a contradiction, because you can always then emulate bursty traffic. But this is unfair because idling the transmitter to build up packets and hence create artificial bursts is not allowed in the problem formulation, so there is no contradiction.
• Relaying is good from a network perspective because it can partially enable a first-come first-serve (FCFS) discipline. So relays bridge a gap to the optimal scheduling policy. This is different than the information theory notion of cooperation yielding diversity gain.
• Multicast throughput is an interesting thing to think about more rigorously for the future.
• His prescription : the information theory has to apply its tools more broadly to networking problems and to unorthodox problems. The networking community should use more rigorous modeling and analysis methods.

(*) NNN = Nagging nabobs of negativism (really it should be “nattering nabobs of negativism,” Safire’s famous alliterative phrase wielded with such gusto by Spiro Agnew).

# ISIT 2010 : Michael Jordan

Perhaps malapropos for the NBA Finals, Prof. Michael Jordan gave the first plenary talk at ISIT. It was a great overview of nonparametric Bayesian modeling. In particular, he covered his favorite Chinese restaurant process (also known as the Pitman-Yor stick-breaking process), hierarchical Dirichlet priors, and all the other jargon-laden elements of modeling. At the end he covered some of the rather stunning successes of this approach in applications with lots of data to learn from. What was missing for me was a sense of how these approaches worked in the data-poor regime, so I asked a question (foolishly) about sample complexity. Alas, since that is a “frequentist” question and Jordan is a “Bayesian,” I didn’t quite get the answer to the question I was trying to ask, but that’s what happens when you don’t phrase things properly.

One nice thing that I learned was the connection to Kingman’s earlier work on characterizing random measures via non-homogeneous Poisson processes. Kingman has been popping up all over the place in my reading, from urn processes to exchangeable partition processes (also known as paintbox processes). When I get back to SD, it will be back to the classics for me!