# mergetex script updated

I ported the mergetex script from perl to python and posted it on github, for those who are interested. It scans recursively though a large LaTeX project and generates a single .tex file suitable for uploading to ArXiV (with comments stripped out). I probably should have have branched from Manu’s though, oops. Ah well. There are lots of versions of this thing out there, so no claims to elegance here.

A related question (which I gave up on after a short attack) was how to generate a regular expression that matches the first part of a line of LaTeX before a comment. The trick is that \% is escaped, so you want

There is a 95\% chance that I am bad at regexps % DUH\n

to return

There is a 95\% chance that I am bad at regexps %

and

There is a 95\% chance that I am bad at regexps % DUH more like 99\%\n

to return

There is a 95\% chance that I am bad at regexps %

but

There is a 95\% chance that I am bad %at regexps % DUH more like 99\%\n

to return

There is a 95\% chance that I am bad%

Some sort of negative lookbehind is needed but I got confused and decided to do something less efficient.

# Postdoc opportunity at Aalto University

Aalto University is a new university with over a century of experience. Created from a high-profile merger between three leading universities in Finland – the Helsinki School of Economics, Helsinki University of Technology and the University of Art and Design Helsinki – Aalto University opens up new possibilities for strong multidisciplinary education and research. The university has 20 000 students and a staff of 5,000 including 350 professors.

The stochastics research group at the Department of Mathematics and Systems Analysis is currently undergoing a period of regeneration, as new associate and assistant professors have been employed to replace previous long-term faculty, and several new young researchers are being recruited with the aim of significant growth. To strengthen this line of development, we are now seeking to hire a postdoctoral researcher with a PhD in mathematics or a related area.

The postdoctoral researcher will carry out research in collaboration with the stochastics research group, with a small amount of teaching duties included. The salary is competitive, based on experience and qualifications, and includes occupational health and a travel budget for international conferences and workshops. The position is for one year with a possible extension for another year, starting preferably in September 2014 and no later than January 2015.

Further information and instructions for applying:

Please feel free to forward this message to colleagues and potential candidates. The application deadline is 13 June 2014.

# Quals! What are they good for? Absolutely…

… nothing? So sayeth some in the business. Bill Gasarch wrote a post up last week about the “point” of the qualifying exam in which he says the two points of the process are to get students out of the program who may not be able to finish, and to make sure students are “well rounded.” While this is nice from an administrative/pedagogical point of view, I think they first question to ask is “what can you measure with a qualifying process?”

In the Rutgers ECE department we have a 4-part exam — three oral 1-hour exams on different subjects (I got to do one exam in Communications with my colleague David Daut) and a written math exam. Students have to meet an average score threshold to pass, otherwise they have to retake some (or all) of the exams. Many take the exam in their third year, and the result of this intensive assay is that many students basically don’t do much research for the semester before their exams. In general, the material is a little more advanced, I think, than the Berkeley prelim, and focuses a but more on concepts encountered in graduate school. The exam also measure how “comfortable” you are with basic graduate material, but in a much more drawn-out manner, and later in graduate school.

Neither exam particularly measures your ability to do research, and instead focuses on core competence. The first criterion of Gasarch’s is really about whether someone will be able to complete a PhD in a reasonable amount of time; this is, in fact, a very difficult thing to measure. One approach is to say that it’s all up to the advisor, but some oversight is necessary, and having the other faculty in the department also evaluate students outside the classroom is desirable from an academic community standpoint. The CS theory exam of presenting a paper actually seems better in this regard, but then its entirely about research. Furthermore, I think that for many students, reading and understanding a paper (not of their choosing) might be a tall order very early in their career. How can we really assess whether someone would be best served getting an MS vs. a PhD?

I do think core competence is important. Getting a PhD in Electrical Engineering should mean that you have basic knowledge about some topic within EE. The alternative is that you can do research on anything, as long as the committee signs off on it, and it counts as EE. I’m not a fan of boundary drawing, but there is a value to getting students to integrate some of their undergraduate knowledge across classes (as opposed to taking the final and forgetting it), because this process of integration is also an important research skill. But the shifting nature of research areas means the cluster of topics most relevant for a solid foundation may not slot neatly into DSP/Comm/Solid State, etc. Is the problem only our disciplinary boundaries?

How does the qualifying process work in your department? Is it good, bad, or ugly?

# Documents needed for a Chinese Tourist Visa

I just got a tourist visa (L) to go to ICML from the Chinese Consulate in New York, and it was significantly less difficult than I had been led to believe. However, the information on the website is a little hard to parse (since there are so many visa classes), so here’s a quick list of the documents you need:

• Passport with at least six months of remaining validity and blank page for the visa.
• A photocopy of passport photo and information page.
• Copy of the application form with 1 passport photo attached.
• Printout of your plane ticket confirmation and hotel booking confirmation \textbf{OR} a formal invitation letter. The hotel need not be for the entire duration of your trip.

You will need to make two trips, the first to drop off the forms and your passport, and the second (4 days later) to pick up your passport and pay (currently $140). A tip that holds for New York (and probably elsewhere): don’t go right at opening time, since there is a huge rush. The line tends to thin between 10 and 11, and your wait will be significantly shorter. Depending on how annoyed the guards are, they may yell at you for using your phone (technically not allowed), so bring a magazine or some knitting or something in case the wait is long. When you pay, it’s MasterCard/Visa only, or cashiers check/money order. # CFP: JSTSP Special Issue on Privacy IEEE Signal Processing Society IEEE Journal of Selected Topics in Signal Processing Special Issue on Signal and Information Processing for Privacy Aims and Scope: There has been a remarkable increase in the usage of communications and information technology over the past decade. Currently, in the backend and in the cloud, reside electronic repositories that contain an enormous amount of information and data associated with the world around us. These repositories include databases for data-mining, census, social networking, medical records, etc. It is easy to forecast that our society will become increasingly reliant on applications built upon these data repositories. Unfortunately, the rate of technological advancement associated with building applications that produce and use such data has significantly outpaced the development of mechanisms that ensure the privacy of such data and the systems that process it. As a society we are currently witnessing many privacy-related concerns that have resulted from these technologies—there are now grave concerns about our communications being wiretapped, about our SSL/TLS connections being compromised, about our personal data being shared with entities we have no relationship with, etc. The problems of information exchange, interaction, and access lend themselves to fundamental information processing abstractions and theoretical analysis. The tools of rate-distortion theory, distributed compression algorithms, distributed storage codes, machine learning for feature identification and suppression, and compressive sensing and sampling theory are fundamental and can be applied to precisely formulate and quantify the tradeoff between utility and privacy in a variety of domains. Thus, while rate-distortion theory and information-theoretic privacy can provide fundamental bounds on privacy leakage of distributed data systems, the information and signal processing techniques of compressive sensing, machine learning, and graphical models are the key ingredients necessary to achieve these performance limits in a variety of applications involving streaming data, distributed data storage (cloud), and interactive data applications across a number of platforms. This special issue seeks to provide a venue for ongoing research in information and signal processing for applications where privacy concerns are paramount. Topics of Interest include (but are not limited to): • Signal processing for information-theoretic privacy • Signal processing techniques for access control with privacy guarantees in distributed storage systems • Distributed inference and estimation with privacy guarantees • Location privacy and obfuscation of mobile device positioning • Interplay of privacy and other information processing tasks • Formalized models for adversaries and threats in applications where consumer and producer privacy is a major concern • Techniques to achieve covert or stealthy communication in support of private communications • Competitive privacy and game theoretic formulations of privacy and obfuscation Important Dates: Manuscript submission due: October 1, 2014 First review completed: December 15, 2014 Revised manuscript due: February 1, 2015 Second review completed: March 15, 2015 Final manuscript due: May 1, 2015 Publication date: October 2015 Prospective authors should visit the JSTSP homepage for information on paper submission. Manuscripts should be submitted using Manuscript Central. # yet more not-so-recent hits from ArXiV Some shorter takes on these papers, some of which I should read in more detail later. I figure I’ll use the blog for some quick notes and to see if any readers have any comments/ideas about these: Differentially Private Convex Optimization with Piecewise Affine Objectives (Shuo Han, Ufuk Topcu, George J. Pappas) — arXiv:1403.6135 [math.OC]. The idea here is to look at minimizing functions of the form $f(x) = \max_{i = 1,2, \ldots, m} \{ a_i^{\top} x + b_i \}$ subject to $x$ belonging to some convex polytope $\mathcal{P}$. This is a bit different than the kind of convex programs I’ve been looking at (which are more ERM-like). Such programs occur often in resource allocation problems. Here the private information of users are the offsets $b_i$. They propose a number of methods for generating differentially private approximations to this problem. Analyzing the sensitivity of this optimization is tricky, so they use an upper bound based on the diameter of the feasible set$\mathcal{P}\$ to find an appropriate noise variance. The exponential mechanism also gives a feasible mechanism, although the exact dependence of the suboptimality gap on $\epsilon$ is unclear. They also propose a noisy subgradient method where, instead of using SGD, they alter the sampling distribution using the exponential mechanism to choose a gradient step. Some preliminary experiments are also given (although none exploring the dependence on $\epsilon$, which would also be very interesting)!

Assisted Common Information with an Application to Secure Two-Party Sampling (Vinod M. Prabhakaran, Manoj M. Prabhakaran) — arXiv:1206.1282 [cs.IT]. This is the final version of the journal version of a few conference papers that Vinod and Manoj have done on an interesting variant of the Gács-Körner problem. The motivation is from secure multiparty computation — the problem also touches on some work Vinod and I started but is sadly languishing due to the utter overwhelmingness of starting a new job. Hopefully I can get back to it this summer.

Analysis of Distributed Stochastic Dual Coordinate Ascent (Tianbao Yang, Shenghuo Zhu, Rong Jin, Yuanqing Lin) — arXiv:1312.1031 [cs.DC]. The title pretty much sums it up. I’m interested in looking a bit more at the analysis method, since I had a similar algorithm bouncing around my head that I would like to analyze. The main idea is also update the primal variables to achieve a speedup/use a larger step size.

Convergence of Stochastic Proximal Gradient Algorithm (Lorenzo Rosasco, Silvia Villa, Bang Công Vũ) — arXiv:1403.5074 [math.OC]. This is a similar setup as my last post, with a convex objective that has a smooth and non-smooth component. They show convergence in expectation and almost surely. The key here is that they show convergence in an infinite-dimentionsal Hilbert space instead of, say, $\mathbb{R}^d$.

/

# Gaussian channel capacity and angles

There’s this nice calculation in a paper of Shannon’s on optimal codes for Gaussian channels which essentially provides a “back of the envelope” way to understand how noise that is correlated with the signal can affect the capacity. I used this as a geometric intuition in my information theory class this semester, but when I described it to other folks I know in the field, they said they hadn’t really thought of capacity in that way. Perhaps it’s all of the AVC-ing I did in grad school.

Suppose I want to communicate over an AWGN channel

$Y = X + Z$

where $Z \sim \mathcal{N}(0,N)$ and $X$ satisfies a power constraint $P$. The lazy calculation goes like this. For any particular message $m$, the codeword $X^n(m)$ is going to be i.i.d. $\mathcal{N}(0,P)$, so with high probability it has length $\sqrt{n P}$. The noise is independent and $\mathbb{E}[ X Z ] = 0$ so $\langle X^n, Z^n \rangle \approx 0$ with high probability, so $Z^n$ is more-or-less orthogonal to $X^n(m)$ with high probability and it has length $\sqrt{nN}$ with high probability. So we have the following right triangle:

Geometric picture of the AWGN channel

Looking at the figure, we can calculate $\sin \theta$ using basic trigonometry:

$\sin \theta = \sqrt{ \frac{N}{N + P} }$,

so

$\log \frac{1}{\sin \theta} = \frac{1}{2} \log \left( 1 + \frac{P}{N} \right)$,

which is the AWGN channel capacity.

We can do the same thing for rate-distortion (I learned this from Mukul Agarwal and Anant Sahai when they were working on their paper with Sanjoy Mitter). There we have Gaussian source $X^n$ with variance $\sigma^2$, distortion $D$ and a quantization vector $\hat{X}^n$. But now we have a different right triangle:

Geometry of the rate-distortion problem

Here the distortion is the “noise” but it’s dependent on the source $X^n$. The “test channel” view says that the quantization $\hat{X}^n$ is corrupted by independent (approximately orthogonal) noise to form the original source $X^n$. Again, basic trigonometry shows us

$\log \frac{1}{\sin \theta} = \frac{1}{2} \log \left( \frac{\sigma^2}{D} \right)$.

Turning back to channel coding, what if we have some intermediate picture, where the noise slightly negatively correlated with the signal, so $\mathbb{E}[ X Z ] = - \rho$? Then the cosine of the angle between $X$ and $Z$ in the picture is $\frac{\rho}{\sqrt{NP}}$ and we have a general triangle like this:

Geometry for channels with correlated noise

Where we’ve calculated the length of $Y^n$ using the law of cosines:

$\|Y^n\|^2 = n (P + N) - 2 n \sqrt{ N P } \cdot \frac{\rho}{\sqrt{NP}}$.

So now we just need to calculate $\theta$ again. The cosine is easy to find:

$\cos \theta = \frac{ P + N - 2 \rho + P - N }{ 2 \sqrt{ P (P + N - 2 \rho) } } = \frac{P - \rho}{ \sqrt{P (P + N - 2 \rho) } }$.

Then solving for the sine:

$\sin^2 \theta = 1 - \frac{ (P - \rho)^2 }{ P( P + N - 2 \rho) }$

and applying our formula, for $\rho < \sqrt{PN}$,

$\log \frac{1}{\sin \theta} = \frac{1}{2} \log\left( \frac{ P ( P + N - 2 \rho) }{ P (P + N - 2 \rho) - (P - \rho)^2 } \right) = \frac{1}{2} \log\left( \frac{ P (P + N - 2 \rho) }{ PN - \rho^2 } \right)$

If we plug in $\rho = 0$ we get back the AWGN capacity and if we plug in $\rho = N$ we get the rate distortion function, but this formula gives the capacity for a range of correlated noise channels.

I like this geometric interpretation because it's easy to work with and I get a lot of intuition out of it, but your mileage may vary.