# mergetex script updated

I ported the mergetex script from Perl to Python and posted it on GitHub, for those who are interested. It scans recursively through a large LaTeX project and generates a single .tex file suitable for uploading to arXiv (with comments stripped out). I probably should have branched from Manu’s though, oops. Ah well. There are lots of versions of this thing out there, so no claims to elegance here.

A related question (which I gave up on after a short attack) was how to generate a regular expression that matches the first part of a line of LaTeX before a comment. The trick is that \% is escaped, so you want

There is a 95\% chance that I am bad at regexps % DUH\n

to return

There is a 95\% chance that I am bad at regexps %

and

There is a 95\% chance that I am bad at regexps % DUH more like 99\%\n

to return

There is a 95\% chance that I am bad at regexps %

but

There is a 95\% chance that I am bad %at regexps % DUH more like 99\%\n

to return

There is a 95\% chance that I am bad %

Some sort of negative lookbehind is needed but I got confused and decided to do something less efficient.
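One way to sidestep the lookbehind, sketched here in Python: instead of asking whether a `%` is preceded by a backslash, walk the line from the start and treat every backslash together with the character after it as a single unit. The first `%` that is left over is then the first unescaped one. This is only a sketch under the assumption that lines are processed one at a time; the names `UP_TO_COMMENT` and `strip_comment` are made up for illustration and are not taken from the mergetex script.

```python
import re

# Hypothetical pattern, not the one used in mergetex. From the start of the
# line, consume "safe" units: any character other than %, backslash, or
# newline, or a backslash together with the character it escapes. Then
# require a literal %, which is therefore the first unescaped one.
UP_TO_COMMENT = re.compile(r'(?:[^%\\\n]|\\.)*%')


def strip_comment(line):
    """Return the text up to and including the first unescaped %.

    Lines that contain no comment are returned unchanged.
    """
    match = UP_TO_COMMENT.match(line)
    return match.group(0) if match else line


if __name__ == "__main__":
    examples = [
        "There is a 95\\% chance that I am bad at regexps % DUH\n",
        "There is a 95\\% chance that I am bad %at regexps % DUH more like 99\\%\n",
        "No comment on this line, only 95\\%\n",
    ]
    for example in examples:
        print(repr(strip_comment(example)))
```

Because a backslash always consumes the character after it, `\\%` (an escaped backslash followed by a real comment) is treated as a comment, which a plain `(?<!\\)%` lookbehind would miss. When a comment is found the match drops the newline, so the caller has to add it back. The sketch knows nothing about `\verb` or verbatim environments, where `%` is not a comment character.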

# Postdoc opportunity at Aalto University

Aalto University is a new university with over a century of experience. Created from a high-profile merger between three leading universities in Finland – the Helsinki School of Economics, Helsinki University of Technology and the University of Art and Design Helsinki – Aalto University opens up new possibilities for strong multidisciplinary education and research. The university has 20,000 students and a staff of 5,000 including 350 professors.

The stochastics research group at the Department of Mathematics and Systems Analysis is currently undergoing a period of regeneration, as new associate and assistant professors have been employed to replace previous long-term faculty, and several new young researchers are being recruited with the aim of significant growth. To strengthen this line of development, we are now seeking to hire a postdoctoral researcher with a PhD in mathematics or a related area.

The postdoctoral researcher will carry out research in collaboration with the stochastics research group, with a small amount of teaching duties included. The salary is competitive, based on experience and qualifications, and includes occupational health and a travel budget for international conferences and workshops. The position is for one year with a possible extension for another year, starting preferably in September 2014 and no later than January 2015.

Further information and instructions for applying:

http://www.aalto.fi/en/about/careers/jobs/view/185/

Please feel free to forward this message to colleagues and potential candidates. The application deadline is 13 June 2014.

# Quals! What are they good for? Absolutely…

… nothing? So sayeth some in the business. Bill Gasarch wrote up a post last week about the “point” of the qualifying exam in which he says the two points of the process are to get students out of the program who may not be able to finish, and to make sure students are “well rounded.” While this is nice from an administrative/pedagogical point of view, I think the first question to ask is “what can you measure with a qualifying process?”

At Berkeley we had an exam after your first year — the “prelim” in your subject area. In CS (theory, at least, not sure about other areas) this was a presentation of a paper (or so I recall). You had to know it cold and be able to provide context, answer questions etc. In EE it was a 1-hour oral exam on 3 topics — two undergrad and one grad-ish related to your general sub-area. I took mine in DSP so I had to know basic DSP, more advanced DSP (filter banks etc), and stochastic processes. We spent a good part of the summer studying and in the end a little more than half of us passed, and the others had to retake the exam or got a conditional pass (meaning they had to take or TA some course). In EE, the prelim process is designed to make you learn the basics at a certain level — if you can answer these questions correctly under some performance pressure and seem comfortable/fluent, then you pass. It measures how “comfortable” you are with basic (undergraduate) material.

In the Rutgers ECE department we have a 4-part exam — three oral 1-hour exams on different subjects (I got to do one exam in Communications with my colleague David Daut) and a written math exam. Students have to meet an average score threshold to pass, otherwise they have to retake some (or all) of the exams. Many take the exam in their third year, and the result of this intensive assay is that many students basically don’t do much research for the semester before their exams. In general, the material is a little more advanced, I think, than the Berkeley prelim, and focuses a bit more on concepts encountered in graduate school. The exam also measures how “comfortable” you are with basic graduate material, but in a much more drawn-out manner, and later in graduate school.

Neither exam particularly measures your ability to do research, and instead focuses on core competence. The first criterion of Gasarch’s is really about whether someone will be able to complete a PhD in a reasonable amount of time; this is, in fact, a very difficult thing to measure. One approach is to say that it’s all up to the advisor, but some oversight is necessary, and having the other faculty in the department also evaluate students outside the classroom is desirable from an academic community standpoint. The CS theory exam of presenting a paper actually seems better in this regard, but then it’s entirely about research. Furthermore, I think that for many students, reading and understanding a paper (not of their choosing) might be a tall order very early in their career. How can we really assess whether someone would be best served getting an MS vs. a PhD?

I do think core competence is important. Getting a PhD in Electrical Engineering should mean that you have basic knowledge about some topic within EE. The alternative is that you can do research on anything, as long as the committee signs off on it, and it counts as EE. I’m not a fan of boundary drawing, but there is a value to getting students to integrate some of their undergraduate knowledge across classes (as opposed to taking the final and forgetting it), because this process of integration is also an important research skill. But the shifting nature of research areas means the cluster of topics most relevant for a solid foundation may not slot neatly into DSP/Comm/Solid State, etc. Is the problem only our disciplinary boundaries?

How does the qualifying process work in your department? Is it good, bad, or ugly?

# Documents needed for a Chinese Tourist Visa

I just got a tourist visa (L) to go to ICML from the Chinese Consulate in New York, and it was significantly less difficult than I had been led to believe. However, the information on the website is a little hard to parse (since there are so many visa classes), so here’s a quick list of the documents you need:

• Passport with at least six months of remaining validity and blank page for the visa.
• A photocopy of passport photo and information page.
• Copy of the application form with 1 passport photo attached.
• Printout of your plane ticket confirmation and hotel booking confirmation OR a formal invitation letter. The hotel need not be for the entire duration of your trip.

You will need to make two trips, the first to drop off the forms and your passport, and the second (4 days later) to pick up your passport and pay (currently $140). A tip that holds for New York (and probably elsewhere): don’t go right at opening time, since there is a huge rush. The line tends to thin between 10 and 11, and your wait will be significantly shorter. Depending on how annoyed the guards are, they may yell at you for using your phone (technically not allowed), so bring a magazine or some knitting or something in case the wait is long. When you pay, it’s MasterCard/Visa only, or cashier’s check/money order.

# CFP: JSTSP Special Issue on Privacy

IEEE Signal Processing Society
IEEE Journal of Selected Topics in Signal Processing
Special Issue on Signal and Information Processing for Privacy

Aims and Scope: There has been a remarkable increase in the usage of communications and information technology over the past decade. Currently, in the backend and in the cloud, reside electronic repositories that contain an enormous amount of information and data associated with the world around us. These repositories include databases for data-mining, census, social networking, medical records, etc. It is easy to forecast that our society will become increasingly reliant on applications built upon these data repositories. Unfortunately, the rate of technological advancement associated with building applications that produce and use such data has significantly outpaced the development of mechanisms that ensure the privacy of such data and the systems that process it. As a society we are currently witnessing many privacy-related concerns that have resulted from these technologies—there are now grave concerns about our communications being wiretapped, about our SSL/TLS connections being compromised, about our personal data being shared with entities we have no relationship with, etc.

The problems of information exchange, interaction, and access lend themselves to fundamental information processing abstractions and theoretical analysis. The tools of rate-distortion theory, distributed compression algorithms, distributed storage codes, machine learning for feature identification and suppression, and compressive sensing and sampling theory are fundamental and can be applied to precisely formulate and quantify the tradeoff between utility and privacy in a variety of domains. Thus, while rate-distortion theory and information-theoretic privacy can provide fundamental bounds on privacy leakage of distributed data systems, the information and signal processing techniques of compressive sensing, machine learning, and graphical models are the key ingredients necessary to achieve these performance limits in a variety of applications involving streaming data, distributed data storage (cloud), and interactive data applications across a number of platforms. This special issue seeks to provide a venue for ongoing research in information and signal processing for applications where privacy concerns are paramount.
Topics of Interest include (but are not limited to):

• Signal processing for information-theoretic privacy
• Signal processing techniques for access control with privacy guarantees in distributed storage systems
• Distributed inference and estimation with privacy guarantees
• Location privacy and obfuscation of mobile device positioning
• Interplay of privacy and other information processing tasks
• Formalized models for adversaries and threats in applications where consumer and producer privacy is a major concern
• Techniques to achieve covert or stealthy communication in support of private communications
• Competitive privacy and game theoretic formulations of privacy and obfuscation

Important Dates:

• Manuscript submission due: October 1, 2014
• First review completed: December 15, 2014
• Revised manuscript due: February 1, 2015
• Second review completed: March 15, 2015
• Final manuscript due: May 1, 2015
• Publication date: October 2015

Prospective authors should visit the JSTSP homepage for information on paper submission. Manuscripts should be submitted using Manuscript Central.

# yet more not-so-recent hits from arXiv

Some shorter takes on these papers, some of which I should read in more detail later. I figure I’ll use the blog for some quick notes and to see if any readers have any comments/ideas about these:

Differentially Private Convex Optimization with Piecewise Affine Objectives (Shuo Han, Ufuk Topcu, George J. Pappas) — arXiv:1403.6135 [math.OC]. The idea here is to look at minimizing functions of the form $f(x) = \max_{i = 1,2, \ldots, m} \{ a_i^{\top} x + b_i \}$ subject to $x$ belonging to some convex polytope $\mathcal{P}$. This is a bit different than the kind of convex programs I’ve been looking at (which are more ERM-like). Such programs occur often in resource allocation problems. Here the private information of the users is the offsets $b_i$. They propose a number of methods for generating differentially private approximations to this problem. Analyzing the sensitivity of this optimization is tricky, so they use an upper bound based on the diameter of the feasible set $\mathcal{P}$ to find an appropriate noise variance. The exponential mechanism also gives a feasible mechanism, although the exact dependence of the suboptimality gap on $\epsilon$ is unclear. They also propose a noisy subgradient method where, instead of using SGD, they alter the sampling distribution using the exponential mechanism to choose a gradient step. Some preliminary experiments are also given (although none exploring the dependence on $\epsilon$, which would also be very interesting)!

Assisted Common Information with an Application to Secure Two-Party Sampling (Vinod M. Prabhakaran, Manoj M. Prabhakaran) — arXiv:1206.1282 [cs.IT]. This is the final journal version of a few conference papers that Vinod and Manoj have done on an interesting variant of the Gács-Körner problem. The motivation is from secure multiparty computation — the problem also touches on some work that Vinod and I started but that is sadly languishing due to the utter overwhelmingness of starting a new job. Hopefully I can get back to it this summer.

Analysis of Distributed Stochastic Dual Coordinate Ascent (Tianbao Yang, Shenghuo Zhu, Rong Jin, Yuanqing Lin) — arXiv:1312.1031 [cs.DC]. The title pretty much sums it up. I’m interested in looking a bit more at the analysis method, since I had a similar algorithm bouncing around my head that I would like to analyze. The main idea is to also update the primal variables to achieve a speedup/use a larger step size.

Convergence of Stochastic Proximal Gradient Algorithm (Lorenzo Rosasco, Silvia Villa, Bang Công Vũ) — arXiv:1403.5074 [math.OC]. This is a similar setup to my last post, with a convex objective that has a smooth and non-smooth component. They show convergence in expectation and almost surely. The key here is that they show convergence in an infinite-dimensional Hilbert space instead of, say, $\mathbb{R}^d$.


# Gaussian channel capacity and angles

There’s this nice calculation in a paper of Shannon’s on optimal codes for Gaussian channels which essentially provides a “back of the envelope” way to understand how noise that is correlated with the signal can affect the capacity. I used this as a geometric intuition in my information theory class this semester, but when I described it to other folks I know in the field, they said they hadn’t really thought of capacity in that way. Perhaps it’s all of the AVC-ing I did in grad school.

Suppose I want to communicate over an AWGN channel

$Y = X + Z$

where $Z \sim \mathcal{N}(0,N)$ and $X$ satisfies a power constraint $P$. The lazy calculation goes like this. For any particular message $m$, the codeword $X^n(m)$ is going to be i.i.d. $\mathcal{N}(0,P)$, so with high probability it has length $\sqrt{n P}$. The noise is independent and $\mathbb{E}[ X Z ] = 0$ so $\langle X^n, Z^n \rangle \approx 0$ with high probability, so $Z^n$ is more-or-less orthogonal to $X^n(m)$ with high probability and it has length $\sqrt{nN}$ with high probability. So we have the following right triangle:

Geometric picture of the AWGN channel

Looking at the figure, we can calculate $\sin \theta$ using basic trigonometry:

$\sin \theta = \sqrt{ \frac{N}{N + P} }$,

so

$\log \frac{1}{\sin \theta} = \frac{1}{2} \log \left( 1 + \frac{P}{N} \right)$,

which is the AWGN channel capacity.
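A quick numerical sanity check of this picture, as a sketch rather than anything from Shannon's paper: draw an i.i.d. Gaussian codeword and an independent noise vector, measure the angle between $X^n$ and $Y^n$, and compare $\log \frac{1}{\sin \theta}$ with the capacity formula. The function name `empirical_rate`, the fixed seed, and the values $P = 4$, $N = 1$, $n = 20000$ are choices made for illustration only, and logarithms are taken base 2.

```python
import math
import random


def empirical_rate(P, N, n, seed=0):
    """Estimate log2(1 / sin(theta)) for one random codeword/noise pair.

    theta is the angle between the codeword x and the received vector y = x + z.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, math.sqrt(P)) for _ in range(n)]  # codeword, variance P
    z = [rng.gauss(0.0, math.sqrt(N)) for _ in range(n)]  # noise, variance N
    y = [xi + zi for xi, zi in zip(x, z)]

    dot_xy = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))

    cos_theta = dot_xy / (norm_x * norm_y)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    return math.log2(1.0 / sin_theta)


if __name__ == "__main__":
    P, N, n = 4.0, 1.0, 20000
    print("empirical:", empirical_rate(P, N, n))
    print("capacity: ", 0.5 * math.log2(1.0 + P / N))
```

The two numbers should agree up to fluctuations that shrink as $n$ grows, which is the "with high probability" in the lazy calculation.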

We can do the same thing for rate-distortion (I learned this from Mukul Agarwal and Anant Sahai when they were working on their paper with Sanjoy Mitter). There we have Gaussian source $X^n$ with variance $\sigma^2$, distortion $D$ and a quantization vector $\hat{X}^n$. But now we have a different right triangle:

Geometry of the rate-distortion problem

Here the distortion is the “noise” but it’s dependent on the source $X^n$. The “test channel” view says that the quantization $\hat{X}^n$ is corrupted by independent (approximately orthogonal) noise to form the original source $X^n$. Again, basic trigonometry shows us

$\log \frac{1}{\sin \theta} = \frac{1}{2} \log \left( \frac{\sigma^2}{D} \right)$.

Turning back to channel coding, what if we have some intermediate picture, where the noise is slightly negatively correlated with the signal, so $\mathbb{E}[ X Z ] = - \rho$? Then the cosine of the angle between $X$ and $Z$ in the picture is $\frac{\rho}{\sqrt{NP}}$ and we have a general triangle like this:

Geometry for channels with correlated noise

Where we’ve calculated the length of $Y^n$ using the law of cosines:

$\|Y^n\|^2 = n (P + N) - 2 n \sqrt{ N P } \cdot \frac{\rho}{\sqrt{NP}}$.

So now we just need to calculate $\theta$ again. The cosine is easy to find:

$\cos \theta = \frac{ P + N - 2 \rho + P - N }{ 2 \sqrt{ P (P + N - 2 \rho) } } = \frac{P - \rho}{ \sqrt{P (P + N - 2 \rho) } }$.

Then solving for the sine:

$\sin^2 \theta = 1 - \frac{ (P - \rho)^2 }{ P( P + N - 2 \rho) }$

and applying our formula, for $\rho < \sqrt{PN}$,

$\log \frac{1}{\sin \theta} = \frac{1}{2} \log\left( \frac{ P ( P + N - 2 \rho) }{ P (P + N - 2 \rho) - (P - \rho)^2 } \right) = \frac{1}{2} \log\left( \frac{ P (P + N - 2 \rho) }{ PN - \rho^2 } \right)$

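If we plug in $\rho = 0$ we get back the AWGN capacity, and if we plug in $\rho = N$ the expression reduces to $\frac{1}{2} \log \frac{P}{N}$, which is the rate distortion function with $P = \sigma^2$ and $N = D$. In between, this formula gives the capacity for a range of correlated noise channels.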
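The two endpoints are easy to check numerically. The sketch below computes the rate both ways, once from the triangle (law of cosines, then $\sin \theta$) and once from the simplified closed form, and compares them at $\rho = 0$ and $\rho = N$. The function names and the test values $P = 4$, $N = 1$ are illustrative assumptions, and logarithms are base 2.

```python
import math


def rate_from_triangle(P, N, rho):
    """log2(1 / sin(theta)) computed from the geometry of the triangle."""
    if P <= 0 or N <= 0 or rho * rho >= P * N:
        raise ValueError("need P > 0, N > 0 and |rho| < sqrt(P * N)")
    y_sq = P + N - 2.0 * rho                    # per-symbol squared length of Y
    cos_theta = (P - rho) / math.sqrt(P * y_sq)
    sin_sq = 1.0 - cos_theta ** 2
    return -0.5 * math.log2(sin_sq)


def rate_closed_form(P, N, rho):
    """The simplified expression (1/2) log2( P (P + N - 2 rho) / (P N - rho^2) )."""
    return 0.5 * math.log2(P * (P + N - 2.0 * rho) / (P * N - rho * rho))


if __name__ == "__main__":
    P, N = 4.0, 1.0
    for rho in (0.0, 0.5, N):
        print(rho, rate_from_triangle(P, N, rho), rate_closed_form(P, N, rho))
```

At $\rho = 0$ both functions give $\frac{1}{2} \log_2 (1 + P/N)$, and at $\rho = N$ (which needs $P > N$ to satisfy $\rho < \sqrt{PN}$) both give $\frac{1}{2} \log_2 (P/N)$.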

I like this geometric interpretation because it's easy to work with and I get a lot of intuition out of it, but your mileage may vary.

# Mental health in graduate school

I recently posted a link to an article on mental health in graduate school on Facebook (via a grad school friend of mine), and it sparked a fair bit of discussion there. The article is worth reading, and I am sure it will resonate with many readers. The discussion veered towards particularities of graduate school pressures in STEM, and the contributing factors to mental stress that are driven by funding structures and the advisor/student relationship. The starting point comes from this part of the article:

In this advisor-advisee arrangement, the student trades her labor as a researcher for the advisor’s mentorship and, ultimately, the advisor’s approval of her degree before she can graduate. For students seeking an academic position after graduate school, an advisor’s letter of recommendation can be the difference between landing a job and being left out in the cold, a harsh reality given today’s sparse academic job market. All of these factors mean that the faculty advisors hold tremendous power in the advisor-advisee relationships. They are the gatekeepers of success in the graduate endeavor.

This notion of “trading labor for mentorship” is most directly monetized in grant-funded fields like engineering, where graduate students are “working in the lab” on a project that is (hopefully) related to their thesis topic. In some cases, this works out fine, but in others, the research for the grant-relevant project does not contribute directly to their thesis. For funding agencies which want “deliverables,” this pressure to produce results on schedule creates a tension. The advisor becomes a boss.

Some of the points raised in the discussion on Facebook seemed important to bring out to a wider audience. One suggestion is to disentangle NSF support for projects and research from grad student salaries. So students could apply for NSF support and then they take their funding with them to find an advisor. In STEM this would be difficult, given the large number of international students who would not be qualified for such support, but it does give some power to students to walk away from a bad situation and more incentives for PIs to be more mentors than bosses. I am not entirely convinced it would help in terms of mental health though — students need more and better mentoring, not just the means to walk away. Also, Roy pointed out, having the student and advisor both convinced that a problem is important and solvable creates a shared commitment that helps students feel less isolated. For postdocs, though, this model would be a significant improvement over the status quo. Right now, there is almost no consensus on what a postdoc should be, and I’ve seen postdoc jobs that range from factotum to co-PI.

When one is on the other side (post-PhD), it’s tempting to say that grad school would have been easier if I had been a bit more organized or had better time-management skills. Perhaps the difficulties one has can be solved with “one weird trick.” I think that’s terribly naïve. As advisors, we definitely can do things to help students learn to work better — that’s the transition from being a student to being a researcher. But the notion that depression comes about as a result of simply not being productive enough, or feeling behind, or any other “outcomes”-based reason, misses the environmental and social factors that are equally important.

Graduate research is often very isolating. Perhaps some STEM students actually enjoy this kind of solitary work, but generalizing is dangerous. Having a grad student social organization, weekly happy hour, softball league, or other “outlet” isn’t enough. I used my startup funds to help buy a table-tennis table for my department at Rutgers, and while the students seem pretty happy about it, it’s not actually creating a community. One important question to ask is how the faculty and the department can help create and support that kind of community so that it can go on its own, organically.

In a department like mine, the majority of graduate students are international, and have a host of other stressors about being in a new (and often much more expensive) country. Using mental health resources may not be normalized in their home country or culture. Regardless of where they are from, however, the big challenge is this:

…awareness of the existing resources among the graduate student population remains frustratingly low, due in part to the insular nature of traditional academic departments. A broader culture of wellness may prove even more elusive in the face of a rigidly hierarchical academic culture that often rewards drive and sacrifice without encouraging balance. In this climate, graduate student mental health advocates—students, staff, and administrators—face an uphill struggle in the years to come. The consequences of this struggle tear at the very fabric of the academic experience and suggest fundamental misalignment of priorities.

It’s only a misalignment of priorities if we don’t interrogate our priorities. This isn’t two trains crashing into each other, but it does require a “structural” recognition that graduate students are a part of the family, as it were, and a commitment to treating them as such.