I ported the mergetex script from Perl to Python and posted it on GitHub, for those who are interested. It scans recursively through a large LaTeX project and generates a single .tex file suitable for uploading to arXiv (with comments stripped out). I probably should have branched from Manu’s, though. Oops. Ah well. There are lots of versions of this thing out there, so no claims to elegance here.
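For the curious, the basic idea can be sketched in a few lines. This is not the script on GitHub, just a minimal illustration under some assumptions: the names `flatten`, `INPUT_RE` and `COMMENT_RE` are made up here, only `\input{...}` is followed, and every path is resolved relative to a single project root.

```python
import re
from pathlib import Path

# Hypothetical patterns for this sketch (not taken from the mergetex script).
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")   # \input{file}
COMMENT_RE = re.compile(r"(?<!\\)%.*")         # first unescaped % to end of line


def flatten(name, root):
    """Return the text of root/name with comments cut and \\input files inlined."""
    path = Path(root) / name
    if path.suffix != ".tex":
        # LaTeX lets you omit the extension, so add it back.
        path = path.with_name(path.name + ".tex")
    lines = []
    for line in path.read_text().splitlines():
        # Keep the '%' itself so end-of-line spacing behaves as before.
        line = COMMENT_RE.sub("%", line)
        # Recurse into any \input that survived comment stripping.
        line = INPUT_RE.sub(lambda m: flatten(m.group(1), root), line)
        lines.append(line)
    return "\n".join(lines)
```

Calling `flatten("main.tex", "path/to/project")` would return the merged text as one string. Stripping comments before looking for `\input` means a commented-out `\input` is not followed.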
A related question (which I gave up on after a short attack) was how to write a regular expression that matches the first part of a line of LaTeX, up to and including the % that starts a comment. The trick is that \% is an escaped percent sign and does not start a comment, so you want
There is a 95\% chance that I am bad at regexps % DUH\n
to return
There is a 95\% chance that I am bad at regexps %
and
There is a 95\% chance that I am bad at regexps % DUH more like 99\%\n
to return
There is a 95\% chance that I am bad at regexps %
but
There is a 95\% chance that I am bad %at regexps % DUH more like 99\%\n
to return
There is a 95\% chance that I am bad %
Some sort of negative lookbehind is needed, but I got confused and decided to do something less efficient.
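For what it's worth, here is roughly what the lookbehind version could look like with Python's `re` module. Treat it as a sketch: the names `strip_comment`, `strip_comment_strict`, `LOOKBEHIND_RE` and `TOKEN_RE` are made up for illustration. The plain lookbehind handles the three examples above, but it is fooled by `\\%` (an escaped backslash followed by a real comment), so a second pattern that consumes backslash pairs is included for that case.

```python
import re

# Lazily match up to and including the first '%' not preceded by a backslash.
LOOKBEHIND_RE = re.compile(r"^(.*?(?<!\\)%)")

# Alternative without lookbehind: consume ordinary characters or
# backslash-plus-one-character pairs, then the first bare '%'.
TOKEN_RE = re.compile(r"^((?:[^\\%]|\\.)*%)")


def strip_comment(line):
    """Return line up to and including the first unescaped '%'.

    Lines without a comment are returned unchanged.
    """
    match = LOOKBEHIND_RE.match(line)
    return match.group(1) if match else line


def strip_comment_strict(line):
    """Same idea, but treats '\\\\%' as an escaped backslash plus a comment."""
    match = TOKEN_RE.match(line)
    return match.group(1) if match else line
```

Both functions keep the `%` itself and drop everything after it, including the newline, which matches the examples above. On a line containing `\\%`, only `strip_comment_strict` cuts at that `%`.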