The Conyers bill and open access

Allie sent this blog post my way about the Conyers bill, about which Lawrence Lessig has been quite critical. At the moment the NIH requires all publications from research it funds to be deposited in a public repository (PubMed Central) so that the public can read them. This makes sense because the taxpayers paid for this research.

What Conyers wants to do is end the requirement for free and public dissemination of research. Why? Lessig says he’s in the pocket of the publishing industry. From the standpoint of the taxpayer and a researcher, it’s hard to see a justification for this amendment. Conyers gives a procedural reason for the change, namely that “this so-called ‘open access’ policy was not subject to open hearings, open debate or open amendment.” So essentially he wants to go back to the status quo ante and then have a debate, rather than have a debate about whether we want to go back to the status quo ante.

From my perspective, spending Congressional time to do the equivalent of a Wikipedia reversion is a waste — if we want to debate whether to change the open access rules, let’s debate that now rather than changing the rules twice. I think we should expand open access to include the NSF too. It’s a bit tricky though, since most of my work is published (and publishable) with the IEEE. The professional societies could be a great ally in the open-access movement, but as Phil Davis points out, the rhetoric on both sides tends to leave them out.

Elsevier strikes again

Via Crooked Timber comes another story about the depths plumbed by Elsevier:

Merck paid an undisclosed sum to Elsevier to produce several volumes of a publication that had the look of a peer-reviewed medical journal, but contained only reprinted or summarized articles–most of which presented data favorable to Merck products–that appeared to act solely as marketing tools with no disclosure of company sponsorship… Disclosure of Merck’s funding of the journal was not mentioned anywhere in the copies of issues obtained by The Scientist.

Elsevier has been involved in shady dealings before, but this is a new one for me. I recently turned down a request to review a paper for an Elsevier-published journal (citing their business practices), and this piece of news confirms my decision.

Infocom 2009 : delay issues

Effective Delay Control for Online Network Coding
Joao Barros (University of Porto, PT); Rui Costa (Universidade do Porto / Instituto de Telecomunicações, PT); Daniele Munaretto (DoCoMo Euro-Labs, DE); Joerg Widmer (DoCoMo Euro-Labs, DE)
This talk tries to merge ARQ with network coding. The key idea is that a packet is “seen” by a receiver if the receiver can compute the XOR of that packet with (a combination of) future packets. The delay analysis then proceeds by analyzing the chains of dependent packets induced by erasures. This paper looks at the problem of multicast, where the issue is managing the delays to multiple receivers, assessing which receiver is the current “leader,” and tracking how the leader switches over time. For random erasures (independent across receivers) we get a biased random walk whose location shows who the leader is. A coding strategy effectively tries to control this random walk, and the proposed scheme maintains a leader by sending the XOR of the last unseen packet of each user whenever the leader loses a packet.
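
Since the leader process is the cute part, here is a little toy simulation I wrote to build intuition (my own sketch, not the paper’s coding scheme): two receivers with independent erasures, where the gap between their received-packet counts does a random walk and its sign picks out the leader.

```python
import random

def leader_walk(T=100_000, p1=0.1, p2=0.1, seed=0):
    """Toy model: the sender broadcasts one packet per slot; receiver r
    misses it with probability p_r, independently. The gap between the
    receivers' packet counts does a random walk, and its sign tells us
    who the current leader is."""
    rng = random.Random(seed)
    seen = [0, 0]
    leader, switches = 0, 0
    for _ in range(T):
        for r, p in enumerate((p1, p2)):
            if rng.random() > p:   # packet survives the erasure
                seen[r] += 1
        new_leader = 0 if seen[0] >= seen[1] else 1
        if new_leader != leader:
            switches += 1
            leader = new_leader
    return seen, switches

print(leader_walk())            # equal erasure rates: an unbiased gap walk
print(leader_walk(p1=0.05))     # the first receiver tends to stay leader
```

The paper’s strategy goes further by choosing which XOR to send so as to steer this walk, which the sketch above does not attempt.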

The Capacity Allocation Paradox
Asaf Baron (Technion – Israel Institute of Technology, IL); Isaac Keslassy (Technion, IL); Ran Ginosar (Technion, IL)
This talk was about a simple example of a network in which adding capacity can make a stable network unstable. To me it seemed to be a consequence of the particular model adopted for the network, namely that if a link of capacity C is available, then the transmitter will operate at rate C. The simple example of the paradox is a 2-user multiaccess link. Suppose traffic arrives at rate 1 at each of two users, which have outgoing links of capacity 1 to a shared queue. This queue has output capacity 2, so the whole network is stable. However, if one user gets a capacity 2 link to the queue, then their traffic can hog the output link and cause increasing delay to the second user. The paradox was motivated by networks on a chip, which seem to be an interesting source of new problems with different emphases than traditional networking problems.
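
To convince myself the paradox isn’t magic, here is a toy slotted-time simulation in the spirit of the example. The model is my own (bursty batch arrivals, a small shared buffer, and source 1 polled first), not the paper’s network-on-chip model, but it shows the same effect: raising source 1’s link capacity destabilizes source 2.

```python
import random

def simulate(c1, T=200_000, p_batch=0.45, B=2, seed=1):
    """Each slot: batch arrivals (2 packets w.p. p_batch at each source),
    forwarding into a finite shared buffer of size B (source 1 polled
    first), then the shared output link serves up to 2 packets. Returns
    source 2's mean backlog over the second half of the run."""
    rng = random.Random(seed)
    q1 = q2 = shared = 0
    caps = (c1, 1)               # link capacities into the shared queue
    total, slots = 0, 0
    for t in range(T):
        if rng.random() < p_batch:
            q1 += 2
        if rng.random() < p_batch:
            q2 += 2
        space = B - shared
        f1 = min(caps[0], q1, space)       # source 1 grabs buffer first
        q1 -= f1; shared += f1; space -= f1
        f2 = min(caps[1], q2, space)
        q2 -= f2; shared += f2
        shared = max(0, shared - 2)        # output serves 2 packets/slot
        if t >= T // 2:
            total += q2; slots += 1
    return total / slots

print("c1 = 1:", simulate(1))   # source 2's queue stays small
print("c1 = 2:", simulate(2))   # source 2's queue grows without bound
```

With the faster link, source 1 dumps each batch at once and fills the whole buffer, so source 2 only gets to forward in slots where source 1 is idle — and that service rate falls below its arrival rate.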

Power-Aware Speed Scaling In Processor Sharing Systems
Adam Wierman (California Institute of Technology, US); Lachlan Andrew (Swinburne University of Technology, AU); Kevin Tang (Cornell University, US)
This talk was about assigning speeds to processing jobs in a queue — if you run a job faster it takes more power but reduces delay. There are different ways of assigning speeds: a fixed speed for all jobs (static), a square wave for speed vs. time (gated static), or an arbitrary curve (dynamic). The metric they choose to look at is a sort of regularized energy E[\mathrm{energy}] + \beta E[ \mathrm{delay} ], where \mathrm{energy} = (\mathrm{speed})^{\alpha}. For a large number of jobs they get a limiting form for the optimal speed and show that a gated static policy performs within a factor of 2 of the optimal dynamic policy, which is verified by simulation. In general we may not know the arrival process, so choosing the duty cycle for a gated static policy may be hard a priori; in that case a dynamic strategy may be much better at handling model variability. This was one of those problems I had never thought about before, and I thought the results were pretty cute.
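
As a back-of-the-envelope version (my own stand-in using an M/M/1 processor-sharing queue with unit-mean jobs, not the paper’s exact model), one can numerically compare the best static and gated static speeds under the cost above:

```python
from scipy.optimize import minimize_scalar

# Assumed toy setting: M/M/1 processor-sharing queue with arrival rate
# lam and unit-mean job sizes, server speed s, power = s**alpha, and
# E[delay] = 1/(s - lam). Cost is E[energy] + beta * E[delay].
lam, alpha, beta = 1.0, 2.0, 1.0

def cost_static(s):
    # server always on at speed s
    return s**alpha + beta / (s - lam)

def cost_gated(s):
    # server runs at speed s only while busy, a fraction rho = lam/s
    return (lam / s) * s**alpha + beta / (s - lam)

for name, cost in [("static", cost_static), ("gated static", cost_gated)]:
    res = minimize_scalar(cost, bounds=(lam + 1e-6, 20.0), method="bounded")
    print(f"{name:>13}: best speed {res.x:.3f}, cost {res.fun:.3f}")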

from my mandatory online course

I am required to take an online course on “Sexual Harassment Prevention Training for Supervisors” here at UCSD, and it is full of case studies with ridiculous names like “Manny Mozart” (for the Music Department) and “Pierre Rodin” (for French). Here is one that seemed quite strange to me:

Several male faculty in the predominately male Department of Human Studies invite a new male faculty member to Hooters for lunch, explaining that this is a bonding event for the “guys” every Friday. Professor Fellowman attends at first, but is uncomfortable with the setting, behavior and discussion during these lunches, and refuses subsequent requests to attend.

The department chair tells Professor Fellowman that he will not do well in the department if he cannot develop relationships with his fellow faculty members. Professor Fellowman is subsequently assigned to teach the largest and most unpopular courses, and is shunned by his male peers. Eventually, he suffers an unfavorable departmental merit review.

Case Study: Does Professor Fellowman have a claim of sexual harassment?

I am having a hard time imagining a department at a UC for which the regular faculty outing would be to Hooters (although the world is full of surprises). According to Oncale v. Sundowner Offshore Services, Inc. (1998), this would constitute sexual harassment (which seems obvious). But would it have hurt to come up with a more likely example?

Infocom 2009 : network statistics problems

Graph Sampling Techniques for Studying Unstructured Overlays
Amir Hassan Rasti Ekbatani (University of Oregon, US); Mojtaba Torkjazi (University of Oregon, US); Reza Rejaie (University of Oregon, US); Nick Duffield (AT&T Labs – Research, US); Walter Willinger (AT&T Labs – Research, US); Daniel Stutzbach (Stutzbach Enterprises LLC, US)
The question in this paper was how to collect network statistics (such as node degree) in peer-to-peer networks. Since peers join and leave, selecting a random peer is difficult: there are biases introduced by high-degree peers and short-lived peers. One approach to correct for this is to use Markov chain Monte Carlo (MCMC) to sample from the nodes. This talk proposed respondent-driven sampling (RDS), a technique used to map social networks. The idea is to do a random walk on the network and gather statistics along the walk (not just at the end, as in MCMC). These measurements are biased, but we can correct for the bias afterwards (using something called the Hansen-Hurwitz estimator, which I don’t know much about). Experiments showed a big improvement when vanilla MCMC and RDS were compared on hierarchical scale-free graphs.
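
The reweighting idea is easy to illustrate. Here is a toy sketch (entirely mine, and much simpler than the paper’s estimator): a random walk visits a node with probability proportional to its degree, so weighting each sample by one over the degree undoes the bias when estimating the average degree.

```python
import random
from collections import defaultdict

def hh_mean_degree(adj, samples):
    """Hansen-Hurwitz-style correction: a walk visits v with probability
    proportional to deg(v), so weighting each sample by 1/deg(v) undoes
    the bias; the mean degree comes out as a harmonic-mean formula."""
    inverse = [1.0 / len(adj[v]) for v in samples]
    return len(inverse) / sum(inverse)

rng = random.Random(0)

# Hypothetical test graph: a 200-node ring with a hub attached to the
# first 49 nodes, so degrees are far from uniform.
n = 200
adj = defaultdict(list)
for i in range(n):
    adj[i].append((i + 1) % n)
    adj[(i + 1) % n].append(i)
for i in range(1, 50):
    adj[n].append(i)
    adj[i].append(n)

# Random walk, recording the node visited at every step.
v, samples = 0, []
for _ in range(50_000):
    v = rng.choice(adj[v])
    samples.append(v)

naive = sum(len(adj[u]) for u in samples) / len(samples)
true_mean = sum(len(a) for a in adj.values()) / (n + 1)
print("naive average along walk:", naive)            # biased high
print("HH-corrected estimate:", hh_mean_degree(adj, samples))
print("true mean degree:", true_mean)
```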

A Framework for Efficient Class-based Sampling
Mohit Saxena (Purdue University, US); Ramana Rao Kompella (Purdue University, US)
This talk was about increasing the coverage of flow monitoring in networks. Currently routers sample flows uniformly, which catches the big “elephant” flows but not the low-volume “mouse” flows. In order to increase coverage, the authors propose dividing flows into classes and then sampling at a per-class rate. So the architecture is to classify a packet first, then choose whether to sample it. The goal is to do a combination of sketching and hashing that shifts the complexity to slower memory, thereby saving costs. They use something called a composite Bloom filter to do this.
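
I don’t know the composite Bloom filter details, but the classify-then-sample pipeline is easy to sketch. Below is a toy version in which I substitute a count-min sketch for the paper’s data structure to keep approximate per-flow counts in small memory; the class names, rates, and threshold are all made up.

```python
import hashlib
import random
from collections import Counter

class CountMin:
    """Approximate per-flow packet counts in small memory. This is a
    count-min sketch, my substitute for the paper's composite Bloom
    filter."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        h = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8)
        return int.from_bytes(h.digest(), "big") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += 1

    def count(self, key):
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))

RATES = {"elephant": 0.01, "mouse": 0.5}   # made-up per-class rates
THRESHOLD = 100                            # made-up elephant cutoff

def sample_stream(packets, rng):
    """Classify each packet's flow using the sketch, then sample it at
    the class rate, so low-volume flows get better coverage."""
    sketch, sampled = CountMin(), []
    for src, dst in packets:
        cls = "elephant" if sketch.count((src, dst)) > THRESHOLD else "mouse"
        sketch.add((src, dst))
        if rng.random() < RATES[cls]:
            sampled.append(((src, dst), cls))
    return sampled

rng = random.Random(0)
stream = [("10.0.0.1", "10.0.0.2")] * 5000                   # one elephant
stream += [(f"10.0.1.{i}", "10.0.0.2") for i in range(200)]  # many mice
print(Counter(cls for _, cls in sample_stream(stream, rng)))
```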

Estimating Hop Distance Between Arbitrary Host Pairs
Brian Eriksson (University of Wisconsin – Madison, US); Paul Barford (University of Wisconsin – Madison, US); Rob Nowak (University of Wisconsin, Madison, US)
Given a big network (like the internet) with N end hosts and M monitors, how can we estimate the hop distance between two end hosts from measurements between end hosts and monitors? This paper proposes using passive measurements, so there may be many holes in the (N+M) \times (N+M) matrix of pairwise measurements. Their approach is to use a weighted version of multidimensional scaling (MDS) to deal with the holes. This approach can also be extended to incorporate side information about the network topology (for example, autonomous system IDs).
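
Here is a generic weighted-MDS sketch of the kind of computation involved (my own stand-in, minimizing weighted stress with a general-purpose optimizer; the paper’s actual algorithm, and the fact that hop distances aren’t Euclidean, are glossed over):

```python
import numpy as np
from scipy.optimize import minimize

def weighted_mds(D, W, dim=2, seed=0):
    """Place nodes in R^dim by minimizing the weighted stress
    sum_ij W_ij * (||x_i - x_j|| - D_ij)^2, where W_ij = 0 marks a
    missing measurement."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)

    def stress(flat):
        X = flat.reshape(n, dim)
        diff = X[:, None, :] - X[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)
        return (W * (dist - D) ** 2).sum()

    res = minimize(stress, rng.normal(size=n * dim), method="L-BFGS-B")
    return res.x.reshape(n, dim)

# Demo on synthetic planar points: hide 40% of the pairwise distances,
# embed from the rest, then predict the hidden distances.
rng = np.random.default_rng(1)
pts = rng.normal(size=(30, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
mask = np.triu(rng.random((30, 30)) > 0.4, k=1)
W = (mask | mask.T).astype(float)        # symmetric, zero diagonal
X = weighted_mds(D, W)
est = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
hidden = (W == 0) & ~np.eye(30, dtype=bool)
print("mean abs error on hidden pairs:", np.abs(est[hidden] - D[hidden]).mean())
```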

Infocom 2009 : general thoughts and assorted topics

This was my first time going to Infocom, and I had a good time, despite being a bit out of my element. The crowd is a mix of electrical engineers and computer scientists, of theorists, experimentalists, and industry researchers. Although I would often go to a talk thinking I knew what it would be about, I frequently found that it was rather different from what I expected. At first this was a little alienating, but in the end I found it refreshing to be introduced to a host of new ideas and problems. Unfortunately, the lack of familiarity may translate into some misunderstandings in these posts; as usual, clarifying comments are welcome!

Below are the papers I couldn’t categorize into other areas.

Dynamic Power Allocation Under Arbitrary Varying Channels – An Online Approach
Niv Buchbinder (Technion University, IL); Liane Lewin-Eytan (Technion, IL); Ishai Menache (Massachusetts Institute of Technology, US); Joseph (Seffi) Naor (Technion, IL); Ariel Orda (Technion, IL)
The model is that of an online power allocation problem wherein the gain of a fading channel h_t varies over time in an unpredictable manner. At each time we can choose a power P_t for the transmitter based on the past values h_1, h_2, \ldots, h_{t-1} and we will get a reward U(h_t, P_t) = \log(1 + h_t P_t). Subject to a total power constraint, what is the optimal allocation policy, in terms of competitive optimality versus an algorithm that knows h_t in advance? They show a lower bound of \Omega( \log (H_{\max}/H_{\min})) on the competitive ratio of any algorithm, and an achievable strategy that quantizes the possible fading values and executes an online allocation strategy for the discretized problem. Of course I went to this talk because it claimed to be about arbitrarily varying channels, but what was a little unconvincing to me was the abstraction — you have to choose not only a transmit power but also a coding rate, so it’s unclear that in reality you will get \log(1 + h_t P_t) utility each time. There may be a way to justify this kind of modeling using some rateless code constructions, which seems like an interesting problem to think about. UPDATE: I looked at the paper and saw that they are considering a block fading channel, so you get that utility for a block of channel uses, and that the channel gain for the given slot is revealed to the transmitter, so you don’t need to be fancy with the underlying communication scheme.
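
For concreteness, the offline benchmark in this kind of competitive analysis is just water-filling over the realized gains. Here is a quick sketch of that benchmark, together with a naive equal-power baseline of my own choosing (not the paper’s online algorithm):

```python
import numpy as np

def waterfill(h, P_total):
    """Offline benchmark: maximize sum(log(1 + h_t * P_t)) subject to
    sum(P_t) <= P_total. The classic solution is P_t = max(0, nu - 1/h_t)
    with the water level nu found here by bisection."""
    lo, hi = 0.0, P_total + 1.0 / h.min()
    for _ in range(100):
        nu = 0.5 * (lo + hi)
        if np.maximum(0.0, nu - 1.0 / h).sum() > P_total:
            hi = nu
        else:
            lo = nu
    P = np.maximum(0.0, lo - 1.0 / h)      # feasible water level
    return P, np.log1p(h * P).sum()

rng = np.random.default_rng(0)
h = rng.uniform(0.1, 10.0, size=50)        # H_min = 0.1, H_max = 10
P_off, U_off = waterfill(h, P_total=20.0)
U_equal = np.log1p(h * (20.0 / len(h))).sum()   # naive equal split
print("offline water-filling utility:", U_off)
print("equal-power baseline utility:", U_equal)
```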

Lightweight Coloring and Desynchronization for Networks
Arik Motskin (Stanford University, US); Tim Roughgarden (Stanford University, US); Primoz Skraba (Stanford University, US); Leonidas Guibas (Stanford University, US)
This paper was on duty-cycling nodes in a sensor network to preserve power and avoid interference. In particular, we would like to color the n nodes of the network so that no two neighbors are awake at the same time (thereby improving coverage efficiency and lowering interference at a centralized receiver). This can be mapped to the problem of allocating intervals on a circle of circumference T (the cycle time) such that two neighbors in the graph do not get overlapping intervals; nodes with low degree should get longer intervals. The constraint in this assignment problem is that the solution must be decentralized and require minimal internal state and external input. They give an algorithm that uses one bit of local state to achieve O(d_{\max}) colors in O(\log n) time, and an algorithm with one bit of local state and one bit of external side information that colors the graph with d_{\max} + 1 colors in d_{\max} \log n rounds.
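
Their one-bit algorithms are the clever part, and I won’t try to reproduce them here. As a baseline, here is the standard synchronous randomized (d_{\max} + 1)-coloring that such schemes get compared against:

```python
import random

def distributed_coloring(adj, rng, max_rounds=1000):
    """Synchronous randomized (d_max + 1)-coloring: each uncolored node
    proposes a random color unused by already-colored neighbors, and
    keeps it if no neighbor proposed the same color this round."""
    dmax = max(len(neighbors) for neighbors in adj.values())
    palette = range(dmax + 1)
    color = {}
    for _ in range(max_rounds):
        if len(color) == len(adj):
            break
        proposal = {}
        for v in adj:
            if v in color:
                continue
            taken = {color[u] for u in adj[v] if u in color}
            proposal[v] = rng.choice([c for c in palette if c not in taken])
        for v, c in proposal.items():
            if all(proposal.get(u) != c for u in adj[v]):
                color[v] = c
    return color

rng = random.Random(0)
# Hypothetical test graph: a 12-node ring plus two chords.
edges = [(v, (v + 1) % 12) for v in range(12)] + [(0, 6), (3, 9)]
adj = {v: [] for v in range(12)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
coloring = distributed_coloring(adj, rng)
assert all(coloring[u] != coloring[v] for u, v in edges)
print(coloring)
```

In the duty-cycling application, each color then maps to a disjoint wake interval on the cycle of length T.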

Using Three States for Binary Consensus on Complete Graphs
Etienne Perron (EPFL, Switzerland); Dinkar Vasudevan (EPFL, Switzerland); Milan Vojnovic (Microsoft Research, UK)
This paper looked at a binary consensus problem that seemed, on the face of it, quite different from the consensus algorithms I work on. They consider a complete graph where every node has an initial value that is either 0 or 1. The goal is to compute the majority vote across all the nodes in a distributed manner while keeping only a small amount of state at each node. This is related to the standard voter model, in which nodes sample a neighbor and switch their vote if it differs. If we have three states, (0, e, 1), where e is “undecided,” then we can make a rule where a node samples its neighbor and moves one step in the direction reported by that neighbor. Different variants of the algorithm are also analyzed, and the error rate analysis seemed a bit hairy, involving ODEs and so on. What I was wondering at the end is how this algorithm relates to standard gossip with one-bit quantization of messages.
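
Here is a quick simulation of my reading of the three-state rule (with the assumption — mine, not necessarily the paper’s — that an undecided neighbor gives no push):

```python
import random
from collections import Counter

def ternary_consensus(votes, steps, rng):
    """Toy version of the three-state dynamics on a complete graph, per
    my reading of the talk: states are 0, 'e' (undecided), 1. A random
    node samples a random other node and moves one state toward the
    sampled vote; an undecided neighbor is assumed to give no push."""
    order = {0: 0, "e": 1, 1: 2}
    unorder = {0: 0, 1: "e", 2: 1}
    state = list(votes)
    n = len(state)
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j or state[j] == "e":
            continue
        a, b = order[state[i]], order[state[j]]
        if b > a:
            state[i] = unorder[a + 1]
        elif b < a:
            state[i] = unorder[a - 1]
    return Counter(state)

rng = random.Random(0)
votes = [1] * 60 + [0] * 40          # true majority is 1
print(ternary_consensus(votes, steps=200_000, rng=rng))
# typically converges to everyone voting 1, the initial majority
```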

Greetings from Infocom 2009

Greetings to you, patient (and probably bored) readers! I’m in Rio de Janeiro for Infocom 2009 and yes, I am going to be blogging a bit about it. The conference is a little out of my bailiwick, at least so far. I go to sessions thinking I might know what the talk is about, based on the title, only to find out I’m completely wrong. It’s good though, and as a challenge to myself I will try to blog about some of the talks (I guess I gave up on finishing my ITA reminiscences).

So stay tuned!

Ways to keep track of ideas

For as long as I’ve been doing research, I’ve tried to come up with a “system” that works for keeping track of the loose ideas that filter through during talks or in idle moments. Earlier on, the focus was on recording technologies — scraps of paper, an audio tape recorder, the Palm Pilot and its inscrutable Graffiti. The Moleskine reared its sightless head, promising an end to the torment of lost epiphanies. The sad truth of course is that progress doesn’t come in lightning flashes of brilliance while strolling down a boulevard, but instead in obsessive fits and starts. So instead I’m trying to settle on a simpler method of keeping track of research questions — a text file.

Once the medium is fixed (and mutable), the central question is taxonomy. For a while I was using subject matter — Shannon theory, networking, etc. More recently I decided that it’s better to classify things by “how much I have thought about them.” I have 5 categories :

  1. Pie-in-the-sky : wow, that seems interesting… I should think about it for more than 5 minutes…
  2. Nebulous hand-waving : there’s a problem there, but what is the right framework?
  3. Percolating : ok, but what is the actual formal problem?
  4. In progress : finding the proof in the pudding.
  5. Writing : oh no, a deadline! Gotta figure out how to fit within the page limit!

This seems to be working ok for me at the moment, but it’s a moving target. Do any of you readers have systems that have worked out for you? Yes, this is a blatant plea to those lurkers out there…

supply and demand for US PhDs

I saw an article in Inside Higher Ed about a new paper on the “Internationalization of U.S. Doctorate Education”. From the abstract:

Students from outside the U.S. accounted for 51% of PhD recipients in science and engineering fields in 2003, up from 27% in 1973. In the physical sciences, engineering and economics the representation of foreign students among PhD recipients is yet more striking; among doctorate recipients in 2003, those from outside the U.S. accounted for 50% of degrees in the physical sciences, 67% in engineering and 68% in economics.

Anyone in an engineering program (graduate or undergraduate) knows that there seem to be more foreign grad students than domestic students, but I didn’t really know the numbers until now. The paper claims that these increases can be explained by factors such as the expansion of undergraduate programs in the rest of the world and how

… the size of the college-aged population in the U.S. peaked in the mid-1970s and declined through the early ’90s. So while the fraction obtaining undergraduate degrees in science and engineering rose by about 2 percent a year in the 1980s and early ’90s, according to the paper, the raw number of science and engineering B.A.’s barely budged.

It’s a bit of a long paper (I haven’t had time to read it myself, but I might on the way back from Hong Kong), but it seems important reading for those of us interested in academic engineering research programs.