Saturday, February 28, 2009

James Boyle on keeping public science open to the public

James Boyle on keeping public science open to the public: The Financial Times yesterday ran a terrific op-ed by James Boyle, explaining the ridiculousness of the Conyers bill “that would eviscerate public access to taxpayer funded research.” (Via Joho the Blog.)

With a bit of (likely unintended) irony, that worthy op-ed is behind a (free) sign-up firewall. The hypothetical dialog between a member of Congress and a staffer is too tame; the reality is likely to be much cruder (access, campaign donations). Still, it goes beyond the typical pro-open access statement in explaining how open access can enable much more powerful linking among scientific results:

Even if this bill dies the death it so richly deserves, the very fact we are arguing about it indicates how far we have to go in our debates over science policy. Think about the Internet. You know it is full of idiocy, mistake, vituperation and lies. Yet search engines routinely extract useful information for you out of this chaos. How do they do it? In part, by relying on the network of links that users generate. If 50 copyright professors link to a particular copyright site, then it is probably pretty reliable.

Where are those links for the scientific literature? Citations are one kind of link; the hyperlink is simply a footnote that actually takes you to the desired reference. But where is the dense web of links generated by working scientists in many disciplines, using semantic web technology and simple cross reference electronically to tie together literature, datasets and experimental results into a real World Wide Web for science? The answer is, we cannot create such a web until scientific articles come out from behind the publishers’ firewalls. What might happen if we could build it? We do not know. Think of the speed of innovation that the open Web has unleashed. Then imagine that transformative efficiency applied to science and technology rather than selling books or flirting on social networks. This bill would forbid us from building the World Wide Web for science, even for the research that taxpayers have funded. And that is truly a tragedy.

I'd not bet as much on semantic web technology, but otherwise this is just right.

Sunday, February 22, 2009

Effective Research Funding

Effective Research Funding: [...] it’s odd that the rules for NSF funding, which is the premier source of funding for basic science in the US, generally requires university participation on proposals. This restriction naturally makes it easier for researchers at universities to acquire grant money than researchers not at universities. I don’t understand why this restriction is desirable from the viewpoint of a government wanting to effectively subsidize research. (Via Machine Learning (Theory).)

Hal comments that one of the NSF's criteria for funding is educational impact, which means funding of educational institutions. I think that's a red herring, because good industrial research labs do a lot for education through their internship programs. Just to mention a few cases I know personally, Lillian Lee (Cornell), Andrew Ng (Stanford), Ben Taskar (Penn), Sanjoy Dasgupta (UCSD), David Yarowsky and Jason Eisner (JHU) are former Bell Labs/AT&T Labs interns.

My main objection to John's suggestion is that I saw it go badly wrong in European Union research funding, and I can't think of any good mechanism to fix those wrongs. The problems ranged from cost accounting manipulations to distortions of research agendas to fit government research objectives. Industry is much more complex and subject to more varied pressures than academia, making grant rules both more burdensome and also easier to game (these are not contradictory). Furthermore, Congress keeps authorizing R&D tax credits that are a more direct and less distorting form of support for industrial research.

Scaling up intellectual authority

Irrational Exuberance 1.0: [...] At least the Times is using the right word these days -- open -- but not in the way that matters. They're willing to give away what we, in tech, have been giving away for a decade. Obviously that's not a disrupter. They need to give away what they have -- authority. The trick is to find a way to give it away without destroying it. If they can do it, then we will have cracked the nut, scale, massively more news, deeper coverage, and with it -- shifted economics. (Via Scripting News.)

What Dave Winer says here about the news applies as well to scientific publishing. The arguments about open access and about review quality are but a sideline to a much more fundamental one: how to create sustainable mechanisms that will increasingly open up the process of writing up new ideas, reviewing them, and publicly building a consensus for or against their scientific soundness and importance.

The major scientific publishers jealously control their authority-conferring machinery, which is what yields citations and thus rankings in the Science Citation Index, which has a disproportionate impact on academic funding and promotion decisions. In the meantime, they fight a FUD and lobbying war against open access efforts. But they have a point. The current mechanism, however flawed, was for a while self-sustaining and in fact able to generate substantial profits for its managers, from commercial publishers and professional societies to editors. We do not yet have proven open and scalable models that are likely to survive indefinitely without special intervention, not for scientific publishing and not for news. But there are some ideas.

Thursday, February 19, 2009

Code, the internet, and other biological systems

David Isenberg returns to the debate about "fixing" the net. I've not been following this debate closely (although I blogged my initial reaction), but I feel that much of the discussion misses the fact that large assemblages of code, and so the net, that are supposed to run indefinitely, are in some ways more like biological systems than like the simple, exactly describable engineered systems of the past. Code gets added, patched, disabled, copied and modified (cf. gene duplication), becomes dead because nothing calls it any longer (cf. pseudo-genes). Other code (viruses) latches onto functionality (receptors) to do its selfish deeds. An army of engineers (immune system) constantly scans for invaders and crashes, and makes patches. The big difference is that biocode does not have engineers to patch it — mutations and selection do the work over time. Still, every large long-running software system I have known resists attack and improves through incremental replacement of parts, with lots of trial and error, not by wholesale redesign.

The internet "fixers'" goal is no more realistic than anyone's goal to avoid disease by redesigning their genome and rebooting their body.

Wednesday, February 18, 2009

Decision by Vetocracy

Decision by Vetocracy: Few would mistake the process of academic paper review for a fair process, but sometimes the unfairness seems particularly striking. (Via Machine Learning (Theory).)

I've also seen quite a few instances of bad, even unprofessional, reviewing in the last few years. However, I don't need to hypothesize a mechanism involving an obsessive vetoing minority to explain the deterioration of conference reviewing. Instead of individually malicious agents, the trends can be explained more globally by system overload. The demand for reviewing services is growing faster than the supply of mature, experienced reviewers. That's simply an effect of the field's demographic pyramid: increasingly large cohorts of relatively inexperienced researchers submitting papers to be reviewed by the much smaller cohorts of their seniors. As a consequence, either an increasing number of reviews are assigned to relatively unqualified reviewers or the senior cohort is overwhelmed and produces lower quality reviews. Network effects add to this: relatively senior PC and area chairs know best and are more likely to ask reviewing favors from those in their and neighboring cohorts, increasing the chance that successive versions of the same paper will be assigned to the same reviewer drawn from that relatively small pool, bidding system or not. The problem is exacerbated by the very high peak reviewing load demanded by having a few large conferences where all the reviews have to be done in a month or so. Basically, we have a very congested network causing a lot of retries and lost pa{pers|ckets}.

The standard solution for this problem is that subfields split off and start their own meetings and journals. The new subfields, because they seem risky, attract fewer newcomers to start with, so reviewing quality tends to be higher. Also, subfield founders have a strong sense of ownership and responsibility towards their babies (sometimes too much), so they will work really hard on reviewing and other field-building activities. I saw this pattern when statistical natural language processing started its own series of meetings (such as EMNLP), and also earlier with logic programming and with learning theory.

An interesting question is whether there are ways to scale up an area of research that do not require fission. For instance, if we were to move to open online paper-and-response systems, as Leon Bottou, Yann LeCun and others have suggested, maybe network effects would work for us in bringing the most discussed ideas to the top rather than against us in creating terminal reviewing congestion. Discussants would choose which papers to review, but because they would not be anonymous, torpedoing a paper would collide with social and professional norms. The worst a malicious agent wanting to stay anonymous could do is to arrange for sock puppets to diss a paper, but if the paper is good, many others would jump to its defense, and a mild level of moderation, editorial or distributed, would be likely to be sufficient to dampen flame wars.

At the very least, a fast turn-around electronic journal for short communications, with mechanisms for supporting material and for commentary, might do better than a conference because unlike a conference, a journal has institutional memory in its persistent editorial board. Such a journal could then organize a highlights (main session) and discussion (posters and workshops) conference based on the previous year's accepted papers. I understand that the VLDB community is seriously considering such a model.

Update: John Langford on his blog notes that my argument above requires superlinear growth. In fact, it just needs a relatively short period of superlinear growth (inflation) such that experienced reviewers are those who came into the field before the end of inflation. Eventually, as growth rates flatten out, the ratio of submissions to experienced reviewers will stabilize, or even decrease if the field loses vitality. I've seen subfields in all stages of this trajectory: initial slow growth, inflation if the field takes off, eventual maturity with stable growth, slowdown. This trajectory is also supported by Gordon Tullock's cranky but very insightful analysis The Organization of Inquiry. I don't have the numbers at hand, unfortunately, but anecdotally I believe that NIPS had an inflationary period in the 90s, but now growth has flattened.
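To make the inflation argument a bit more concrete, here is a toy simulation. It is only a sketch under assumptions of my own choosing, none of which come from real conference data: every active researcher submits one paper a year, careers last 30 years, it takes 5 years to become an experienced reviewer, and newcomer cohorts grow 30% a year during a ten-year inflationary period and are flat otherwise. The function names and all the parameter values are hypothetical.

```python
def cohort_sizes(years, base=100.0, growth=1.3, inflation_start=40, inflation_end=50):
    """Newcomers entering the field each year (hypothetical model).

    Cohorts are constant except during the inflationary period,
    when each new cohort is `growth` times larger than the last.
    """
    sizes = []
    current = base
    for year in range(years):
        if inflation_start <= year < inflation_end:
            current *= growth
        sizes.append(current)
    return sizes


def submissions_per_experienced_reviewer(sizes, career=30, maturity=5):
    """Yearly ratio of submissions to experienced reviewers.

    Assumes one submission per active researcher per year, so the
    number of submissions equals the number of active researchers.
    A researcher is experienced `maturity` years after entering and
    leaves the field after `career` years.
    """
    ratios = []
    for year in range(len(sizes)):
        first_active = max(0, year - career + 1)
        active = sum(sizes[first_active : year + 1])
        last_experienced = max(first_active, year - maturity + 1)
        experienced = sum(sizes[first_active:last_experienced])
        # In the first few simulated years nobody is experienced yet;
        # that is a startup artifact of the simulation, not a prediction.
        ratios.append(active / experienced if experienced > 0 else float("inf"))
    return ratios


sizes = cohort_sizes(90)
ratios = submissions_per_experienced_reviewer(sizes)
for year in (39, 45, 50, 60, 89):
    print(year, round(ratios[year], 2))
```

With these made-up numbers, the load per experienced reviewer sits at its steady-state value before inflation, roughly doubles around the end of the inflationary period, and drifts back to the same steady state once the large cohorts have matured. Changing the assumed growth rate or maturity time changes the height of the bump, not its shape.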

Monday, February 16, 2009

Really good books

I was updating the Recently Read sidebar of this blog, and as I realized that I had forgotten to include some titles, I thought it would be fun to try to select my top 5 of those 23 worthwhile books. I couldn't quite make up my mind, so here's a list with some disjunctions:

  1. After Dark
  2. Microcosm
  3. Euler's Gem or The Best of All Possible Worlds
  4. Nowhere Man or Sea of Poppies
  5. Ice, Mud and Blood or Traffic

Hors concours, the most fun in the sense of unputdownable were The Wind-Up Bird Chronicle and Two Planks and a Passion (I fear I bored my Canada ski companions to tears with ski trivia from the last one).

After Dark is the most self-consistent and concentrated piece of fiction I've read in a long time; an algorithmic gem. Microcosm is evolutionary microbiology written in a way that a computer scientist can love. Euler's Gem reminded me of theorems and proofs I had forgotten, and showed me many others that I should have known; all elementary, all great fun to work through. The Best of All Possible Worlds reminded me and taught me of the many pitfalls of optimality, which we computer scientists keep getting confused about. Nowhere Man opened up for me the farce and horrors in the collapse of those other European dictatorships. Sea of Poppies is great linguistic miscegenation, and whacks one on the side of the head with some rather forgotten truths of colonialism. Ice, Mud and Blood shows us how to measure the swings of climate over geological time, and scares us about what might come. And Traffic is a needed lesson for all of us above average drivers; switching lanes in a traffic jam may really not help.

Fixing the Internet might break it worse than it's broken now.

Fixing the Internet might break it worse than it's broken now.: [...] Even Lawrence Lessig, a champion of the original Internet's original strengths, blogged "Zittrain told us so" the other day about a worm that appeared in early January [story]. However, Lessig isn't completely correct; this most recent worm did not bring the Internet down or markedly increase the amplitude of the normal background hysterical reaction against the Internet. I haven't even noticed its effect on my Internet use. Have you? Has this worm planted software that will make the Internet stop failing to fail? Dunno, but since I started using the Internet, many malicious malware manifestations have come and gone and the Internet keeps keeping on.

For sure, there's a very active community that hustles to get, and stay, on top of such attacks. (Hats off to them!!!) So far this community is succeeding in spades, and the same old Internet we know and love (and hate, and marvel at, and swear at, and nevertheless use every hour of every day) keeps on keeping on keeping on keeping on, giving the pink generator bunny a run for its money.

Read the whole post, it's well worth it. The arguments for "fixing" the internet are to me a mild form of arguments I would rather not have to be reminded of, given by Salazar's authoritarian regime I grew up under in Portugal, of how prior censorship, police control of the opposition, manipulated elections, and government-run unions were what kept us from the crime, pornography, and corruption of "decadent" countries like France (Gitane-smoking leftists), England (youth-corrupting Beatles and miniskirts), the Netherlands (Amsterdam!), or Sweden (especially Sweden, with its neutrality, embrace of refugees from dictatorships, and openness about sex). The first time I traveled abroad, to London, the vitality and variety of culture and the streets seemed chaotic and almost scary, but I learned more in a week than in so many years of state-regimented education.

Like David, for me, and for my children, the benefits of the open, unregulated internet have been enormously greater than the dangers. Sure, I'd rather not have to tweak my spam filters from time to time. But then, I'd rather not have to worry about street dangers in Philadelphia or San Francisco (in Salazar's time, the only street danger I had to worry about in Lisbon was the overbearing cops), but I know that's the price I have to pay for a relatively open urban environment. In any case, the risks I take voluntarily, whether by driving on the highway or by skiing in the backcountry, are way more real than the nightmare scenarios that those who have lost their nerve about the implications of free speech can concoct.

We have it too easy. Our first-world, good income lives are so sheltered that any conceivable danger generates hysterical over-reaction. Our social and mental defense systems, like our immune systems, are so deprived of real challenges that they hallucinate ghosts like the worm that brings down the internet, the chemical that poisons our children through vaccination, or the predatory homeless person. We need a bit more exposure, a little more dirt in our lives to reset the balance.

Sunday, February 15, 2009

The Conyers bill is back

The Conyers bill is back: Yesterday Rep. John Conyers (D-MI) re-introduced the Fair Copyright in Research Works Act.  This year it's H.R. 801 (last year it was H.R. 6845), and co-sponsored by Steve Cohen (D-TN), Trent Franks (R-AZ), Darrell Issa (R-CA), and Robert Wexler (D-FL).  The language has not changed.  [...] The Fair Copyright Act is to fair copyright what the Patriot Act was to patriotism.  It would repeal the OA policy at the NIH and prevent similar OA policies at any federal agency.  (Via Open Access News.)

Not again! How is Conyers different here from Santorum, who wanted to close open access to weather data to protect commercial weather data interests? We pay for this knowledge to be created with our taxes. We should not pay again some private party to get access to it. A private party that has most of its editorial work done by academics whose salaries are paid by tuition and by (directly or indirectly) government research grants. If bailout is a bad word, we have been bailing out scientific and technical publishers for decades now. That's why I refuse to review submissions to any closed access journal, and I write that to its editor when I am asked.

Italy make avalanche safety gear mandatory

Italy make avalanche safety gear mandatory: Italy has decided to make avalanche safety gear (avalanche beacon, shovel and probe) mandatory for all winter sports enthusiasts heading out of marked and secured ski runs. The law will also apply to off piste skiers. (Via PisteHors.)

This is pure safety theater (by analogy with the TSA's security theater). Carrying avy gear without practicing is pretty useless. I just did a few search drills with multiple deep targets in the beacon basin at Sol Mountain. I was humbled to miss in my first fine search by 50 cm or so for a target 180 cm deep because I hurried too much, even though I have done quite a few similar drills before where I nailed the target, and I did fine on later drills.

Tuesday, February 10, 2009

Sol Mountain Touring Feb 1-8 2009

Sol 2009
My best visit to Sol Mountain Touring so far (even with the back pain that emerged halfway through), and the previous two had already been very good. Guides Aaron and Quinn got us to great terrain every day. Powder runs on the first and last days were arguably among the very best I've ever had.

You can't make this stuff up

Impressive: It's a real feather in the cap of Politico that they published this news article which claims that the newest scientific data says we're actually experiencing global cooling. (Via Talking Points Memo.)

Nature, Science, PNAS, step aside:
D’Aleo reported in the 2009 Old Farmer’s Almanac that the U.S. annual mean temperature has fluctuated for decades and has only risen 0.21 degrees since 1930 — which he says is caused by fluctuating solar activity levels and ocean temperatures, not carbon emissions.

Monday, February 9, 2009

Math: the book

Math: the book: [...]While visiting IAS to give a talk, I noticed on several of my friends’ desks heavily-bookmarked copies of the Princeton Companion to Mathematics: a 1000-page volume that’s sort of an encyclopedia of math, history of math, biographical dictionary of math, beginners’ guide to math, experts’ desk reference of math, philosophical treatise on math, cultural account of math, and defense of math rolled into one, written by about 130 topic specialists and edited by the Fields medalist, blogger, and master expositor Timothy Gowers. (Via Shtetl-Optimized.)

Yum, yum.