Sunday, May 25, 2008


Carl Zimmer's Microcosm, p. 149:

The fact that E. coli and a man-made network show some striking similarities does not mean that E. coli was produced by intelligent design. It actually means that human design is a lot less intelligent than we like to think. Instead of some grand, forward-thinking vision, we create some of our greatest inventions through slow, myopic tinkering.

Inappropriate Mathematics for Machine Learning

Inappropriate Mathematics for Machine Learning: Reviewers and students are sometimes greatly concerned by the distinction between:

  1. An open set and a closed set.
  2. A supremum and a maximum.
  3. An event which happens with probability 1 and an event that always happens.

I don’t appreciate this distinction in machine learning & learning theory. All machine learning takes place (by definition) on a machine where every parameter has finite precision. Consequently, every set is closed, a maximal element always exists, and probability 1 events always happen.

The fundamental issue here is that substantial parts of mathematics don’t appear well-matched to computation in the physical world, because the mathematics has concerns which are unphysical. (Via Machine Learning (Theory).)
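John's finite-precision point can be made concrete (a toy sketch of my own, not from his post): among IEEE-754 doubles, the "open" interval (0, 1) is a finite set, so its supremum is an attained maximum.

```python
import math

# Among double-precision floats, there is a largest value strictly
# below 1.0 -- the supremum of the machine's (0, 1) is a maximum.
# (math.nextafter requires Python 3.9+.)
largest_below_one = math.nextafter(1.0, 0.0)

assert largest_below_one < 1.0
# No double lies strictly between it and 1.0:
assert math.nextafter(largest_below_one, 2.0) == 1.0
```

In classical analysis, sup (0, 1) = 1 is not attained; on hardware it is. That is the sense in which "every set is closed" on a machine.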

John is deeply confused here. One of the most important jobs of mathematical abstraction is to make it easier to reason about whole classes of problems whose size is data dependent. That's why we use Turing machines in the theory of computation, and the continuum when working with approximations and rates of change whose precision cannot be specified in advance. Constructive mathematics, however clever, is a niche pursuit because it makes simple arguments about rates, approximation, and existence much harder than they are with Weierstrass epsilon-delta arguments and their hugely successful development in analysis, topology, probability, and applied mathematics. It would be wonderful if there were a constructive mathematics that was as easy to work with as those classical areas. I spent quite a bit of time and effort studying intuitionistic and constructivist methods a long while back, and my frustrated conclusion then was that these approaches that are supposedly closer to computation actually make the simplest classical arguments a huge chore, for very uncertain payoff.

Sticking my neck out: constructivism is misguided because it believes in a single fabric for mathematics. It refuses to accept that mathematics is a patchwork of methods that work at different levels of abstraction and are not fully inter-translatable. Which is not surprising if you recognize that mathematics is a big messy workshop of tools for abstract thought, not the incomplete projection of a Platonic ideal.

Thursday, May 22, 2008

On owning books

On owning books:

My father will mail me a copy of Willie Morris' North Toward Home and a few Durrells; I will mail him the hilarious and poignant A Thousand Shall Fall, and an old edition of Grimble's We Chose the Islands. And we argue over who first heard about Andy Adams's The Log of a Cowboy; then we realize we both have editions in our homes.

Here is the thing: it's hard to say who owns these books. They are ours, collectively; they fling back and forth between Texas and California, and either household is only a temporary resting place. These books are shared, because they are appreciated; loved, because they are enjoyed with others.


Whatever digital books (ebooks) look like in the future, if they do not embody the right to share, in an unrestricted and platform-independent manner, they will be poorer things.

This is called the first sale doctrine. It's part of why people love books -- a love built from sharing. It's what makes libraries possible. A world where content is licensed, and sold with restrictions on use, is a world less full of enthusiastic readers; less full of love.

(Via The Patry Copyright Blog.)

I have this theory that much of what we consider “higher intelligence” could not operate without external memory — books, notes, letters, drawings, {black|white}boards — much like a Turing machine needs its tape to go beyond finite state. When we exchange books, we build each other's minds.

Sunday, May 18, 2008

They must be having a great Spring skiing season in Colorado

PHL->SFO this morning, I slept in my left-hand-side, over-the-wing window seat for much of the route before the first rise of the Rockies. Looking down, I didn't see the usual landmarks, and at first I thought we were on a very northerly route. Then I saw three obvious ski areas, the last two close together, with the last one very big and vaguely familiar. Lots of snow still down to below tree line. Eventually I figured out that the last two neighboring ski areas were likely to be Aspen and Snowmass, with the distinctive jagged N-facing cirque below the summit at Snowmass. Later we crossed the Sierra SE->NW, with Mammoth and June Mountain clearly visible, and Tioga Pass briefly glimpsed ahead of the wing, Dana Couloir still white (and probably totally icy).

Language as shaped by the brain

In a recent Language Log comment, Shimon Edelman pointed to a "great new paper by Christiansen and Chater" on "the logical problem of language evolution". During a recent trip to Philadelphia and New York that had nothing to do with language evolution, I printed the paper, and then proceeded to stay up late into the night to finish reading it. It was not just jet lag. The paper is indeed very interesting. Since it is a review that cites many primary sources that I have not read, my support is cautious, but its evolutionary argumentation against the hypothesis of innate, genetically encoded language-specific brain machinery appears very solid. I am also sympathetic to their hypothesis that "[if] language has evolved to fit prior cognitive and communicative constraints, then it is plausible that historical processes of language change provide a model of language evolution; indeed, historical language change may be language evolution in microcosm." But the section in which they develop this hypothesis is less tightly argued than the rest of the paper, with some just-so story-telling drifting by. Their enthusiasm may have pushed them a bit further than the very fragmentary evidence warrants, but it's not surprising that they are enthusiastic after their convincing demolition of rationalist fantasies that have been holding us back for a long time.

Saturday, May 17, 2008

Reading "Microcosm"

I'm on page 73 of Microcosm, Carl Zimmer's account of E. coli's role in the development of modern biology. So far, it's the most crisply written and informative science book I've read in a long time, and I've been reading quite a few good ones. It does not condescend, it does not wave its hands when things get more technical. No superfluous "local color" digressions (I don't really need irrelevant anecdotes from some scientist's life). Yet, the writing is lively and almost humorous, as life (ours or E. coli's) is pretty funny in its contradictions, kludges, and amazingly successful hacks.

Friday, May 16, 2008

We have violated the prime directive

We have violated the prime directive: Noah Smith and I are co-supervising Tae Yano on a project involving analysis of political blogs, and Tae left a pile of results and code on her CMU web site as a way of communicating with us. Surprisingly, someone at one of the blogs she spidered, Little Green Footballs, actually noticed, leading to a lot of investigative work in this fascinating thread. (Via Cranial Darwinism.)

Best story of the week. A whole post-deconstructionist novel could be written around this. This thread entry takes the prize:

#474: Imagine a natural language program that could respond to comments with charm and style, sort of a robo-blogger. Now imagine an army of them, all set to monitor a different political blog, run by a campaign manager for a politician. Add to its writing ability an encyclopedic memory, with instant access to famous quotes, historical facts, trivia, statistics, and every word ever uttered by the opposition. You now have an army of ultimate bloggers, all completely under the control of one campaign manager... no more "going off message" by some underpaid/volunteer lackey, just high quality counter-opinion, ready to be inserted into the blogs of anyone who disagrees with your candidate. This research will eventually lead to robo-blogging to kill emerging scandals and alternative opinions on issues... no more Rathergates as they will be smothered in the cradle by the most charming bloggers around -- the poli-bots.

For all I know, this could be going on. Mark V. Shaney, the Bell Labs robo-flamer, fooled many on Usenet a couple of decades ago. The Turing test gets much easier to pass when the machine is craftily designed to exploit the cognitive and emotional biases of the judges.
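Mark V. Shaney was, at bottom, a word-level Markov chain trained on the newsgroups it posted to. A minimal sketch of the idea (my own toy code, not the original program):

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, length=30, seed=None):
    """Random-walk the chain to emit locally plausible text."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:                   # dead end: restart elsewhere
            state = rng.choice(list(chain))
            followers = chain[state]
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out)
```

Feed it a few megabytes of flamewar and it produces text that is locally fluent and globally vacuous, which on Usenet was often enough to pass.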

Monday, May 12, 2008

Powerset Launches!

Powerset Launches!: Meanwhile, I'm waiting for Fernando to pounce. [...] Secondly (actually, this is more important) pundits are going to write about the wikipedia-only issue. They're not getting it. 90% of search results come from a tiny fraction of web pages due to the huge redundancy on the web and the differences between searcher needs and author/publisher intents. The task isn't to always search that huge set, but to get the answers to the user. (Via Data Mining.)

No pouncing. The demo is interesting, which is what I expected from the smart folks at Powerset. Congratulations to them. But I've seen nothing yet that contradicts my earlier observations in this blog or in the NYT.

As for the huge redundancy of the web, forgive my skepticism. If I search for "Matthew Hurst", I want your blog first, not some more popular page that happens to make some of your blog redundant. The tail of the distribution matters.