Tuesday, December 29, 2009

Robin Kelley’s Transcendental Thelonious Monk

Robin Kelley’s Transcendental Thelonious Monk: Robin Kelley’s superb biography brings the Thelonious Monk story back from the ragged edge to the creative center of American music. And it brings my reading year to a blessedly loving, gorgeously swinging, dissonant, modernist, and utterly one-off climactic note. There may be another jazz biography as thickly detailed, as audibly lyrical, as passionate, as thrilling as this one, but I can’t bring it to mind.

Click to listen to Chris’s conversation with Robin Kelley (51 minutes, 24 meg mp3)

I got Kelley's book in Philly, but it was too thick to carry with my in-progress books, so it's on its way to California in a UPS package with my Christmas presents and some other stuff. The podcast is exceptional: great discussion backed up with some of Monk's most mind-blowing compositions. Highly recommended.

Monday, December 28, 2009

The Checklist Manifesto

I'm halfway through The Checklist Manifesto, by Atul Gawande, a surgeon and possibly the best current New Yorker writer. When I took my wilderness first aid class a few weeks ago (highly recommended for all of you outdoors types), I was a bit skeptical of all the checklist mnemonics, but Gawande is making me look at checklists with a new respect. Now I just need to figure out how to memorize those mnemonics (mnemonics for mnemonics?). Or find an easy-to-carry checklist for my first aid kit.

Sunday, December 27, 2009

Le mot juste

From a Language Log discussion:

Geoff Pullum
There really is a lot we don't know about syntactic processing.
Shimon Edelman
I guess the same could be said about Santa Claus, and for the same reason.

I also wrote a more technical answer to Geoff's puzzle.

Friday, December 25, 2009

Where I Make a Premature New Year's Resolution

I better post more, or give up. Lots going on, some maybe interesting, but my attention has been diverted too often until this socially enforced quiet of Christmas.

NIPS (Dec 7-12) was worth it. I helped with two posters and a workshop talk, caught up with many friends and colleagues, did some recruiting, and discussed favorite research problems -- sparse models, unsupervised learning from text, Web-scale inference. I also skied icy groomers and some softer high-elevation wind buff off-piste (Blackcomb Glacier, Harmony Bowl, and Whistler Bowl) with some of my usual NIPS ski partners, who unaccountably tolerate me even though I'm older and slower. I was especially slow this time, feeling tentative from the April ankle break and riding big powder skis that are not ideal for icy or narrow terrain. It was good to take an extra day to just ski, which helped me get back to a more satisfying skiing form.

Less than a week after that, I flew on what might have been one of the last planes to land at PHL as the big Saturday snowstorm got going. Exciting landing, zero visibility until almost touchdown, snowpacked runway. Passengers applauded spontaneously as the reverse thrust roar died down. First time I heard applause on a domestic flight in decades.

It was also interesting to travel between Philly and NY on Monday. Acela going in was a bit late, and moved more slowly than normal. Return train was cancelled, but I got on another one that was running several hours late, where the vestibules (or whatever they are called) at the ends of the cars, where the doors are, were packed with snowdrifts that the conductors had only partially cleaned out. Spent a useful, fairly quiet day at the Google office followed by dinner with friends before the train back.

All of this travel gave me enough time to finish two paused books:

  • Reading in the Brain: the best neuro-cog book I've read in a long time. Dehaene covers a lot of terrain, goes deep, but he is always engaging, never confusing or ponderous. He writes very well, gently demanding concentration and laying out sustained arguments with a long history. Only in criticizing the whole-language approach to reading instruction does he show some (justified) impatience.
  • Nothing to Be Frightened Of: How can a book all about death be simultaneously truthful and entertaining? Besides Machado de Assis's Memórias Póstumas de Brás Cubas, that is? Julian Barnes, here in memoir form, solves the puzzle with repeated short dips into family history, literary history, and cultural critique. He never stays more than a few pages on any topic, meandering about as his disturbing topic demands. He certainly does not relieve his deep funk, or the reader's, but he sprinkles everywhere aperçus and paradoxes that enliven the subject, as it were. His fears reflect how far affluence, sedentary life, democratic public safety, and medicine have replaced rapid, violent demises by long-drawn decay and illness. Having taken a wilderness first aid class recently, and having experienced how close to the edge one can quickly get in the wild, my dread list is, maybe deludedly, somewhat different.

Saturday, November 28, 2009

Long hiatus, a great book, and skiing in the forecast

I've been so busy with work and travel that I've not had the time and focus for writing. We have been in Philadelphia for a family Thanksgiving. At the Penn Book Center a week ago, I came upon Stanislas Dehaene's Reading in the Brain. I'm only 100 pages in, but it has already taken top rank in my mental gallery of science books. It's a deep, serious work about brain architecture, perception, and cognition, but written without any pretense or pomp, direct, full of striking news about how reading is implemented in the brain. In these first chapters, Dehaene focuses on what is known about reading's implementation and on the experimental evidence for the findings. I'm looking forward to when he gets into how the design of writing systems is influenced by biological constraints, and how reading gets localized to the same particular brain region, which he calls the “letterbox,” across languages, writing systems, and cultures (he's already given a tantalizing preview regarding the localization of Kanji and Kana in Japanese reading).

My new skis, Black Diamond Justices with Marker Baron bindings, are waiting for me at Marmot Mountain Works in Berkeley. I'm not likely to be able to use them the coming weekend because I signed up for an ASI wilderness first aid class at Sugar Bowl, but I'll be in Whistler the following weekend, where they have been having the best early season in a long time. Now I just need the jet stream to behave to keep both pineapple express and deep freeze away.

Sunday, November 8, 2009

Jazz and Tropicália

Spent the weekend in San Francisco to attend three San Francisco Jazz festival concerts and otherwise enjoy the city. The three concerts were all over the place, in a very good way. Friday night we heard the Portuguese/Cabo Verde singer/songwriter Sara Tavares with a strong quartet from Cabo Verde and Portugal. She was suffering from a respiratory ailment (she said flu, but I find it hard to believe that she would have been able to perform at all with the flu) and the first few songs lacked energy somewhat. She got stronger through the set, however. She did a great Balancé, one of her best-known songs, and she generally showed a stylistic freedom and disregard for the bonds of Portuguese song convention that were very refreshing. Not a perfect performance (besides her illness, they had equipment glitches), but there was lots to like and a great rapport between Sara and her band.

On Saturday we saw Savion Glover and his outstanding band. Wow. Edge of the seat work, like with Gonzalo Rubalcaba the other day. Except that one can get a whiff of Rubalcaba on recordings, while Glover has to be seen. Glover's dialog and debate with his band members (all amazing, but Tommy James on piano and Patience Higgins on saxophones had especially rich interactions with Glover), the depth of rhythmic variation around Coltrane and blues themes, were breathtaking, so intense and surprising that one truly forgot to inhale, with not a second of slack.

Today, we heard John Abercrombie with Mark Feldman (violin), Drew Gress (bass) and Anthony Pinciotti (drums). Abercrombie and Feldman are the core of several of my favorite recordings of the last several years: Open Land, Cat 'n' Mouse, Class Trip, and The Third Quartet. Today's set was mostly compositions from Wait Till You See Her, a new record I didn't know. I had heard Abercrombie live only once before, with Larry Coryell and Badi Assad on the live tour of Three Guitars. In today's set he and his partners, especially Feldman, did what we really hope for in live jazz, going outside the tighter confines of a studio recording to wander, explore, tease and draw the audience.

In between, we enjoyed a beautiful cool fall weekend walking around San Francisco, and a delightfully varied exhibition of modern Brazilian art at the Yerba Buena Arts Center.

Sunday, November 1, 2009

Pay Powder

Pay Powder: How much would you pay to ski fresh powder? Does it even have a price? Squaw valley seems to think so. Squaw is built on private rather than forestry service land with limited access to the backcountry. That is set to change as it re-jigs lift pass prices in light of the credit crunch. [...] The resort has always strictly controlled access to the out-bounds terrain including the National Geographic bowl, at least if you are a Squaw paying customer. For that you will need to purchase the eye-wateringly expensive platinum pass at $1699. This enrolls you in the “out of bounds program”. (Via PisteHors)

The sky is not falling. Squaw's program is intended for well-off pass holders for whom the extra cost of a platinum pass compares well with the cost of a day of heliskiing. Nothing obliges Squaw to offer convenient lift access to the backcountry for free. Since Squaw has always forbidden backcountry access from its lifts, nothing has materially changed for the worse with this new offer. Except maybe for the green-with-envy feelings it creates in those confined to inbounds tracked-out turns while the high-living are taken on fresh tracks just over the boundary rope. If you can't stand the wave of envy, there's nice Sugar Bowl just a few miles NW as the crow flies, which allows easy backcountry access from its lifts. And there's always skinning for your turns from a multitude of trailheads around the Tahoe basin.

Friday, October 30, 2009

An open letter to Steve Levitt

An open letter to Steve Levitt: [...]The point here is that really simple arithmetic, which you could not be bothered to do, would have been enough to tell you that the claim that the blackness of solar cells makes solar energy pointless is complete and utter nonsense. I don’t think you would have accepted such laziness and sloppiness in a term paper from one of your students, so why do you accept it from yourself? What does the failure to do such basic thinking with numbers say about the extent to which anything you write can be trusted? How do you think it reflects on the profession of economics when a member of that profession — somebody who that profession seems to esteem highly — publicly and noisily shows that he cannot be bothered to do simple arithmetic and elementary background reading. Not even for a subject of such paramount importance as global warming. (Via Real Climate)


Sunday, October 25, 2009

AlpControl claims world’s lightest wide skis

AlpControl claims world’s lightest wide skis: AlpControl claims its new carbon fiber skis weigh under 2000 grams (4.3lbs) a pair in 175cm with a shovel of 120mm. That’s less than some of the skinny competition skis used in the Pierra Menta competition. However it is ultra durable, the manufacturer claims it might be the best investment in your skiing life.

(Via PisteHors)

But how will it ski? All that elasticity could make for an interesting ride on hard snow.

Saturday, October 24, 2009

Gonzalo Rubalcaba Quintet

I can't get words together. I was at the edge of my seat for much of almost two hours of music that was both unpredictable before it happened and the only way it could be afterwards. I think of the most memorable mathematical proofs, of Braque, Delaunay, Pollock; Stravinsky. Deeply thought-out construction that yet feels spontaneous, alive, constantly evolving under its own dynamics. Clusters of notes bouncing among players to the point that one can't figure out what is coming from which instrument, and yet supreme clarity. Again with the mathematics simile, that feeling of vertigo when an inscrutable build-up of argument opens up into the revelation of a final step that makes everything make sense. Or as after ascending a steep snowy slope for hours, the other peaks start poking over the looming ridge, light spreads, and the horizon finally falls away to an infinite variety of landscape.

We've been lucky with several very good jazz outings over the last year, but this one was on a different (hyper)plane. Rubalcaba's virtuosity on the piano was never gratuitous, and rebounded off an incredibly skilled ensemble. Ernesto Simpson with a crisp, airy command of the drums and Yunior Terry with an insistent deep rumble on the bass spread out the piano's rhythmic sparks into space-filling creations (that Pollock idea). Alex Sipiagin on trumpet started maybe a bit tentative, but for the rest of the set and encores he grew and grew with urgent calls, oompah humor, almost painful buildups, longing. Yosvany Terry on alto and tenor sax was the hub of the ensemble, picking up ideas from the piano that spread through the ensemble, and reacting to them with discoveries and surprises now funny, now scary, spinning wheels of notes (that Delaunay idea). When hints of a standard were brought in, it was never in the sometimes lazy way in which other bands take a break from hard work by indulging the audience's recognition. Instead, it became quickly transformed into something else, stretched, bent, rebuilt; in another mathematical (or Pollock) analogy, like a chaotic dynamics breaks up an initially compact region into a shifting flock of points.

One sad aspect of the concert is that the audience was middle-aged or older. I know that tickets are very expensive. But also, music like this is about individual engagement between the band and each serious listener, not about creating a framework for social interaction within the audience for an overwhelmingly social youth culture.

Saturday, October 17, 2009

Music season starts

More musical events on my calendar the next couple of months than anytime since the legendary Gulbenkian Foundation festivals in Lisbon in the late 60s-early 70s. Coming week:

Wednesday, October 14, 2009

Farhad Manjoo on Google Wave’s Complexity

Farhad Manjoo on Google Wave’s Complexity: On Wave, every misspelling, half-formed sentence, and ill-advised stab at sarcasm is transmitted instantly to the other person. This behavior is so corrosive to normal conversation that you'd think it was some kind of bug. (Via Daring Fireball)

This is the silliest claim about human communication I've read in a long time. As a writer, Manjoo may be uncomfortable letting others see his communicative sausage being made. But before teletypes and their successors, "normal conversation" — face-to-face conversation — was — still is — all hesitations, false starts, disfluencies, failed attempts at humor, misheard words, losses of attention. That's what we are, that's how we work. We perceive, interpret, and think as we talk, and a lot of that is trial and error.

Manjoo complains that instant transmission of typed characters makes the typist "self conscious." Translation: I'm used to hiding what/how I'm really thinking when I communicate online, and I feel uncomfortable coming out from behind the curtain.

Answer to comments: What Manjoo wrote was at best an unwarranted generalization from his personal reaction to the feature. He made an empirically false claim about human communication; even if the claim is charitably interpreted to be only about typed communication, he cited no empirical evidence about the alleged corrosiveness. Why did he feel the need to make a sweeping generalization, instead of honestly reporting his own experience, and that of others he interviewed, and letting us draw our own inferences? The disease of the current 24-hour punditry cycle is an escalation of instant assertion unsupported by evidence to demonstrate the pundit's manhood (how's that for a sweeping generalization?)

Long before the BSD talk program, there was the TENEX talk command that had the same character-at-a-time behavior and may have been the first such program I used. Personally, I didn't feel it corroded my ability to communicate, but I won't turn that into a general claim. I was using Wave to edit a document with a collaborator recently, and the immediate feedback was useful to what we were doing, especially the marker that showed where he was editing.

Monday, October 12, 2009

Yesterday's book

Finished reading The Age of Wonder. I used to avoid the Romantics, but that was before learning about Banks, the Herschels, Davy and their overwhelming enthusiasm for science as hope, poetry, and practical success. Highly recommended.

Today's listening

The best This American Life in a while, on how we make health care so expensive. Grab it while it's available for free!

Sunday, October 11, 2009

Today's listening

Avoidance of expensive roaming data access in Europe last week meant that I had a big podcast backlog. Just caught up with Planet Money, which is way sharper and less slavish to conventional wisdom than Marketplace.

Producer vs Consumer Viewpoints on the News Business

Producer vs Consumer Viewpoints on the News Business: [...] The trouble is that when journalists talk about journalism, they talk about it from the producer point of view. What Google does, from the media-as-production point of view really isn’t much better than what the paper boy does. But from the consumer point of view, having a paper boy who will fetch any paper you want in the world, for free, at any time, and open the paper to the page you were looking for is a massive improvement. [...] I think it’s interesting that journalists seem to have no problem following this dynamic when it comes to the car industry. This has been a terrible 12 months to be in the business of building cars, either as a worker or an owner or a manager. But it’s been a fine time to buy a car. There’s no car shortage. And there’s not going to be a car shortage. Drivers are in great shape. And it’s about the same with the news. Has there ever been a better time to be a news junkie? (Via Matt Yglesias)

This is an important insight, I think. As a fellow news junkie, I love this new world. In fact, I love it too much: I consume more news than I ever did, which maybe is a suboptimal use of my time. But I worry that there isn't a successful mechanism for compensating the news writers for the benefit I'm getting. After all, I used to subscribe to the NYT, and I would be happy to pay that amount to an aggregator that would distribute the proceeds to news sources in proportion to their traffic. It would be interesting to do a study on how many news junkies have dropped subscription to paper news sources over the last five years, and how much of that would be potentially recoverable as news revenue with the right mechanisms.

Update: Public radio stations have a funding model that has more or less worked even as their federal sources of funding have decreased. Pledge drives are not fun, but they bring in that fraction of the audience that cares enough to volunteer some support. I'd support Web news if the right mechanism were in place. In fact, I'd support public broadcasting more if there were a mechanism for a lump annual contribution supporting multiple broadcasters. I see no reason why Web news sources cannot achieve the same kind of support.
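The aggregator idea above, a flat payment shared among news sources in proportion to traffic, comes down to one line of arithmetic. Here is a minimal sketch; the function name, the source labels, and the numbers are all made up for illustration, and it counts only one reader's visits rather than site-wide traffic.

```python
def split_proceeds(subscription, visits):
    """Divide one reader's payment among sources by their share of visits.

    `subscription` is the amount paid for the period; `visits` maps a
    source name to how many times this reader visited it (hypothetical data).
    """
    total = sum(visits.values())
    if total == 0:
        # No traffic this period: nothing to distribute.
        return {source: 0.0 for source in visits}
    return {source: subscription * n / total for source, n in visits.items()}


# Illustrative month: 100 visits spread over three invented sources.
shares = split_proceeds(50.0, {"paper_a": 60, "paper_b": 30, "blog_c": 10})
for source, amount in shares.items():
    print(f"{source}: ${amount:.2f}")
```

A real mechanism would also have to decide what counts as a visit and how to resist click inflation, which this sketch ignores.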

Once again, future-safe archives

Once again, future-safe archives: Every time a relative passes this issue comes front and center for me. Most other times it's just lurking in the shadows. [...] We need one or more institutions that can manage electronic trusts over very long periods of time. [...] I've felt that universities would do the best job, since they already need to maintain the work of their professors, possibly in partnership with technology companies. This could be a huge source of endowments, as wealthy people with a vision for technology compete to build long-lasting monuments to their creativity and generosity. (Via Scripting News).

A big challenge here is to estimate the endowment needed to keep some amount of data archived in perpetuity, given the uncertainties of cost as technologies and environmental conditions change, and of return on endowment. The Barnes Foundation serves as a cautionary case. Its endowment turned out to be insufficient for the cost of keeping the collection safe. What saved the situation was that the popular appeal of the collection brought in other sources of funding in exchange for transformation into a more conventional museum. Most digital archives would not have that luck in a crunch.
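To see why the estimate is so sensitive, here is a rough sketch using the textbook growing-perpetuity formula, E = C / (r - g). This is an assumed simplification, not anyone's actual funding model: C is the first year's cost (paid at year end), g is the yearly change in cost (negative if storage keeps getting cheaper), and r is the endowment's yearly return. All figures are invented.

```python
def perpetual_endowment(annual_cost, real_return, cost_growth):
    """Endowment whose returns cover a changing annual cost forever.

    Assumes the first payment is due one year from now, costs change by
    `cost_growth` per year, and the endowment earns `real_return` per year.
    """
    if real_return <= cost_growth:
        raise ValueError("no finite endowment: costs grow at least as fast as returns")
    return annual_cost / (real_return - cost_growth)


def simulate(endowment, annual_cost, real_return, cost_growth, years):
    """Year-by-year balance, to check the closed form and probe bad scenarios."""
    balance, cost = endowment, annual_cost
    for _ in range(years):
        balance = balance * (1 + real_return) - cost  # earn, then pay the bill
        cost *= 1 + cost_growth                       # next year's bill
    return balance


# Planned for a 4% return and costs falling 1% a year (made-up numbers).
planned = perpetual_endowment(1000.0, 0.04, -0.01)
print(round(planned))
# Same endowment if returns turn out to be 2%: the balance after 50 years.
print(round(simulate(planned, 1000.0, 0.02, -0.01, 50)))
```

Halving the assumed return turns a perpetual endowment into one that runs dry within a few decades, which is roughly the failure mode described above.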

Fortunately, for easily-copied digital information we don't need to rely on a small number of institutions for long-term preservation. Projects like LOCKSS demonstrate the possibility of distributed preservation using many cheap copies. Could they be institutionally extended to preserve personal data?

Update: David Rosenthal on digital preservation.

Saturday, October 3, 2009

Shameless consumption question

Which new all-round (front-side-backcountry) powder boards?

I plan to use Marker Baron bindings for mostly inbounds skiing. I'm already fully equipped with Dynafit setups for the backcountry. I'm not interested in the heavy and stiff boards that Tahoe youngsters favor, but I'm looking for a bit more frontside performance than my light Karhu BC 100s.

New thread: today's reading and listening

I've failed miserably at keeping this blog up to date with what I'm reading or listening to. I'll try a new scheme: whenever I think of it, list what I've been reading or listening to that day. Here it goes:

Podcasts courtesy of (shameless plug warning) the indispensable Android app Listen.

Antisocial networking

Antisocial networking: I just got my invitation to Google Wave. The prototype that's now public doesn't have all of the amazing features in the original video demos. At this point, it's pretty much just a way of collecting IM-style conversations all in one place. But several of my friends are already there, and I've had a few conversations there already. [...] Right now, my standard set of tabs includes my Gmail, calendar, RSS reader, New York Times homepage, Facebook page, and now Google Wave. Add in the occasional Twitter tab (or dedicated Twitter client, if I feel like running it) plus I'll occasionally have an IM window open. All of these things are competing for my attention when I'm supposed to be getting real work done. [...] The bigger problem is that these various vendors and technologies have different data models for visibility and for how metadata is represented. [...] This is all the more frustrating because RSS completely solved the initial problem of distributing new blog posts in the blog universe. [...] Could there ever be a social network/microblogging aggregator?[...] In the end, I think the federation ideas behind Google Wave and BirdFeeder, and good old RSS blog feeds, will ultimately win out, with interoperability between the big vendors, just like they interoperate with email. Getting there, however, isn't going to happen easily. (Via Freedom to Tinker)

This is getting out of hand, even for those of us who have stayed away from Facebook and Twitter. Too many disparate streams, no single alerting system. Browser tabs are not the right tool for attention management. However, we seem so close: Google Wave, PubSubHubbub, and rssCloud all involve open publish-subscribe protocols that could be plumbed together to create notification hubs for multiple streams, filtered and ranked according to user preferences.
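As a toy illustration of such a hub, here is a minimal in-memory sketch. It does not use the Google Wave, PubSubHubbub, or rssCloud protocols; the `Item` and `Hub` classes, the stream labels, and the filter and ranking rules are all hypothetical stand-ins for what real plumbing would carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    stream: str   # which feed it came from (labels are illustrative)
    sender: str
    text: str


class Hub:
    """Toy notification hub: many streams publish, the user reads one ranked list."""

    def __init__(self, keep=None, score=None):
        self.keep = keep or (lambda item: True)    # user's filter
        self.score = score or (lambda item: 0.0)   # user's ranking
        self.inbox = []

    def publish(self, item):
        # Every stream pushes here; filtered items never reach the user.
        if self.keep(item):
            self.inbox.append(item)

    def alerts(self, limit=10):
        # Highest score first; ties keep arrival order.
        return sorted(self.inbox, key=self.score, reverse=True)[:limit]


hub = Hub(
    keep=lambda item: item.stream != "microblog",
    score=lambda item: 1.0 if item.sender == "collaborator" else 0.0,
)
hub.publish(Item("rss", "newsfeed", "new post"))
hub.publish(Item("microblog", "stranger", "noise"))
hub.publish(Item("wave", "collaborator", "edited section 2"))
for item in hub.alerts():
    print(item.stream, item.sender, item.text)
```

The point of the design is that filtering and ranking live in one place under the user's control, instead of being scattered across one browser tab per service.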

Sunday, September 27, 2009

A Midsummer Night's Dream

We had a great time at the Cal Shakes performance of A Midsummer Night's Dream this afternoon. Contrary to the silly Merc "where are my fairies?" review, we felt that this cheeky reconstruction was better paced than a traditional rendering, and it brought out better the sharp edges of love that are sometimes gilded by fairy light. Doug Hara as a Puck bubbling with physical humor, Danny Scheie as a full-of-himself Bottom, and Lindsey Gates as a sharply snippy Helena were my favorites, but the whole cast did a great job in keeping the play moving and entangling humor, fear, and sexual tension.

Thanks to Brad DeLong for the blogged recommendation.

Sunday, September 20, 2009

Localization of emotion perception in the brain of fish

Localization of emotion perception in the brain of fish: This is beautiful work, showing that certain areas in the brain of mature Atlantic Salmon 'light up' when the animal is asked to categorize the emotions expressed by a set of (human) faces:

More amazing still is the fact that the fish performed this task while dead. (Via Language Log).

Read the whole thing. Some great comments too, and links to related material. Don't laugh too hard at fMRI misinterpretations: we are all susceptible to wishful thinking and to reading too much into laboriously collected data, and all statistical analyses of complex data use simplifications that could get us in trouble.

Someone better at comedy than me might have a go at translating the Monty Python dead parrot pet shop sketch into a dead salmon sketch at a neuroimaging conference. At least "pining for the fjords" would be just right already.

Saturday, September 12, 2009

The laws of conditional probability are false

The laws of conditional probability are false: This is all standard physics. Consider the two-slit experiment [...] In standard probability theory, the whole idea of conditioning is that you have a single joint distribution sitting out there--possibly there are parts that are unobserved or even unobservable (as in much of psychometrics)--but you can treat it as a fixed object that you can observe through conditioning (the six blind men and the elephant). Once you abandon the idea of a single joint distribution, I think you've moved beyond conditional probability as we usually know it.

As I noted in a comment to the original posting, the work of Chris Fuchs and his collaborators gives intriguing ways out from the apparent contradiction between conditional probability and quantum mechanics. Fuchs's latest paper on the subject is Quantum-Bayesian Coherence.
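For readers who want the classical picture made concrete, here is a small sketch of conditioning as the quoted passage describes it: one fixed joint distribution, viewed through a restriction and a renormalization. The variables and probabilities are invented for illustration, and nothing here models the quantum case.

```python
from fractions import Fraction

# Hypothetical joint distribution over (weather, commute); numbers are made up.
joint = {
    ("rain", "late"): Fraction(3, 20),
    ("rain", "on time"): Fraction(1, 20),
    ("dry", "late"): Fraction(2, 20),
    ("dry", "on time"): Fraction(14, 20),
}


def condition(joint, index, value):
    """Return P(outcome | variable at `index` equals `value`) from one fixed joint."""
    # Keep only the outcomes consistent with the observation...
    kept = {outcome: p for outcome, p in joint.items() if outcome[index] == value}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("cannot condition on a zero-probability event")
    # ...and renormalize so they sum to one again.
    return {outcome: p / total for outcome, p in kept.items()}


given_rain = condition(joint, 0, "rain")
print(given_rain[("rain", "late")])
```

Every conditional here is derived from the same table; the quoted argument is that the two-slit experiment admits no single table that reproduces all the observed conditionals.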

Friday, September 11, 2009

For Alan Turing, a real apology for once

For Alan Turing, a real apology for once: In an age where (as Language Log has often had occasion to remark) many purported public apologies are just mealy-mouthed expressions of regret [...] it is good to see a genuine and direct apology for once, addressed (though more than half a century too late) to a man who deserved admiration, gratitude, and respect, but was instead hounded to death. The UK Prime Minister, Gordon Brown, has released a statement regarding the treatment of Alan Turing in the early 1950s, and the operative words are:
on behalf of the British government, and all those who live freely thanks to Alan's work I am very proud to say: we're sorry, you deserved so much better.
(Via Language Log)

If you read the whole Downing Street statement, you might feel a twinge of regret that Turing's other gigantic contributions to humanity beyond cracking Enigma were not mentioned, but the apology is nevertheless strong and poignant, and Gordon Brown deserves praise for saying clearly what had been unsaid for so long by those in power. Thank you.

Sunday, September 6, 2009

Data and metadata: Together again

Data and metadata: Together again: Terry Jones has an excellent post that lists the problems introduced by maintaining a hard distinction between metadata and data. [...] This is all very squishy and messy because the distinction is, as Terry says, artificial. It comes from thinking about experience as content that gets processed, as if we worked the way computers do. More exactly, it comes from thinking about experience as a set of Experience Atoms that then have to be assembled; metadata are the labels that tell you that Atom A goes into Atom Z. But experience is far more like language than like particle physics or Ikea assembly instructions. And that’s for a very good reason: linguistic creatures’ experience cannot be understood apart from language. Language doesn’t neatly separate into content and meta-content. It all comes together and it’s all intertwingled. Language is so very non-atomic that it makes atoms realize how lonely they’ve been.

Or, as Zellig Harris argued, natural language is its own metalanguage.

I spoke recently at a VLDB panel where I really wanted to come at the issues from this point of view, but I felt that it would sound way too abstract to a database audience. Maybe I shouldn't have chickened out, but you can't demolish a deeply vested set of assumptions in just seven minutes...

Saturday, September 5, 2009

Agnès Varda and Georges Brassens

Just came back from the delicious Les plages d'Agnès. Varda lived on a boat in Sète as a child, so she could not avoid including a snippet of Sète native Georges Brassens's Supplique pour être enterré sur la plage de Sète, maybe my favorite among his songs.

Thursday, August 13, 2009

Introducing RECAP: Turning PACER Around

Introducing RECAP: Turning PACER Around: [...]Today, we are excited to announce the public beta release of RECAP, a tool that will help bring an unprecedented level of transparency to the U.S. federal court system. RECAP is a plug-in for the Firefox web browser that makes it easier for users to share documents they have purchased from PACER, the court's pay-to-play access system. With the plug-in installed, users still have to pay each time they use PACER, but whenever they do retrieve a PACER document, RECAP automatically and effortlessly donates a copy of that document to a public repository hosted at the Internet Archive. The documents in this repository are, in turn, shared with other RECAP users, who will be notified whenever documents they are looking for can be downloaded from the free public repository. RECAP helps users exercise their rights under copyright law, which expressly places government works in the public domain. It also helps users advance the public good by contributing to an extensive and freely available archive of public court documents. (Via Freedom to Tinker.)

This is so cool! If it takes off, the document analysis and text mining possibilities will be endless.

Sunday, August 9, 2009

Charlie Haden's Quartet West

Tonight at Yoshi's in San Francisco: Charlie Haden on bass, Ernie Watts on tenor sax, Alan Broadbent on piano, and Rodney Green on drums for Haden's birthday. Limpid sound, exquisitely balanced group playing, deeply lyrical. I'm out of words, really. Ana agrees that this might be the best live jazz we've listened to since David Holland in 2004.

Thursday, August 6, 2009


Jeff Klein, the subject of the Open Source podcast I listened to at the gym this morning, mentioned Cavafy and his poem Ithaca. I hadn't read that poem in decades, so tonight I picked up Daniel Mendelsohn's recent translation, which has been on my bedside table:

As you set out on the way to Ithaca
hope that the road is a long one,
filled with adventures, filled with understanding.
The Laestrygonians and the Cyclopes,
Poseidon in his anger: do not fear them,
you’ll never come across them on your way
as long as your mind stays aloft, and a choice
emotion touches your spirit and your body.
The Laestrygonians and the Cyclopes,
savage Poseidon; you’ll not encounter them
unless you carry them within your soul,
unless your soul sets them up before you.

There are a few skills that improve with age; understanding more of the meanings of a great poem might be one of them.

Sunday, August 2, 2009

It Never Stops

It Never Stops: Lovely piece by Maira Kalman on Ben Franklin:

Don’t mope in your room. Go invent something. That is the American message.
Electricity. Flight. The telephone. Television. Computers. Walking on the moon. It never stops.
(Via Daring Fireball.)

Some of the best things in Philly came that way, from Headhouse Square to Bartram Gardens to the ENIAC.

Saturday, August 1, 2009

Breaking Rules, Breaking Trail

Breaking Rules, Breaking Trail: [...] An unexpected planned mission in Argentine Patagonia (Via Porters Sports.)

Wonderful trip report. This beautiful picture of Argentinian powder above Lago Nahuel Huapi makes me sorely miss skiing in South America this season, after five years of wonderful adventures in Chile and Argentina. But I need to give my Tahoe-broken ankle time to recover fully.

AT&T’s Inability to Handle the iPhone

AT&T’s Inability to Handle the iPhone: [...] Apple slagged AT&T twice during the WWDC keynote, for their inability to offer iPhone users either MMS or tethering. These are not advanced cutting edge mobile phone features. That was seven weeks ago, and AT&T still hasn’t said a peep about making either feature available. Of course Apple is furious. They are dependent on an incompetent partner in their biggest market. (Via Daring Fireball.)

Looking for any nice used bridges to purchase? Apple is a very smart company. They knew exactly what they were getting into when they signed the exclusive with AT&T. It's not as if AT&T's mediocrity has been a big secret. Apple must have concluded that the commercial benefits of the exclusive outweighed the costs, factoring in any (justifiable) doubts about AT&T's ability to perform.

It is very convenient for Apple to be able to hint through media and blogging stenographers that any problems are AT&T's fault, even though Apple knew perfectly well what they were getting into.

However, we may be able to finally see the truth behind this shadow boxing if the FCC's spine continues to stiffen.

Saturday, July 25, 2009

The AP, Stuck in a Hole, Digs Deeper

The AP, Stuck in a Hole, Digs Deeper: Richard Perez-Pena, reporting for the NYT on the AP’s latest announcement regarding their attempt to restrict their articles from being linked to or appearing in search results:

Tom Curley, The A.P.’s president and chief executive, said the company’s position was that even minimal use of a news article online required a licensing agreement with the news organization that produced it. [...] Each article — and, in the future, each picture and video — would go out with what The A.P. called a digital “wrapper,” data invisible to the ordinary consumer that is intended, among other things, to maximize its ranking in Internet searches. The software would also send signals back to The A.P., letting it track use of the article across the Web.
They have no idea what they’re talking about. Seriously, look at this gibberish. Someone just sold the Associated Press a bag of magic beans. (Via Daring Fireball.)

Ten years ago, the music industry bought into some magic beans called SDMI. I remember discussions back then where some people really believed it would be possible to tie the wrapper indissolubly to the contents. We know how that story evolved. The magic beans were soon discarded for less magic but somewhat more effective lobbying for new laws (DMCA) and legal action by the industry associations. It could be that the AP believes in magic beans, but more likely they believe in their lawyers' ability to use the DMCA to go after any bigger players who do not respect their wrappers. IANAL, but my hunch is that they could erect a plausible case with huge potential downsides for the defendant if they were able to move the argument from fair use, seen as protecting manual copy-and-paste of small quotes by individual authors, to DMCA infringement in using wholesale automatic means to slit AP wrappers in a news aggregator.

Many Web publishers and aggregators already pay the AP for their feeds. IMO the AP is trying to create a legal environment in which any publisher/aggregator that gets big enough will feel compelled to pay up. They may not be able to maintain the high prices they charged in the days of paper, but whatever price they are able to charge is better for them than the zero they get from all those smaller aggregators today.

Behind all of this obfuscation, it's really all an argument about who pays and how much. The news costs something to collect, write, and edit. The AP and other news sources might believe that they have a rarer and more valuable product than they really do — who doesn't feel that way about their creation? — but there is some value that should lead to some price. What happened is that the old price discovery mechanisms (I buy this physical newspaper or that one, or don't and get my news from radio or TV; with several regulatory and structural inefficiencies that engendered monopoly rents) have not been replaced by sustainable new ones. Thus, the AP is trying to design a new market mechanism that exploits the legal infrastructure that was created by media lobbying. It may be unseemly, but so far we have not been very good at developing an unregulated digital market for news and other content that can effectively balance demand and supply by paying producers enough to keep producing.

Some argue that the problem is that the incumbent producers are too inefficient and just trying to protect their lucrative inefficiency. Sure, but it's not as if there are many examples out there yet of more efficient news producers (as opposed to aggregators and editorializers) who are making a decent living providing all that we seem to want to read today.

I care a lot about the news. I subscribed to the paper NYT for 24 years in CA, NJ, and PA. I don't now because it stayed pretty much the same but my news needs diversified as the Web increased news diversity. I'm also a long-time subscriber to Salon. But I find myself reading it less and less. I'm still a subscriber of the New Yorker, Scientific American, Backcountry magazine, and Ski Journal. I still read much of the New Yorker (and delight in its cartoons) and Ski Journal (a beautiful use of high-gloss print), but the rest is being displaced by a multitude of aggregators and blogs, from Google News to the Loom to Wild Snow, that give me access to a much broader range of news in a more timely manner. I'd be happy to pay as much as or more than I did for the NYT to support all of those sources in some distributed way that doesn't require me to deal with pay walls for individual properties. Pay walls are very inefficient both economically and mechanically because my news reading only touches a teeny fraction of each source's content over time. I'd love a mechanism where I pay one payee a flat subscription and pennies flow to sources in proportion to the amount of their content I read (BTW, why don't all public radio stations in a metro area do that, based on audience metering?). Market mechanism designers, where are you?
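For concreteness, here is a minimal sketch of the accounting side of such a mechanism: one flat fee, split across sources in proportion to how many items I read from each. The function name, the read counts, and the choice of largest-remainder rounding (so the cents always add up to the fee) are my own assumptions for illustration. The hard parts, metering and getting publishers to sign on, are not addressed.

```python
def split_subscription(fee_cents, reads_by_source):
    """Split a flat fee (in cents) across sources in proportion to items read.

    Hypothetical helper: uses largest-remainder rounding so that the
    per-source payments are whole cents and sum exactly to fee_cents.
    """
    total_reads = sum(reads_by_source.values())
    if total_reads == 0:
        # Nothing was read this period, so nothing is paid out.
        return {source: 0 for source in reads_by_source}

    # Floor of each source's proportional share.
    shares = {
        source: fee_cents * reads // total_reads
        for source, reads in reads_by_source.items()
    }

    # Hand the leftover cents to the sources with the largest remainders.
    leftover = fee_cents - sum(shares.values())
    by_remainder = sorted(
        reads_by_source,
        key=lambda source: (fee_cents * reads_by_source[source]) % total_reads,
        reverse=True,
    )
    for source in by_remainder[:leftover]:
        shares[source] += 1
    return shares


# Made-up monthly read counts for three sources and a $15.00 subscription.
monthly_reads = {"source A": 30, "source B": 15, "source C": 5}
payments = split_subscription(1500, monthly_reads)
```

With these made-up numbers the split is exact; with read counts that do not divide evenly, the rounding step decides who gets the odd cent.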

Thursday, July 23, 2009

The Journal of Experimental Linguistics

The Journal of Experimental Linguistics: JEL is a linguistic "journal of reproducible research", that is, a journal of reproducible computational experiments on topics related to speech and language. [...] In all cases, JEL articles will be accompanied by executable recipes for re-creating all figures, tables, numbers and other results. These recipes will be in the form of source code that runs in some generally-available computational environment. [...] Although JEL is centered in linguistics, we aim to publish research from the widest possible range of disciplines that engage speech and language experimentally, from electrical engineering and computer science to education, psychology, biology, and speech pathology. In this interdisciplinary context, "reproducible research" is especially useful in helping experimental and analytical techniques to cross over from one sub field to another. (Via Language Log.)

Bravo! Great concept, open access. I am delighted to see the LSA and this outstanding editorial board (alright, I'm not totally objective with so many friends, current and former colleagues there) take this bold step when so many other proposals and arguments in related societies and fora have failed or are barely limping along.

Thursday, July 16, 2009

Misunderstanding Bell Labs

On The Washington Note, James Pinkerton claims that cost control in health-care reform will reduce surpluses and thus hurt the kind of discovery and invention at Bell Labs funded by the AT&T surplus. Except that the generous funding of Bell Labs by AT&T in the monopoly period was a consequence of AT&T being a regulated monopoly. Until the 1984 consent decree, AT&T's surpluses were determined by government tariff regulations, and AT&T's expenditures were substantially determined by political considerations ranging from providing employment where it mattered to state and Federal legislators, to supporting national defense goals through R&D. That same AT&T had the power to control what could be attached to its network, to the point of in effect forcing most users to rent its ponderous devices. Users had little to no choice in devices, tariffs, and terms of use. Translated to medicine, that AT&T was like a national HMO with one-size-fits-all treatments and an immovable bureaucracy, and with a gold-plated research arm that looked very good in Washington but did not matter much to everyday quality of care. I doubt that is what James Pinkerton wants to see. And I don't see the substantial private sector surpluses of for-profit health insurers, hospitals, or pharma going to much fundamental research. Instead, all the fundamental biomedical research there is (and there is probably less fundamental research than there should be) is paid by our Federal income taxes and by a few private foundations (and thus, indirectly, by the tax deductions on charitable giving).

Sunday, July 12, 2009

Regina Carter

Regina Carter quintet at Stanford last night, with Jeff Sanford (clarinet/flute), Fred Harris (piano), Seward McCain (bass) and Akira Tana (drums). Intriguing reinterpretations of classics and classic forms, full of unexpected twists and turns. A quintet that at times sounded like a full orchestra, but also full of intense, mind-bending individual touches. Besides Carter, Harris and Sanford had excellent solo contributions.

Saturday, July 11, 2009

The iPhone ate my homework

Why I Hate the iPhone: [...] I hate the iPhone for irrational reasons related to the number of times I get emails like these:

F, I know you want me to send you the text and you should have it in your hands soonish but I'm stuck sending things from my iphone.

[...] For me, the sentence "I'm sending this from my iPhone" does not instill a sense of awe and techno-envy but instead I get a sinking feeling that I'm not going to see a certain draft or a certain figure for a while yet to come. (Via FemaleScienceProfessor.)

The iPhone here is just a popular proxy for the whole class of smartphones; I remember first feeling that way about those "sent from my BlackBerry" signature lines. But there's a deeper issue here: much of our work is still stuck where we can't get to it easily from a smartphone. It's a human factors problem — reading and writing are difficult on small screens and keyboards; it's a software problem — technical writing and technical data rely on software that typically does not work on a mobile device or on the cloud; it's a systems problem — much of our data is on machines with restricted connectivity for technical and security reasons. I'd love to stop carrying a laptop everywhere, but it's unlikely that I will be able to in the foreseeable future.

Friday, July 3, 2009

Future Music

I recently became a member of SF Jazz, and just bought tickets for some of their festival concerts in the fall:

I might be tempted to book a few more later, but these were the dates that I could be fairly sure of now.

Why isn't there a similar organization in Philly? There are several venues there that bring excellent jazz and world music artists, but if they worked together, they could create a lot more focus and excitement around the music.

Monday, June 15, 2009

Rick and friends on Lassen

Mount Lassen 6/12-14/2009: Rick, Erin, Adam, and I decided to brave the rain/thunder showers and headed up to Mt Lassen for the weekend. [...]

(Via Rick's World.)

I so much wish I could have been there... But the ankle still needs work :(.

An economic solution to reviewing load

An economic solution to reviewing load: Hal Daume at the NLP blog bemoans the fact that “there is too much to review and too much garbage among it” and wonders “whether it’s possible to cut down on the sheer volume of reviewing”. [...] There is an economic solution to the problem that bears consideration: Charge for submission. This would induce self-selection; authors would be loathe to submit unless they thought the paper had a fair chance of acceptance. Consider a conference or journal with a 25% acceptance rate that charged, say, $50 per submission. (Via The Occasional Pamphlet.)

I entered the following comment in Stuart's blog:

I don’t think this adds up. Consider a typical academic CS research group with one professor and a few graduate students. As is typical, as a conference deadline approaches, they have several papers in the works, say four, in different states of completion; maybe one is in very good shape, two in fair shape, and the other in poor shape (these are again typical numbers in my experience). If just one paper is accepted, the professor and one student attend the conference, at a typical cost of $4000 for travel, accommodation, and conference registration. If three papers are accepted, maybe the professor and three students attend, at a total cost of $8000. Compared with those costs, the difference between $50 and $200 is utterly trivial; just a couple of slightly better meals, or a cab to the airport instead of the shuttle, would make the difference. The only way this could work would be to have submission charges that are significant relative to the other costs of paper creation and presentation. But if the charges were that high for a given venue, then 1) other venues would undercut it, and 2) rejections would lead to open warfare with authors claiming they were swindled of their fees by inappropriate rejections. The lack of incentive alignment between getting as much from submission fees as possible and doing as little in reviewing as possible would be very destructive of already fragile institutions.

Update: In his comment below, Mark suggests that a two-tier system can quickly get rid of most of the bad submissions, leaving more reviewing capacity for the remaining submissions. Unfortunately, as Darwin noted, “Ignorance more frequently begets confidence than does knowledge.” I've seen way too many highly confident reviewer dismissals of valuable work they were unwilling or unable to understand. Even a relatively low false-positive rate in the initial screening would be enough to create a much bigger hurdle for those submissions that are harder to understand because they are off the beaten path.

Sunday, June 7, 2009

This American Life on the Rating Agencies

This American Life on the Rating Agencies: This weekend's 'This American Life' is about the rating agencies. [...] A few excerpts:

"We hired a specialist firm that used a methodology called maximum entropy to generate this equation," says Frank Raiter, who until 2005 was in charge of rating mortgages at Standard and Poors. "It looked like a lot of Greek letters."

The new bonds were based on pools of thousands of mortgages. If you bought one of these bonds, you were basically loaning money to people for their houses. What the equation tried to predict was how likely the homeowners were to keep making payments.

The system made sense, Raiter says, until loan issuers started offering mortgages to people who didn't have great credit and in some cases didn't have a job.

Raiter says there wasn't a lot of data on these new homebuyers. He says he told his bosses they needed better data and a better model for assessing the riskiness of the loans.
(Via Calculated Risk.)

E. T. Jaynes must be turning in his grave. I'll listen to the podcast soon, but this quote waves a big red flag of overfitting. The last ten years of maxent-related work in machine learning and natural-language processing show clearly that the maximum entropy principle on its own can be highly misleading when it is applied to data drawn from long-tailed distributions. That's why there's thriving research on ways of regularizing maxent models, for example by replacing equality constraints by box constraints. But even with decent regularization, maxent models are only as good as their choice of event types (features) over which to compute sufficient statistics. If there are correlations in the real world that are not represented by corresponding features in the model, the model may be overly confident in its predictions.

Maximum entropy, like other statistical-philosophical principles (you know who you are), carries the unfortunate burden of a philosophical foundation that may to some appear to guarantee correct inference without the need for empirical validation. In the case of maximum entropy, the familiar argument is that it produces the least informative hypothesis given the evidence. That seems to imply safety, lack of overreaching. Unfortunately, the principle doesn't say anything about quality of evidence. What if the “evidence” is noisy, incomplete, biased? The principle doesn't say anything about finite-sample effects, as it came from statistical mechanics where the huge number of molecules made (then) those a non-issue. But in biological, social and cultural processes (genomics, language, social relationships, markets) we may as well bet that small-sample effects are never negligible.
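A toy illustration of the missing-feature problem, with numbers I made up (this is not the model the rating agencies used, which I know nothing about). If the only features are per-loan default indicators, the maximum entropy joint distribution that matches those marginals is just their product, so the model treats the loans as independent. When the real defaults are correlated, the model badly underestimates the chance that they go bad together:

```python
from itertools import product


def maxent_from_marginals(marginals):
    """Max-entropy joint over binary variables constrained only by marginals.

    With one indicator feature per variable and nothing else, the
    max-entropy solution is the product of the marginals (independence).
    """
    joint = {}
    for outcome in product([0, 1], repeat=len(marginals)):
        p = 1.0
        for bit, m in zip(outcome, marginals):
            p *= m if bit else 1.0 - m
        joint[outcome] = p
    return joint


# Hypothetical "true" joint for two loans (1 = default): strongly correlated.
true_joint = {(0, 0): 0.88, (0, 1): 0.02, (1, 0): 0.02, (1, 1): 0.08}

# The model only gets to see each loan's marginal default rate.
marginals = [
    sum(p for outcome, p in true_joint.items() if outcome[i] == 1)
    for i in range(2)
]
model_joint = maxent_from_marginals(marginals)

both_default_true = true_joint[(1, 1)]
both_default_model = model_joint[(1, 1)]
```

Each loan defaults 10% of the time in both the toy world and the model, but the model puts the probability of a joint default at about 1% where the toy world has 8%.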

Friday, June 5, 2009

Recently read

I have a pile of new fiction on my bedside table, but non-fiction is still winning:

The first two were very entertaining visits to the watery world, which I mostly love from solid ground given my susceptibility to seasickness. The last one was hard to put down, although part of it was that embarrassing attraction to the scene of a disaster. And even Gillian Tett doesn't do what I think such articles and books should have done from the beginning: borrow Alice, Bob, Carol, Dave, and Eve from cryptography to draw protocol diagrams for CDOs, CDSs, ABSs, and all that insanity of unstable transactions.

Sunday, May 31, 2009

Podcasts and authors

Recently I caught up with a backlog of Radio Open Source podcasts where Chris Lydon interviewed authors of recently published fiction:

I knew Hemon from Nowhere Man, and I have his The Lazarus Project on deck on my bedside table, but I didn't know the others. Their wonderful conversations with Chris Lydon convinced me that I need their books. Tinkers is already on deck.

Data set selection

On Moon Landings, Michelle Malkin, P-Values, the Clintons, and the Magical Mystery Dealergate Conspiracy Theory: [...]The way this data is being used is almost the same. Singer ran six sets of regression analysis: one each for Obama, McCain, Clinton, Democratic and Republican donors, and another for those dealers who had made no political contributions at all. She was therefore testing six hypotheses. If these hypothesis were independent from one another (which, to be clear, in this case they aren't), the odds that at least one of the six would return a p-value of .125 or lower are better than 50:50! Not only are false positives possible -- they are practically inevitable, particularly if you test enough hypotheses and tolerate a low enough threshold for statistical significance. [...] (Via FiveThirtyEight.com: Electoral Projections Done Right.)

I feel so much better that it's not just machine learning that practices the arcane crafts of post hoc hypothesis and data set selection.
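The arithmetic behind the quoted claim is short enough to check. Under the simplifying assumptions stated in the quote (independent tests, and no real effect in any of them), the chance of at least one p-value at or below a threshold is one minus the chance that every test stays above it. The function name is mine:

```python
def prob_at_least_one_false_positive(threshold, num_tests):
    """Chance that at least one of num_tests independent null tests
    yields a p-value at or below threshold."""
    return 1.0 - (1.0 - threshold) ** num_tests


# Six regressions, threshold of .125, as in the quoted post.
p_any = prob_at_least_one_false_positive(0.125, 6)
print(f"{p_any:.3f}")
```

This comes out to roughly 0.55, so "better than 50:50" holds. As the quote notes, the six hypotheses are not actually independent, so the figure is only a rough guide.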

Saturday, May 30, 2009

Study: hacks often bamboozled by flacks

Study: hacks often bamboozled by flacks: Steven Woloshin et al., "Press Releases by Academic Medical Centers: Not So Academic?", Annals of Internal Medicine, 150(9): 613-618:

Background: The news media are often criticized for exaggerated coverage of weak science. Press releases, a source of information for many journalists, might be a source of those exaggerations.
Conclusion: Press releases from academic medical centers often promote research that has uncertain relevance to human health and do not provide key facts or acknowledge important limitations.
[...]The best thing, it seems to me, would be to enrich the journalistic ecosystem with more species in niches like the one that Goldacre's Bad Science column occupies — agile, razor-clawed predators culling the herds of science-news herbivores that graze the green shoots of press releases on the endless media plains. (Via Language Log.)

Or like Language Log, Real Climate, The Loom, Statistical Modeling, or Effect Measure, to mention some of my current reading. The blogospherian explosion has created a wealth of innovative lineages. I don't know how they will evolve and survive, but we are already getting more informed discussion of science than we ever did from “the press.”

Monday, May 25, 2009

Computation != Deliberation

Travel, work, all-consuming new research ideas keep getting in the way of blogging, and slowing down reading. I'm still struggling with Out of Our Heads, which keeps switching between infuriating cluelessness about computation and intriguing insights about the lack of a clear-cut boundary at the information processing level between the “inside” and the “outside” of the brain. Alva Noë repeatedly assumes that “computation” in the mind is just a kind of rule-following conscious behavior. Following this misconception, the intuitive leaps of an expert chess player or human recognition of faces are, according to him, not (plodding, deliberate, searching, rule-following) computation. There must be something really weird in the coffee at the Department of Philosophy at Berkeley that keeps some of them (Dreyfus, Searle, Noë) from recognizing that even a simple amoeba computes to maintain some awareness of and ecologically appropriate behavior towards its shifting environment. (Thank you Dennis Bray for the a propos example).

To paraphrase again their colleague Brad De Long's long-running lament: why oh why can't we have more computationally literate philosophers?

Monday, May 4, 2009

Diversity in scientific data

How important is WolframAlpha?: I don’t know those areas well enough to give an example that will hold up, but I can imagine WA becoming the first place geneticists go when they have a question about a gene sequence or chemists who want to know about a molecule. (Via Joho the Blog.)

These are the worst possible examples. I've worked quite a bit with biologists and medical researchers, and the last thing they want is a single source for their research data. Genomic sequences or the 3-D structures of complex molecules are works in progress, with many sources with different strengths and weaknesses. Two of my recent bioinformatics papers are on how you can get better genomic annotation by combining multiple sources of evidence developed by different researchers with different methods. Much of the current progress on genomics, proteomics, and systems biology is about different approaches to annotation and information integration, and advances from comparing and combining different types of information.

Highly curated, single-source data is useful only in those areas where how the data is collected and curated is not a central part of the scientific debate. I can't think of a single area of science that I follow in which the core data are settled, from biology to linguistics. Diverse sources, openly exchanged, contrasted, and combined, are the lifeblood of data-driven science.

Sunday, May 3, 2009

Parental bragging

You'll probably hate it, Daniel, but I'm too happy and proud to not blog this news:

Daniel Pereira teaches English at a small, private school in Springfield that serves students who have dropped out or somehow fallen through the cracks in public schools. He shares his love of literature at GW Community School and over time has served as a mentor for students struggling with drug addiction, social anxiety or learning disabilities.
He uses offbeat techniques to engage students, such as teaching a class on graphic novels or using a Run-DMC song to teach iambic pentameter. As the school's college counselor, he also helps many students make a sometimes difficult transition. One parent wrote that his son, who was unhappy and shut off as a teenager, began to pay attention in Pereira's class. The teen developed interests in poetry and philosophy and is studying creative writing in college.
"Students often say that Mr. Pereira is the toughest teacher they've ever had, but also their favorite," wrote Alexa C. Warden, the school's director, in her nomination of Pereira.


Too much happening, not enough time to write properly about each:

  • Broke my ankle from a fall on a Donner Pass chute.
  • Moved from Palo Alto to Menlo Park.
  • Playing with NumPy for machine learning experiments.
  • Got stuck in Alva Noë's book where he goes off the rails discussing computation.

50 to 1

50 to 1: As Greg says, Tufte would be proud 

(Via tingilinde.)

Superior visual communication. Besides the bus-out-of-wrecked-cars, the real-time car counter nails it.


Went to Oakland's beautiful Art Deco Paramount Theatre last night to hear Mariza. Even for those of us who became utterly cynical about fado as it was force-fed to us by the censored radio of an oppressive regime (maybe especially for us), Mariza live breaks the cynicism. She acts the songs, she shares good jokes with the audience and the band, she recreates fado standards by bringing out a rhythmic core that had been swamped by treacle in the “official” renderings, she takes songs and poems that we had consigned to the dustbin of self-indulgent lament for lost glories and loves and revives them in a fierce, self-aware fight to take this culture from the hypocrites that exploited and suffocated it. She may not always succeed (there's too much baggage in 400 years of self-pitying colonialism), but she fights with such intelligence and energy that she completely won this audience, even us cynics. Her band (Angelo Freire, Diogo Clemente, Marino de Freitas, Vicky Marques, Simon James) is an outstanding group of traditional and contemporary musical talent from Portuguese guitar (Angelo Freire) to samba-inspired drums (Vicky Marques).

Maybe only the daughter of a European father and an African mother, growing up in the not-so-subtly racist former colonial capital, could have given its music back to a culture still paralyzed by guilty denial.

Sunday, April 26, 2009

Falling for the magic formula

Conditional entropy and the Indus Script: A recent publication (Rajesh P. N. Rao, Nisha Yadav, Mayank N. Vahia, Hrishikesh Joglekar, R. Adhikari, and Iravatham Mahadevan, "Entropic Evidence for Linguistic Structure in the Indus Script", Science, Published online 23 April 2009; and also the Supporting Online Material) claims a breakthrough in understanding the nature of the symbols found in inscriptions from the Indus Valley Civilization. (Via Language Log.)

Once again, Science falls for a magic formula that purports to answer a contentious question about language: is a certain ancient symbolic system a writing system? They would not, I hope, fall for a similar hypothesis in biology. They would, I hope, be skeptical that a single formula could account for the complex interactions within an evolutionarily built system. But somehow it escapes them that language is such a system, as are other culturally constructed symbol systems, carrying lots of specific information that cannot be captured by a single statistic.

The main failing of the paper under discussion seems to be choosing an artificially unlikely null hypothesis. This is a common problem in statistical modeling of sequences, for example genomic sequences. It is hard to construct realistic null hypotheses that capture as much as possible of the statistics of the underlying process except for what the hypothesis under test is supposedly responsible for. In the case at hand: what statistics could distinguish a writing system from other symbol systems? As other authors convincingly argued, first-order sequence statistics miss critical statistical properties of writing systems with respect to symbol repetition within a text, namely that certain symbols co-occur often (but mostly not adjacently) within a text because they represent common phonological or grammatical elements. Conversely, given first-order statistics are compatible with a very large class of first-order Markov processes, almost all of which could not be claimed to be anywhere close to writing of human languages. In other words, most of the “languages” that the paper's test separates from their artificial straw man are nowhere close to any possible human language.

To paraphrase Brad De Long's long-running lament: why oh why can't we have better scientific journal editors?

Update: Mark Liberman, Cosma Shalizi, and Richard Sproat actually ran the numbers. It turns out that they were able to generate curves similar to the paper's “linguistic” curves with memoryless sources. Ouch. As a bonus, we get R, Matlab, and Python scripts to play with, in an ecumenical demonstration of fast statistical modeling for those following at home.
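To get a feel for the memoryless-source point without their scripts, here is a rough sketch of my own. It is not the code linked above, and it computes a single bigram conditional entropy rather than the curves in the paper. It compares three synthetic sources over 20 symbols: a rigidly ordered one, a uniformly random one, and a memoryless one with skewed (Zipf-like) symbol frequencies. The sizes, seed, and 1/rank weights are arbitrary choices of mine.

```python
import math
import random
from collections import Counter


def conditional_entropy(seq):
    """Estimate H(next symbol | current symbol) in bits from bigram counts."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    num_pairs = len(seq) - 1
    entropy = 0.0
    for (current, _nxt), count in pair_counts.items():
        entropy -= (count / num_pairs) * math.log2(count / first_counts[current])
    return entropy


def memoryless_zipf(num_symbols, length, rng):
    """Draw symbols independently, with probability proportional to 1/rank."""
    weights = [1.0 / (rank + 1) for rank in range(num_symbols)]
    return rng.choices(range(num_symbols), weights=weights, k=length)


rng = random.Random(0)
num_symbols, length = 20, 10000

rigid = list(range(num_symbols)) * (length // num_symbols)  # fixed cycle
uniform = [rng.randrange(num_symbols) for _ in range(length)]
zipfian = memoryless_zipf(num_symbols, length, rng)

h_rigid = conditional_entropy(rigid)
h_uniform = conditional_entropy(uniform)
h_zipf = conditional_entropy(zipfian)
```

The skewed memoryless source lands between the two extremes even though it has no sequential structure at all, which is why landing "in between" says so little on its own.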

Tuesday, April 14, 2009

Strings are not Meanings Part 2.1

Strings are not Meanings Part 2.1: Fernando is right – these observations are powerful traces of how writers and readers organize and relate pieces of information. Just as a film of Kasparov is a trace of his playing chess.

I think I didn't make my point as strongly or precisely as I should have. The bubble chamber analogy is neat, but limited. In contrast to the traces in the chamber, the stuff out there on the internet, stored or in transit, is not just a record but also a huge external memory that is as causally central to our behavior as anything in our neural matter. The question then is, what's the actual division of labor between external and mental representation? I tend to believe that material and communicative culture carry a lot more of the burden than individual minds, similarly to how much more of the informational burden of current computing is carried by programs stored out there than by CPUs.

I think that Fernando approaches this space from a more behaviourist mindset – accepting the input, output and context but with no requirements for stuff happening ‘inside’.

No, my stance is definitely not behavioristic. There's lots of complexity ‘inside.’ But the patterns of representation and inference favored by symbolic AI have little to do with ‘inside’ as far as I can see. Instead, they are formalizations of language — as formal logic is — which explain little and oversimplify a lot. Given that, we might as well go right to the language stuff out there and drop the crippled intermediaries.

In addition to their taxonomic meaning, ontologies have come to refer to a requirement for communication – that the stuff I refer to maps to the same stuff for you.

But that's where it all falls apart. No formal system can ensure that kind of agreement. There is no rigid designation in nature. Our agreements about terms are practical, contextual, contingent. Language structure relates to common patterns of inference (for instance, monotonicity) that seem cognitively “easy” (whether they have innate “hardware” support I don't know). But asserting that is much less than postulating a whole fine-grained cognitive architecture of representations and inference algorithms out of thin air, when the alternative of computing directly with the external representations and usage events of language is available to us and so much richer than even the fanciest semantic net system.

(Via Data Mining.)

Sunday, April 12, 2009

Strings and Meanings

I'm reading Alva Noë's so far (page 90) delightful Out of Our Heads. He makes much more concisely a point I tried to make earlier, which goes back to Hilary Putnam:

I am not myself, individually, responsible for making my words meaningful. They have their meanings thanks to the existence of a social practice in which I am allowed to participate.

This is the same whether the words are in my speech, in my writing, or strings in some data structure, maybe an ontology. Ontologies do not have magical powers. Their value is in their practical communicative success, as is the value of any other means of communication.

Saturday, April 11, 2009

Strings are not Meanings Part 2

Strings are not Meanings Part 2: Matt refines his earlier points:

Data may be unreasonably effective, but effective at what?

In asking this, I was really drawing attention to firstly the ability for large volumes of data (and not much else) to deliver interesting and useful results, but its inability to tell us how humans produce and interpret this data. One of the original motivations for AI was not simply to create machines that play chess better than people, but to actually understand how people’s minds work.

The data we were discussing in the original paper tells us a lot about how people “produce and interpret” it. Links, clicks, and markup, together with proximity, syntactic relations, and textual similarities and parallelisms, are powerful traces of how writers and readers organize and relate pieces of information to each other and to the environments in which they live. As David Lewis once said, just as the Web was emerging, they form a bubble chamber that records the traces of much of human activity and cognition. As with a bubble chamber, it is noisy, it requires serious computation to interpret, and most important of all, it requires prior hypotheses about what we are looking at to organize those computations. How much those hypotheses depend on fine-grained models of “how people's minds work,” we really have no idea. If we were to measure the success of AI by its progress on creating such models, we'd have to see AI as a dismal, misguided failure. AI's successes, such as they are, are not about human minds, but about computational systems that behave in a more adaptive way in complex environments, including those involving human communication. Indeed, neither AI researchers nor psychologists, nor linguists, nor neuroscientists have made much progress (not since I came into AI 30 years ago, anyway) in figuring out the division of labor between task-specific cognitive mechanisms and representations and more shallow, statistical, neural and social learning systems in enabling human cognition and communication.
If anything, we have increasing reason to be humble about the alleged uniquely fine-grained nature of human cognition, as opposed to the broader, shallower power of a few inference-from-experience hacks, social interaction, and external situated memory (territory marking, as it were), not just in humans, to construct complex symbolic processing systems: Language as Shaped by the Brain, Out of Our Heads, The Origins of Meaning, Cultural Transmission and the Evolution of Human Behavior, ...

Despite all the ontology nay-sayers, a big chunk of our world is structured due to the well organized, systematic and predictable ways in which industry, society and even biology creates stuff.

Here, I want to draw attention to the skepticism around ontologies. Yes, they come at a cost, but it is also the case that they offer true grounding of interpretations of textual data. Let me give an example. The Lord of the Rings is a string used to refer to a book (in three parts) a sequence of films, various video games, board games, and so on. The ambiguity of the phrase requires a plurality of interpretations available to it. This is a 1-many mapping. The 1 is a string, but what is the type of the many? I actually see the type of work described in the paper as being wholly complimentary with categorical knowledge structures.

Hey, you gave your own answer! The many are "a book", "a sequence of films", "[a] video game", ... Sure, the effect of the (re)presentation of those strings in certain media (including our neural media) in certain circumstances has causal connections to action involving various physical and social situations, such as that of buying an actual, physical book from a book seller. But that causal aspect of meaning — which I contend is primary — is totally ignored by ontologies. Ontologies may pretend to be somehow more fundamental than mere text, but they are just yet another form of external memory, like all the others we already use, whose value will be determined by practical, socially-based activity, and not by somehow being magically imbued with “true grounding.” Grounding is not true or false, it's the side effect of causally-driven mental and social learning and construction. What a symbol means is what it does to us and what we do with it, not some essence of the symbol somehow acquired by having it sit in a particular formal representation. No one has provided any evidence that by having "Harry Potter" sit somewhere in WordNet, the string becomes more meaningful than what we can glean from its manifold occurrences out there. It may be more useful to summarize the symbol's associations in a data structure for further processing (I'm all for useful data structures), but doing so doesn't add anything informationally (it may add something in computational efficiency, of course), and it often loses a lot, because context of use gets washed out or lost. Let's be serious — and a bit humbler — about what we are really doing with these symbolic representations: engineering — which is cool, don't worry — not philosophy or cognitive science.
Much of this was already said or implied in McDermott's classic (unfortunately I can find it only behind the ACM paywall, so much for “Advancing Computing as a Science and a Profession,” but I digress...), which we'd do well to (re)read annually on the occasion of every AAAI conference, and whenever semantic delusions strike us. (Via Data Mining.)

Update: Partha (thanks!) found this freely accessible copy of Artificial Intelligence Meets Natural Stupidity.

Saturday, March 28, 2009


Very different coffee history and preferences from mine (double espressos, occasional macchiato), but the feelings and expression are perfect.

Strings are not Meanings

Strings are not Meanings (edited since original posting): I just linked to a favorable review of our position paper The Unreasonable Effectiveness of Data, so it's only fair that I also link to a (brief) skeptical review. I agree with Matt's title, strings are not meanings. But neither are any other objects, and that's where I think we seriously disagree. More on that as I respond to his three cautions.

Data may be unreasonably effective, but effective at what?

I think our paper gave enough examples of the effectiveness we had in mind, but I'll stick my neck out further here. Effective at capturing the relations that underlie meaning in language use.

Despite all the ontology nay sayers, a big chunk of our world is structured due to the well organized, systematic and predictable ways in which industry, society and even biology creates stuff.

Sure the world is structured. But well before taxonomic technologies were invented with writing and spatial indexing of information (see Everything is Miscellaneous), primates including Homo sapiens were pretty well along in figuring out how to exploit that structure (see Baboon Metaphysics, The Origins of Meaning). Taxonomic technology is no more inevitable or everlasting than water or steam power.

Data with no theory is all very well, but reasoning cannot be done without a world of semantic objects.

We did not write about “data with no theory.” That's a straw man that unfortunately often substitutes for original thought whenever these issues come up, as two of us had to note previously, and others did too. As for “a world of semantic objects,” what on earth could that be? Meaning is about relations among states: the state of the computer screen when you read this, the state of my brain when I wrote it, and the state of affairs described by my writing; the state of my brain when I'm writing it, the physical state of some paper and ink involved in my reading Situations and Attitudes a couple of decades ago, and the state of affairs of semantic debate between then and now; and so on. There are no semantic objects, only semantic relations, semantic by virtue of the causal connections among the related states. Jon Barwise, with whom I had the privilege of discussing these matters, is sadly no longer with us, but a good sit-down with, say, Information Flow, would do wonders for one's semantic hygiene. (Via Data Mining.)

Data in its untamed abundance gives rise to meaning

Joho the Blog » Data in its untamed abundance gives rise to meaning: Seb Schmoller points to a terrific article by Google’s Alon Halevy, Peter Norvig, and Fernando Pereira about two ways to get meaning out of information. Their example is machine translation of natural language where there is so much translated material available for computers to learn from, which (they argue) works better than trying to learn from attempts that go up a level of abstraction and try to categorize and conceptualize the language. Scale wins. (Via Joho the Blog.)

Thanks to David Weinberger for the nice review! I love the poetic post title Data in its untamed abundance gives rise to meaning.

Just this Friday, Tom Mitchell gave a great talk at Google on his group's latest results on decoding the concepts someone is thinking about from their fMRIs. Crucially, the decoding relies on the statistics of associations between concepts expressed by nouns and surrounding action and perception verbs, thus translating between text associations and statistical correlations between activity in different brain areas. Sure, the usual suspects will again tell us that's nothing to do with “real” meaning, just mere associations of flickering bits in our servers and our neurons. Thus “real” meaning echoes the vital force, the phlogiston, and the ether before it, “true essences” all.

Last weekend in Tahoe

Route 89 N
Congested I-80 West

My blogging backlog is getting worse... Last weekend there was a fast-moving storm that dropped over one foot of fairly dry, creamy powder around Tahoe. I skied Squaw Saturday — decent spring conditions — and Sunday — my best inbounds runs of the season. Too busy skiing to take pictures, but I wanted to capture the beautiful late afternoon light on the snow driving back from Squaw to the Bay Area.

Thanks to:

  • Rick and David for showing me around Squaw, which I don't know so well, on Saturday.
  • The fast-moving Alaskan low that decorated KT-22 with sweet powder, and the following winds that kept refilling East Bowl.
  • Karhu and Dynafit for a powder-skiing setup that works beautifully.
  • The hitch-hikers I picked up at the 7-11 by the Backcountry for having the brilliant idea of calling the number of my lost phone Sunday afternoon in Truckee.
  • Truckee Airport fire station, in particular fireman Adam, for finding my phone, which had fallen in the Wild Cherries parking lot, and keeping it safe. I owe you the best ice cream I can find.

No thanks to:

  • The out-of-control snowboarder who hit me at high speed on the beginner Sunnyside run Saturday, breaking one of my ski poles and leaving a black-and-blue swelling on my right hip, and who refused to accept responsibility until I suggested we talk to the ski patrol.
  • The usual clueless drivers on I-80 West who make the traffic worse for everybody with their lane changes and tailgating.