Diversity in scientific data

How important is WolframAlpha?: I don’t know those areas well enough to give an example that will hold up, but I can imagine WA becoming the first place geneticists go when they have a question about a gene sequence or chemists who want to know about a molecule. (Via Joho the Blog.)

These are the worst possible examples. I've worked quite a bit with biologists and medical researchers, and the last thing they want is a single source for their research data. Genomic sequences and the 3-D structures of complex molecules are works in progress, drawn from many sources, each with its own strengths and weaknesses. Two of my recent bioinformatics papers show how you can get better genomic annotation by combining multiple sources of evidence developed by different researchers with different methods. Much of the current progress in genomics, proteomics, and systems biology comes from different approaches to annotation and information integration, and from comparing and combining different types of information.

Highly curated, single-source data are useful only in areas where the way the data are collected and curated is not a central part of the scientific debate. I can't think of a single area of science that I follow, from biology to linguistics, in which the core data are settled. Diverse sources, openly exchanged, contrasted, and combined, are the lifeblood of data-driven science.

