Monthly Archives: September 2010

Steve Jobs “never had any designs. He has not designed a single project”

Back when I reviewed Peter Seibel’s fascinating book of programmer interviews, Coders at Work, Erik Anderson suggested in a comment that I might also enjoy its precursor, Programmers at Work.  I bought and read it, and it’s excellent.  I’ll review it properly some time soon, but today I just wanted to draw attention to one segment that caught me completely off guard.


Bibliographic data, part 3: Has anyone, anywhere, ever read the whole of the RDA specification?

[This article concludes what’s turned out to be a three-part series.  You may wish to read part 1 and part 2 before this one.]

I only meant to write two articles on the difficulty of representing a journal article reference in a standard XML format.  But an epilogue is warranted because, well, surely there has to be a standard way to do this.

Well, let’s step back a bit from the detail of XML representation.  Let’s just look at cataloguing rules.


Bibliographic data, part 2: Dublin Core’s dirty little secret

[This is part two in a series — you should read part 1 first for context and then you might go on to part 3.]

The Dublin Core — metadata made dumb

Just when librarians were in despair of ever getting their data out to the world in a form it could understand, along came the Dublin Core (DC for short) — a simple set of fifteen metadata elements (contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, and type) that could be used to describe “document-like objects” such as books, journal articles and web pages.

Everyone in the library world got really excited about the Dublin Core for about three weeks in 1999, before realising that you can’t actually do anything with those elements beyond expressing author (called “creator”), title and date. Everything else was too vague to be of any use: coverage, anyone? Relation? Format?
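
To make the fifteen elements concrete, here is a minimal Python sketch that builds a simple Dublin Core record for a journal article. The article details are made up for illustration, and the `dc_record` helper and the `record` wrapper element are my own assumptions rather than anything a standard prescribes; only the element names and the `http://purl.org/dc/elements/1.1/` namespace come from Dublin Core itself. Note that volume, issue and page range have no element of their own, so in this sketch they get squeezed into `source` as free text.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"

DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def dc_record(fields):
    """Build an XML record from a dict of Dublin Core element names to values.

    Hypothetical helper: raises ValueError for any name that is not one of
    the fifteen elements, which is how you find out what DC cannot say.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# A made-up journal article, purely for illustration.
article = {
    "creator": "A. N. Author",
    "title": "An Example Article",
    "date": "2010",
    # No volume, issue or pages elements exist, so they end up as free text.
    "source": "Journal of Examples 12(3), 45-67",
}

xml_text = ET.tostring(dc_record(article), encoding="unicode")
print(xml_text)
```

Author, title and date come out cleanly, but anything that wants to pull the volume or page range back out of `source` has to parse a string whose layout nobody has agreed on. Passing a field such as `volume` to `dc_record` raises a `ValueError`, because there is no such element.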


Bibliographic data, part 1: MARC and its vile progeny

[This is part one of a three-part series.  When you’re done here, read on to part 2 and part 3.]

My job is the subfield of programming that relates to searching, retrieval and metadata, especially as it relates to libraries. That means that what I deal with is mostly bibliographic metadata: sets of fields that describe books or journal articles. For example, the federated search system that we provide, while not in any way limited to searching for and presenting results of this kind, has tended to be used primarily in the library domain, so I spend a lot of my time dealing with bibliographic data.

It’s a jungle out there. The dominant electronic format for bibliographic information is still, by far, the ancient and faintly comical MARC (MAchine-Readable Cataloging) format, or rather, the MARC family of similar but subtly incompatible formats. MARC originated in the 1960s at the Library of Congress, literally as a way to encode the information on physical catalogue cards.
