Matt Wedel is constantly telling me I need to read Frank Herbert’s classic sci-fi epic Dune. I’ve never been keen because of the vast number of sequels, but I finally gave in to his repeated requests and started on it last night, on my Kindle.
I got as far as page 4. Since the Kindle shows small pages, I guess that’s part way down page 2 of a printed copy. Here’s why:
Yes, Paul. What is a gom jabbar?
First of all, welcome back to programming! It was a bit of a shock for me to see that, since the last programming article, I’ve written a sequence of four posts about Doctor Who and Buffy, which is an imbalance I never intended. Well, Doctor Who will end after two more episodes, so hang in there, programmers!
The response to We can’t afford to write safe software was very interesting. As usual (I am pleased and proud to say) the comments on the article itself, here on The Reinvigorated Programmer, were, almost without exception, insightful and informative. But over on Reddit things were not so good.
Let’s start by thinking about a very simple example. I’ve recently switched to using Ruby as my language of choice, after a decade as a Perl hacker. Ruby does a lot of things more nicely than Perl, including having proper object syntax, simple everything-is-an-object semantics, a sweet module/mixin scheme and very easy-to-use closures, so I’ve mostly been very happy with the switch. In so far as Ruby is a better Perl, I don’t see myself ever writing a new program in Perl again, except where commercial considerations demand it.
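To make a couple of those features concrete, here is a toy sketch of my own (not from any of my actual programs) showing a mixin shared between classes and a block acting as a closure:

```ruby
# A mixin: any class that defines #each gets this method for free.
module Summable
  def sum_with(initial)
    total = initial
    each { |x| total += x }   # the block is a closure over `total`
    total
  end
end

class NumberBag
  include Summable            # mix the module in

  def initialize(*numbers)
    @numbers = numbers
  end

  def each(&block)
    @numbers.each(&block)
  end
end

bag = NumberBag.new(1, 2, 3)
puts bag.sum_with(10)         # prints 16
```

The nice part is that Summable knows nothing about NumberBag; it works with any class that provides #each, which is exactly how Ruby’s own Enumerable module is built.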
Having dealt (briefly and informally) with invariants and bound functions in the two previous installments of this series, it’s time to look at preconditions and postconditions. These will likely be familiar to most of you, even those who’d not heard of invariants before, because these days they are known by the trendy, XP-friendly name design by contract. (OK, it’s not quite the same thing; but they’re closely related concepts.)
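As a minimal illustration of the idea (my own sketch, with helper names I made up, not anything from a contract library): a precondition is checked on entry, a postcondition on exit, and together they state the routine’s contract.

```ruby
# Hypothetical helpers for runtime contract checks.
def assert_pre(cond, msg = "precondition violated")
  raise ArgumentError, msg unless cond
end

def assert_post(cond, msg = "postcondition violated")
  raise RuntimeError, msg unless cond
end

# Integer square root, with its contract made explicit.
def isqrt(n)
  assert_pre(n.is_a?(Integer) && n >= 0, "n must be a non-negative integer")
  r = 0
  r += 1 while (r + 1) * (r + 1) <= n
  # Postcondition: r is the largest integer whose square is <= n.
  assert_post(r * r <= n && (r + 1) * (r + 1) > n)
  r
end

puts isqrt(10)   # prints 3
```

The caller’s obligation is the precondition; the routine’s obligation is the postcondition. That division of responsibility is the “contract”.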
Following on from yesterday’s article on invariants, this time we’ll talk about a second tool in the kit for thinking about code. This one will be much shorter, because the Concept Of The Day is so simple. It gets talked about even less than invariants, but let’s change that!
(Oh, and today is vegetarian-friendly Cheese Day at The Reinvigorated Programmer, though it still may not be much use to the vegans among you.)
I’m struggling with my article numbering now :-)
Regular readers will have noticed that I am running a whole bunch of series in parallel on this blog: one reviewing programming books, one on the issue of simplicity in programming, one reviewing Series 5 of Doctor Who, a related one on Russell T. Davies’s contributions, and one on learning Lisp. To my huge surprise, what started out as a review of Jon Bentley’s book Programming Pearls [amazon.com, amazon.co.uk] has led through a sequence of digressions and become a series of its own on binary search. This is part 4 of that series: when I’ve finished part 4, I will — at last! — progress to the actual review. But since part 4 is about the distinctly non-trivial subject of writing correct code, it will itself have to be composed of multiple sub-parts. So this article on invariants is both part 1 of the new writing-correct-code series and part 4a of the binary-search series. Hope that’s clear.
A few people have complained about the sushi pictures on this blog, so before I plough into the article proper, here is something completely different.
The contributions to the original binary search thread continue to trickle in, and now stand at an astonishing tally of 679 comments, most of them containing code. Thanks once more to all of you who’ve attempted this challenge, and to everyone who’s commented.
In one of the more interesting comments, Darius Bacon tested a sample of 20 of the Python submissions, and found that exactly 10% of them both passed the functional test and appeared to run in O(log n) time — which of course is exactly in line with Jon Bentley’s original statistic, that 10% of the professional programmers he’d examined were able to write correct binary search code (albeit under rather different conditions). It subsequently became apparent, though, that Darius’s testing code was overly strict, and that a further two of the sampled routines did run correctly (though not necessarily in O(log n) time). Still — it’s a surprisingly low hit-rate, especially when you bear in mind that, contrary to the rules, many of the programmers did use testing to help them develop the code. (You should read the second linked comment, as it has plenty of other interesting observations, too.)
[Update, an hour or two later: as Darius Bacon points out in a comment below, I misinterpreted his results. Once the testing code had been fixed, his sample of 21 routines found 9 that passed all tests, of which 6 were O(log n).]
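For reference, here is one correct formulation in Ruby. This is my own sketch, not one of the submitted routines, with the loop invariant noted in a comment:

```ruby
# Returns the index of `target` in the sorted array `a`, or nil if
# absent. Runs in O(log n).
def bsearch(a, target)
  lo = 0
  hi = a.length - 1
  # Invariant: if target is in a, its index lies in lo..hi.
  while lo <= hi
    mid = lo + (hi - lo) / 2   # avoids overflow in fixed-int languages
    case a[mid] <=> target
    when -1 then lo = mid + 1
    when  1 then hi = mid - 1
    else return mid
    end
  end
  nil
end

p bsearch([1, 3, 5, 7, 9], 7)   # => 3
p bsearch([1, 3, 5, 7, 9], 4)   # => nil
```

The classic trap is an off-by-one in how lo and hi move; checking each branch against the invariant is what makes the routine easy to believe.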
Many, many thanks to all of you who have contributed to the astonishing response to the previous article on the difficulty of getting binary search right. I’ve never seen anything like this: in just over 24 hours since I posted the article, 541 comments have been posted, and they’re still rolling in. (That number is a bit inflated by people re-posting the same code-samples multiple times after seeing WordPress mangle their source, but even taking that into account it still amazes me.)
You guys rule. Seriously.
I have a lot to say in response to it all. Read on …
A bit of personal history, as background, before we jump in …
In late 2009, I got it into my head that there was a web-site I wanted to build. That web-site would be backed by a database, and would allow records in the database to be created, edited, listed, viewed, deleted and linked together in various ways. (I won’t talk about the details now, because I want to save them for when I can link to the running site, which doesn’t exist yet.)
I’ve built a lot of conceptually similar sites in my time, mostly using Perl with HTML::Mason to embed mod_perl code fragments in HTML templates (kind of like the way PHP works). This has entailed having some kind of layer between the application and the database to enable objects to be stored — what we now call an ORM layer, although I didn’t know that term back when I first wrote one in (probably) 2001.
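The idea behind such a layer can be sketched in a few lines. This is a deliberately tiny toy of my own, with an in-memory hash standing in for the database table; a real ORM layer would of course generate SQL:

```ruby
require "json"

# A class whose instances know how to save themselves to, and load
# themselves from, a backing store keyed by id.
class Record
  STORE = {}   # stands in for a database table

  attr_reader :id, :fields

  def initialize(id, fields)
    @id, @fields = id, fields
  end

  def save
    STORE[id] = JSON.generate(fields)   # "serialize the row"
    self
  end

  def self.find(id)
    row = STORE[id] or return nil
    new(id, JSON.parse(row))
  end
end

Record.new(1, "title" => "Dune", "author" => "Herbert").save
p Record.find(1).fields["author"]   # => "Herbert"
```

The essential point is that the application code above the layer deals only in objects, never in rows.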
Today, I want to talk about a problem that I don’t have a good solution for, and throw the floor open in the hope that someone else does. Teach me, O commenters.
Suppose your program has to build a tree structure — as the result of parsing a query, say. And suppose that, having built the tree, you want to do several different operations that involve walking that tree. How do you design that program?
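Here is a toy illustration of the problem (my own classes, nothing to do with any real parser): one tree, several independent walks. In this version each operation is a standalone recursive function that case-matches on node type, which is one of the designs on the table:

```ruby
Leaf = Struct.new(:value)
Branch = Struct.new(:left, :right)

# Operation 1: count the leaves.
def count_leaves(node)
  case node
  when Leaf then 1
  when Branch then count_leaves(node.left) + count_leaves(node.right)
  end
end

# Operation 2: measure the depth.
def depth(node)
  case node
  when Leaf then 1
  when Branch then 1 + [depth(node.left), depth(node.right)].max
  end
end

tree = Branch.new(Branch.new(Leaf.new("a"), Leaf.new("b")), Leaf.new("c"))
p count_leaves(tree)   # => 3
p depth(tree)          # => 3
```

The alternative is to make count_leaves and depth methods on Leaf and Branch themselves, which keeps each walk scattered across the node classes instead of gathered in one place; that tension is exactly the question I’m asking.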
To make it concrete, consider my (rather aged) package CQL-Java, which — get ready for a big surprise — provides a CQL parser in Java. CQL is a conceptually simple but very precise and expressive query language used in information retrieval, but for our current purposes all we need to know is that it supports structured boolean queries like these:
kernighan and ritchie
(kernighan and ritchie) or fowler
kernighan and (ritchie or pike)
Our task is to parse such queries, and to be able to render them out either in an XML format called XCQL or in Index Data’s ugly but functional Prefix Query Format.
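To give a feel for the kind of walk involved, here is a hedged Ruby sketch of rendering a parsed boolean query tree in a prefix notation. The node classes are my own inventions, and real PQF has operand attributes and quoting rules that I’m glossing over entirely:

```ruby
class SearchTerm
  attr_reader :term
  def initialize(term)
    @term = term
  end
end

class BooleanQuery
  attr_reader :op, :left, :right
  def initialize(op, left, right)
    @op, @left, @right = op, left, right
  end
end

# Render the tree in a simplified prefix notation: the operator comes
# first, then both operands, so no parentheses are needed.
def to_prefix(node)
  case node
  when SearchTerm then node.term
  when BooleanQuery
    "@#{node.op} #{to_prefix(node.left)} #{to_prefix(node.right)}"
  end
end

# (kernighan and ritchie) or fowler
q = BooleanQuery.new("or",
      BooleanQuery.new("and",
        SearchTerm.new("kernighan"), SearchTerm.new("ritchie")),
      SearchTerm.new("fowler"))
puts to_prefix(q)   # prints: @or @and kernighan ritchie fowler
```

An XCQL renderer would be a second walk over the same tree, which is precisely the design question from above: where should those two walks live?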