First of all, welcome back to programming! It was a bit of a shock for me to see that, since the last programming article, I’ve written a sequence of four posts about Doctor Who and Buffy, which is an imbalance I never intended. Well, Doctor Who will end after two more episodes, so hang in there, programmers!
The response to We can’t afford to write safe software was very interesting. As usual (I am pleased and proud to say) the comments on the article itself, here on The Reinvigorated Programmer, were, almost without exception, insightful and informative. But over on Reddit things were not so good.
Consider this comment by StoneCypher:
“We can’t afford to write safe software”
Then you can’t afford to write software, and your customers certainly can’t afford your low quality of software.
I’ve found this kind of absolutism much more common on Reddit than here, with Hacker News somewhere in between. It’s a bit disturbing given that only a Sith thinks in absolutes (to which Anakin should have replied: “Are you absolutely sure?”). Sometimes in my more cynical moments, I wonder what proportion of Reddit comments are written by people who have actually read the article.
Anyway, this comment did have the merit that it set me thinking. How much safety do we in fact need? And, more importantly, what do we have to sacrifice in order to achieve it? And can we afford those sacrifices?
Of course it’s nice to have zero bugs. But we all know that in reality, (A) it’s not going to happen, and (B) it’s not truly our main concern. I quote from Jon Bentley’s book More Programming Pearls: Confessions of a Coder [amazon.com, amazon.co.uk], page 67:
I once stated to Bill Wulf of Tartan Laboratories that “if a program doesn’t work, it doesn’t matter how fast it runs” as an undebatable fact. He raised the example of a document formatter that we both used. Although the program was significantly faster than its predecessor, it could sometimes seem excruciatingly slow: it took several hours to compile a book. Wulf won our verbal battle with this argument: “Like all large systems, that program today has ten documented, but minor, bugs. Next month, it will have ten different small, known bugs. If you could magically either remove the ten current bugs or speed up the program by a factor of ten, which would you pick?”
In these enlightened days, of course, typesetting speed is not really an issue: 95% of my non-plain-text writing is done in either OpenOffice or the WordPress editor, both of which essentially do typesetting in real time. But the principle is still good: for “document formatter”, read “web browser”; for “several hours to compile a book”, substitute “several seconds to display a page”; and for “ten bugs”, read “14,303 bugs”. In practice, speed is often more important than correctness.
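Wulf’s argument can be made concrete with a toy calculation. All the numbers below are my own invented assumptions (usage frequency, workaround cost, and so on), not anything from Bentley’s book, but they show the shape of the reasoning:

```python
# Back-of-the-envelope sketch of Wulf's choice: a 10x speedup
# versus fixing ten minor bugs. Every number here is an assumption
# invented for illustration.

# Assume a user runs the formatter 20 times a month, each run
# taking 60 minutes; a 10x speedup cuts each run to 6 minutes.
runs_per_month = 20
minutes_per_run = 60
speedup_saving = runs_per_month * (minutes_per_run - minutes_per_run / 10)

# Assume each of the ten minor bugs costs that same user a
# 5-minute workaround twice a month.
bugs = 10
workarounds_per_bug = 2
minutes_per_workaround = 5
bugfix_saving = bugs * workarounds_per_bug * minutes_per_workaround

print(speedup_saving)  # minutes/month saved by the speedup: 1080.0
print(bugfix_saving)   # minutes/month saved by fixing all ten bugs: 100
```

On these (entirely made-up) numbers the speedup is worth an order of magnitude more user time than the bug-fixes, which is exactly why Wulf won the verbal battle; change the assumptions — say, a bug that corrupts a day’s work — and the answer flips, which is the point of the whole trade-off.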
And of course we don’t really need Jon Bentley’s anecdotes to prove this: all of us who are programmers make the correctness-vs-speed-vs-functionality call constantly: at any given moment, I can work on eliminating bugs from my code, or improving its speed, or adding new functionality. If StoneCypher is right then we should always, as a no-brainer, pick the first of these, only working on performance and new functionality when all known bugs have been eliminated; but we all know that in practice we spend more time on adding new functionality than on bug-fixing.
Needless to say, the trade-off point is impossible to determine algorithmically: it must be found by taste, experience, judgement, and often commercial pressure. The appropriate choice of when to work on correctness and when to work on performance or functionality also differs by application domain. It hardly needs saying that avionics systems need to be correct, always: they merit the use of formal methods in writing and proving the code, as well as batteries of tests to improve confidence in its correctness. Partly that’s because the consequences of bugs are so severe; it may also be because the functional requirements in that domain are well demarcated and understood, so there is relatively little pressure to invest time in adding new functionality. By contrast, there seems to be a silent agreement in the world of web browsers that bugs are OK, really — even dramatic ones that crash the browser — and that what we all really want is more new features.
(I remember, back when Netscape was still standard, a colleague trying to sell me on the then-new pre-1.0 Mozilla by telling me that its crash-recovery was excellent. It didn’t strike me as a good omen that a program’s best feature was its crash-recovery but, what do you know: here we are in 2010, and crash recovery is still important in browsers. Recently, Google Chrome went through a series of updates that left it crashing on me maybe once a day for a fortnight or so; I stuck with it anyway, rather than reverting to the much more stable Firefox, simply because it’s twice as fast.)
So, OK — we all seem to more or less agree that avionics software needs to be correct; but web browsers need to be fast and featureful, and if bugginess is the price we have to pay, then so be it. But what about software in the middle? What kinds of programs fall on which side of the line?
You’d have thought that operating systems would fall firmly on the must-be-correct side of the line; but the generation that’s grown up with Windows versions that need rebooting several times a day, and that considers an O/S reinstall a fairly routine procedure, seems to have been taught that it ain’t so. Maybe I should have said “brainwashed” instead of “taught”. To me, that state of affairs is a travesty, but I guess the world voted with its pocket.
Anyway, I hope I’ve gone some way towards convincing you that “all bugs are unacceptable” is unrealistic fundamentalism. Next time, we’ll look at some back-of-the-envelope calculations that can help us to make sensible decisions on where to invest time, and which bugs we can and should ignore.