Testing is not a substitute for thinking (binary search part 3)

The contributions to the original binary search thread continue to trickle in, and now stand at an astonishing tally of 679 comments, most of them containing code.  Thanks once more to all of you who’ve attempted this challenge, and to everyone who’s commented.

In one of the more interesting comments, Darius Bacon tested a sample of 20 of the Python submissions, and found that exactly 10% of them both passed the functional test and appeared to run in O(log n) time — which of course is exactly in line with Jon Bentley’s original statistic, that 10% of the professional programmers he’d examined were able to write correct binary search code (albeit under rather different conditions).  It subsequently became apparent, though, that Darius’s testing code was overly strict, and that a further two of the sampled routines did run correctly (though not necessarily in O(log n) time).  Still — it’s a surprisingly low hit-rate, especially when you bear in mind that, contrary to the rules, many of the programmers did use testing to help them develop the code. (You should read the second linked comment, as it has plenty of other interesting observations, too.)

[Update, an hour or two later: as Darius Bacon points out in a comment below, I misinterpreted his results.  Once the testing code had been fixed, his sample of 21 routines found 9 that passed all test, of which 6 were O(log n).]

“It’s dumb not to test”

By far the most common comment on this challenge was that it’s dumb to try to write correct code without testing.  Many, many people made this point, not only here but especially on Reddit.  For example, someone calling him/herself K wrote: ” I don’t really see a point in this exercise – are you (or the author of the book) implying that if a programmer can’t implement this on first try, without testing, he’s not as good as one who can?  IMO, it’s not a good metric for the ability of the programmer.”

First of all, let me be very clear what I wasn’t saying: I was not saying that it’s a bad thing to test your code.  I am very much in favour of testing.  So is Jon Bentley.  That’s because neither of us is insane.

The point is this: testing is only one of the weapons in our armoury, and it’s one that has recently become so fashionable that a lot of programmers are in danger of forgetting the others.  Although testing is valuable, even indispensible, it is also limited in important ways, and we need to have facility with other techniques as well.

Here are three limitations of testing:

  • Tests can only show the presence of bugs, not their absence
  • Tests can be buggy, just like tested code
  • Tests increase confidence, not understanding

I’ll expand on each of these below.  Before I do that, let me say one more time that testing is good.  Please do not respond in the comments saying “Mike is a doofus because he says testing is bad.”  Testing is very good.  But like so many very good things — Doctor Who, sushi, Wi-Fi, Gold Miner ale — it turns out that it’s not the only very good thing you need.

So the purpose of the write-your-code-without testing exercise was not to discard testing because it’s useless; but, precisely because testing is useful, it’s important that we sometimes force ourselves to walk without that particular crutch … and perhaps we’ll find out how atrophied our muscles have become.  Then we can discover what other areas we need to concentrate on, what techniques we need to bone up on, what tools we need to relearn and what exercises we need to do.

(I’m assuming here that we aspire to be excellent, rather than merely adequate, programmers.)

Tests can only show the presence of bugs, not their absence

I wrote this heading off the top of my head, then looked at it and thought it sounded a bit familiar.  When I grepped around, I found that it was, almost word for word, a quote from an old friend:

Testing can show the presence of bugs, but not their absence.
— Edsgar W. Dijkstra, University of Texas

So I guess I am in good company in making this point.  To say the same thing in more general terms, as Tom Holtz is fond of saying, “Absence of evidence is not evidence of absence”.  (He’s usually talking about fossils in a particular rock formation, but the principle is good more widely.  And to be fair, absence of evidence is indeed evidence of absence, but it’s not proof of absence.)

You can test all you want; but all you have shown at any given point is that you have not yet found any bugs.  We can see this principle at work in several of the posted solutions — I don’t want to pick on individuals, so I won’t name names, but I’ve seen submissions tagged with statements like “Mine works”, “works for me” and “no errors at the first try”, which have been subsequently shown to be buggy — presumably because the author’s own tests didn’t test all the edge cases.

Again, this is not to say that testing is useless: each passed test can legitimately increase confidence in the tested code.  But it can never get you all the way.  For that, you need something different.

Tests can be buggy, just like tested code

“Quis custodiet ipsos custodes?”, asks Juvenal (and let us not forget that quidquid latine dictum sit, altum viditur).  “Who will watch the watchmen?” replies Alan Moore.  “Who will test the tests?”, say I.

This is another problem that cropped up several times in the comments.  For example:

  • “My code worked the first time, but my test had a bug.” — nicholas.
  • “Worked at first try though I did test it before submitting here, which took some time before I realized that while the binary search was working, my small test code was faulty.” — Robert Leffmann.
  • “I felt my heart sink when it threw an exception on first run, but it turned out the error was forgetting to sort the test arrays in the test code, so all was ok.” — Paddy.
  • “Had some bugs in the harness itself, but didn’t catch any in the binary search method.” — Dan Carroll.

We’ve all been there, I’m sure.  As it happen, this very afternoon I spent an unpleasant hour or so trying to debug the test suite of a Perl module that I was building a Debian package for.  We fall too easily into talking as though code and tests are two different things, but tests are code.  We know that we’re not clever enough to write correct code all the time, so why would we think that we can reliably write correct tests?

Actually, there are two reason why this argument isn’t as strong as it sounds.  One is that the code that makes up test suites is often — not always, but often — much simpler than the code being tested.  In which case it’s more likely to be correct.  I’ve been in situations where that’s not true, where the testing code is itself a major piece of work; but I admit those cases are the exception.  The second reason why this argument is not super-strong is this: writing tests lets you triangulate.  You’re looking at the same problem from two different directions, and if your tests are broken there’s a fair chance that your correct code will help you spot that fact and fix it.  (If you’re really unlucky, your code and your test will both be broken but in such a way that the deficiencies of one patch over the mistakes in the other; but that seems to be rare.)

I’m not going to labour the tests-can-be-wrong point too much, just as I skimmed quickly over tests-can’t-show-the-absence-of-bugs, because I want to get on to the meat: the real reason why testing is in desperate danger of being overhyped.  These two points have just been a prelude; now on to the fugue.

Tests increase confidence, not understanding

Here we reach my core point.  It’s this.  If you had to boil down the whole programming skill-set to a single core skill — a completely general one that applies across all that we do — what would you choose?  I can imagine various different answers, but for me the answer would be: thinking clearly.  I’d argue that the essence of good programming is thinking clearly about the problem; understanding; grokking.

Testing a piece of code is essentially a black-box process.  If you hack together a binary search routine, you can and will write tests for it that are not based on how the code works, but only on what it claims to do.  This is right and good — it’s proper separation of concerns.  But what it means is that writing and running tests does not in itself increase your understanding of the code being tested.

And we need to understand.

In the case of binary search (and let’s all remember that this was chosen because it’s a particularly straightforward algorithm), we’ve seen posted code that does all sorts of horrible things — some of them kind of harmless; some of them with performance implications (like all those array-slicing Python implementations that run in O(n) time, and so are no better than a trivial linear search); and some positively detrimental.  The point is this: no amount of testing will, in itself, give you any insight into these coding mistakes.

(Aaaaand I am going to say it yet again, in the desperate hope of not being misunderstood: this does not mean that I think testing is bad; it means that I think it serves some purposes but not all.)

If you’re not convinced, consider this pair of comments.  From Jonathan Deutsch: “Writing code without testing is akin to doing math without a calculator”; and from Matthias Goergens: “Arithmetic without a calculator borders on nonsense.  But for analysis, algebra or proofs in general, you’d need a very advanced calculator.”  Unit tests are great for verifying your arithmetic; but they don’t help you to think through your calculus.  For that, you need to think.  There’s no way out of it.  Sometimes, no amount of agile, pattern-oriented, test-driven, refactoring-focussed pair-programming methodology will get you out of the responsibility to invest some actual thought into your program.

It’s not all just Move Method and Pull Up Instance Variable, you know.  Sometimes I wonder whether tools that can do mechanical things reliably for us are blunting our ability to think clearly about subtle problems.  One of my very favourite comments on the original post was EoghanM’s: “I definitely thought about it a lot harder when I couldn’t test.”

“Test-driven development” is is not even a zinc bullet

Fred Brooks, perhaps best known as the author of The Mythical Man-Month [amazon.com, amazon.co.uk], is also the author of the classic 1986 paper, No Silver Bullet, in which he argued that “there is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement within a decade in productivity, in reliability, in simplicity”. Admittedly, what he says is impossible is a high goal — an order of magnitude — and I doubt that even the most committed Object Jockey, Extreme Programmer, Refactoring Guru or Test-Driven Developer would claim benefits that great for their favoured methodology.  And yet, these fads keep coming along, and Test-Driven Development seems to be the fashionable one right now.

Folks, really?

Well, OK, maybe a bit.  Here’s where I think the cognitive mismatch is: a test-driven approach can be helpful in designing (and of course testing) an API; but it makes no contribution whatsoever to designing an algorithm.  In other words, it might help with the easy bit, but doesn’t help at all with the actual work.  But APIs are such a big deal nowadays (especially with the framework obsession and all), that we too easily fall into thinking that the API is the program.  (Over-Fowlering the code can also contribute to this misapprehension.)  That is a mistake on the same order as “the medium is the message“.  The medium is not the the message, it’s what we need to get the message; the API is not the code, it’s how we invoke the code.  But, oh my, isn’t it seductive?  To spend our time breaking a problem down into smaller and smaller parts, designing lots of little classes, writing masses of test code, and telling ourselves that we’re being productive, without ever putting any effort in solving the actual problem.

The classic example of this is Peter Norvig’s and Ron Jeffries’ sudoku solvers.  These are gathered and linked from the brief article Learning From Sudoku Solvers, which has some excellent comments.  I urge you to go and look at these for yourself when you’re done here, but in a nutshell: Ron Jeffries wrote a series of five articles on the test-driven development of a sudoku solver that eventually faded away into nothingness without ever finding a solution, or even coming close; while Peter Norvig spent a while thinking hard, applied powerful non-trivial techniques, and wrote a 100-line Python program that solves any sudoku puzzle.  [Jeffries’ articles are often unavailable, as his web-site comes and goes.  If you can’t read them there, you can find them in Google’s cache.]

How did Norvig do that?  By using skills that are not often thought about in these days of test-driven kool-ade, reflexive application of design patterns and automatic refactoring tools.  Once more, let me be clear that all these things are good so far as they go — really, I agree with you, they are! — but they are no substitute for actually thinking, and thinking is what’s needed for hard problems.

I was going to say that in the last part of this article I’ll look at a couple of mental tools for thinking about code, but yet again I have gone on for much longer than I intended, so that will have to wait until a subsequent article.  Soon, though, I promise!  (And yes, I am well aware of the irony that this is one of a series of blog entries wherein I keep promising to get to the meat but never do, and that here in this very entry I castigate Ron Jeffries for a series of blog entries wherein he kept promising to get to the meat but never did.  The difference is that we’ve been waiting three years for RJ to keep his promise but only three days for me to keep mine.  Also, I know what my punchline is going to be.)

Before I wrap up for this time, I want to address one more point:

“But I would have had a working binary searcher more quickly if I’d been allowed to test along the way”

Yes, maybe you would.  Bearing in mind that binary search is in fact a pretty simple algorithm, it’s likely true that you could have bashed your way through to a working implementation after a few iterations of coding and testing.  But —

  • Your code would be less clear
  • You wouldn’t understand clearly why it works
  • You would find it harder to extend or modify in future

Because as soon as it’s aimed at a non-trivial problem, test-driven development becomes debugging-driven development.  It stops being “write the test, then write the code, then check that your test says the code is OK”.  It starts being “write the test, then write the first draft of the code, the run the test to find the first bug, then patch the code, then write the code to find the next bug; GOTO 40”.

Let’s look at a hypothetical example of how this can happen.  Tell me if this sounds hauntingly familiar.

I bang out a first draft of my binary search routine, and test it.  I notice that it sometimes looks at an out-of-range array element and realise that I was setting the upper bound to the size of the array rather than the index of its last element.  I fix that and see that sometimes it goes into an infinite loop.  I guess that’s because when I iterate, I am setting the new upper bound one too high so that it doesn’t converge, so I try subtracting 1.  Now I find it sometimes fails to find values that are included, so I remove the -1 and try setting the lower bound one higher instead.  It seems to work.  The next test shows that my code doesn’t handle an empty array, so I add a special case for that.  Then I find it doesn’t work on an array of a single element, so I add another special case.  A few more tests suggest that I’ve caught all the edge cases, so I pronounce myself done.

I’m sure I don’t need to (A) persuade you that the resulting code is suboptimal, (B) point out that it likely still contains bugs, or (C) draw your attention to the fact that more than a few of the submitted solutions a couple of posts ago look suspiciously like this.  My fear is that most code looks like this — a patchwork of special cases that pretty much work most of the time, more or less.  Testing helps us to have more confidence that where we’ve arrived is good enough; but it doesn’t help to make the code good.  And that is what I want to talk about next time.


Usually, I gather my sushi photographs from many different sources.  But for this article, I drew them all from the website of the Sushi 8 Japanese Restaurant in Mountain View, California.  They just have such nice photographs!  I have no idea whether their stuff is good or not, so don’t construe this as a recommendation, but it’s certainly easy on the eye.

Also: ignore this Technorati claim-code: EECRHMA873AV

Update: links to this whole series


56 responses to “Testing is not a substitute for thinking (binary search part 3)

  1. James Birchall

    I wonder if the points you raise are why so many of the old timers cringe at the unit testing fad? If the code is provably correct, why spend extra time writing provably correct code to lock it down? Isn’t that why you proved it correct the first time?

    Great articles BTW.

  2. Correction to the second paragraph: after Paul’s fixes, I rate 6 of the 21 sampled Python submissions as apparently both correct and O(log(n)) (not 4 of 20): http://gist.github.com/375850

    It’s understandable to goof a bit summarizing one of 679 comments! That overload was part of why I screwed up the testing, though really it was at least as much about bias: I expected failures, plus false failures would only slander other people. I was far more careful deriving my own search function. Not a nice bias to see.

  3. (And dropping O(log(n)) gets you 9 of 21.)

  4. You are dead on right. I have been trying to teach the same to people all along with no avail.

    The difference between good code and bad code, and a good developer and a bad developer, is exactly this — being able to write without testing. And this is for all the reasons that you cite.

    I started programming on a machine with no hard-drive, so I used to use a paper notebook as my hard-drive. Any crash in my code meant that I had to enter the whole code again from my notebook and also that I loose all the changes I made.

    Given such a huge penalty for run-time errors, I naturally got my mind trained to write nearly bug-free programs without even realizing. Right after coding some fifty or so lines, I look back at it and find errors just by looking. Somehow, I do not know why, I find three times more errors in printouts than reading on-screen (haven’t ever tried an e-book reader still!).

    On the other hand, the system, albeit so old not to have a hard-drive, was advanced enough not to allow enter code with syntax errors.

    One of my friends got trained in a still more stringent environment. Their professor used to make them write programs with pen and paper and penalize them heavily for all sort of errors, including syntax errors. With such training, this person could write five thousand lines of code without compiling even once and yet with a mere 20-50 “compilation” errors!!! Whoa!

    This is when I realized why “I” have so many compilation errors — My training was on a system that did not even allow entering statements with syntax errors! My mind never got trained to avoid syntax errors since the system would point them out right away! Thankfully compilation errors are much easier to fix than run-time errors.

    This by the way applies elsewhere also in addition to coding. We had an expert design an analog Integrated Circuit module for us. I could find six errors in his circuits just by looking. Three more were found by looking at the simulations. The system worked in first shot thereafter.

    PS: Why do you have those food pictures there? I saw some comment that you simply do not plan to get rid of them.

  5. I call foul, on three points:

    1) Writing code without tests isn’t like doing arithmetic without a calculator. It’s like doing arithmetic without showing your work: http://blog.duwanis.com/post/537036323/i-promised-myself-i-wouldnt-say-anything

    2) If the process of writing tests – figuring out how your code should be used and, in essence, trying to break it – *doesn’t* increase your understanding of the problem, then you’re not paying attention.

    3) Either beat up on testing, or beat up on unit testing, but don’t switch back and forth between them as though they are synonymous. :) Unit tests are not a complete solution by any means, but they’re also not the only available testing tool.

  6. I agree with what to me seems to be your general view on programming and programmers today; there’s less actual programming done and more finding the right pieces and putting them together.

    I also think there’s more of what I’d like to call “monkey knowledge” today, where you have programmers who know how to use the tools but have no idea how they’re made or how they work. Analogous to the monkey who knows to push the button on the machine to get a banana, but has no idea why it happens. Harsh perhaps, but that’s how I feel.

    Some even know very little of how the computer actually works, which I think is a disadvantage. At least to me, programs written by people without knowledge of the actual computer sometimes look strange even if they function correctly. The comparison here is the car mechanic who will be a better driver than most other people, just because he knows how the car is constructed.

    I have an image of an apocalyptic future where intact computers and hardware manuals are found, and supposed programmers are put to work but excuse themselves by saying they never knew how to program computers, just how to use all the pre-written code libraries, and that it would be futile in any case because there are no debuggers, no profilers, too few gigabytes of RAM and so forth. This time I think of the camper who brings a blow torch, altitude and oxygene meter, welding gear, geiger counter and what not for a weekend in the woods.

    Let’s hope this never happens and that programmers jog their brain once in a while and try to Write Once, Compile Once, to stay on top of their game.

  7. Duwanis: I can’t see eye to eye on you on points 1 or 2, but I admit some unclarity on point 3. Yes, I mostly talking about unit testing (or rather about an unhealthy reliance on unit testing). One of these days I am going to do a post on the supreme usefulness of regression testing, and how 10% of the work of writing a suite of unit tests can get you 90% of the winnage.

  8. Feel free to provide rebuttal.

  9. Rebuttal is in the last three posts, plus the next-but-one that I’ve not published yet.

  10. “(though not necessarily in O(log n) time).”

    I’ll submit that if it isn’t running in O(long n) time, it’s not a binary search, even if it has produced the correct answer in every test so far.

  11. This post strikes me as a pyramid of strawmen standing on each others’ shoulders. When you say something like:

    “it’s likely true that you could have bashed your way through to a working implementation after a few iterations of coding and testing. But –

    * Your code would be less clear
    * You wouldn’t understand clearly why it works
    * You would find it harder to extend or modify in future”

    …as if that’s an axiom, as if (to borrow your example) Peter Norvig, upon attempting to write code test-first would somehow render him unable to also apply analytical thought processes is both absurd on its face, and contrary to my own personal experience (as a developer, not my experience with Norvig, which I have none of).

    If all you’re saying is that untalented developers following TDD because it’s the flavor of the month will write crappy code, well…yeah. Untalented developers aren’t going to be able to write better code if we try to force them to apply their feeble brains to solve the problem unassisted. Just like (in the 20 years since I started writing code for a living) CASE tools, UML, Java, refactoring browsers, design patterns, the Rational Unified Process and threats of bodily harm haven’t helped weak developers deliver better software. All we can do for them is hope that they don’t get a job writing code for my bank.

    Is there someone out there seriously claiming that any degree of reliance on unit testing removes the need to think about the problem? I’ve read a lot of the literature, and I can’t remember seeing anything that can reasonably be extended to say that.

  12. Jeff Dege: I know what you mean. When I considered whether we should count O(n) binary searches as passing the test, I found myself thinking that if we do, then we ought to accept this implementation too:

    int binsearch(int a[], int val) {
        for (int i = 0; i < len; i++) {
            if (a[i] == val)
                return i;
        return -1;
  13. Testing is not a substitute for thinking. Even if it was (from the perspective of effectiveness) it wouldn’t be as fun.

    For me the appeal of programming lies in two things
    1) the joy of truly understanding the problem and coming up with a clean way to do it and 2) actually producing a product. The boundary between these two isn’t always clear-cut especially since the 2nd is often needed to verify the first.

    I think the flaw in this exercise, from my perspective, is that I’m familiar with binary search and have implemented it successfully in the past.

    Effectively the good consequence of putting a little extra thought into the process has been removed by virtue of the fact that I’ve “solved” this problem already. The tedious details remain however.

    I’d like to think I’d be more thoughtful if/when I next came across a new problem but maybe that’s just me talking out the side of my neck.

    Of course in real life a large number of tasks require me to do something just like something else I did before.

  14. I don’t know if this is what you were getting at, but hammering away at the solution by coding whatever goes through your mind is a terrible way to program. If you think about the solution and work it out on paper BEFORE you ever touch the computer, you’ll be FAR better off.

    I always tell people that when I’m entering code, I’m done programming and am at the stage where I’m simply typing in the program. The great programmers agree 100%. Unfortunately, most people think I’m nuts or that it’s impossible.

  15. John Bickers

    I wonder if part of the popularity of test driven development is its opportunity for busy work.

    When these subjects come up, I tend to think of them in the context of programmers coming in widely different levels of ability and modes of thought.

    It’s probably easier and more enticing to some people to write a testing framework and a bunch of small tests than it is to write thoughtful code in the first place. It makes an unproductive programmer appear to be justifiably busy. I think something similar leads to the widespread over-engineering that is seen in code today. One can’t solve a hard problem in an easy way, so one solves an easy problem in a hard way to get a sense of accomplishment.

    A drawback that I’m not sure is covered by your three points is that a heavy testing focus during development affects the code by pushing the programmer to take the testing requirements into account. If a testing framework doesn’t like
    data that comes in lists of strings, for example, one is going to be inclined to avoid using such lists. So in addition to the normal things that dilute the mental effort that can be put into a particular bit of code to make it fit for purpose, like language syntax, performance, and so on, we have testability by some framework.

  16. 1. I wrote a binary-search implementation on your original post. As far as I have been able to tell from testing so far, it has no bugs; it didn’t even have syntax errors. I did this by thinking it through carefully. As a result, it took me a bit of a long time (9 minutes) to write and test it, but not excessively so.

    2. I don’t think test-driven development is a good way to catch bugs when you’re writing software.

    3. I don’t think what you’re describing is good test-driven development. Good test-driven development is very valuable; the process you’re describing is a bad way to do things.

    The benefits of TDD (or behavior-driven development, as it’s called these days) are other things. In Platonic BDD, each test passes the second time you run it — you ran it once to verify that it fails, and once to verify that you implemented what you thought you had. The idea is not to bash out some half-thought-through code and hope it works!

    The benefits of doing it well, though, are manifold:

    1. When you’re trying to figure out a bug, the test suite is a set of *knowns*, which are the currency of debugging; every test that currently passes tells you something about the current behavior of the code. This allows you to reject most hypotheses about the bug right up front.

    2. When you’re debugging or reviewing code, the test suite tells you about the cases the author was thinking about. Sometimes you can spot the bug directly from the missing test case.

    3. When you’re trying to figure out how to use an API, the test suite gives you a working example of how to call it — which is what I’m looking for, above all, in the API documentation much of the time.

    4. Writing the test suite helps you design a sensible API to use to invoke your code, and clarify your idea of what it should do, before you write it.

    5. It helps you conserve your precious gumption by giving you quick shots of positive reinforcement immediately after you improve the code.

    6. When you’re refactoring, it reduces the risk that you’re introducing undetected behavior changes.

    In this case, I made my code correct by thinking it through carefully. But, even though it was perfectly correct and efficient, it was pretty suboptimal, specifically because I dove into writing it without writing tests first. In any case where it failed to find the requested value, it would return nil — perfectly acceptable behavior for a lookup function, but much of the time, you want to follow up an unsuccessful binary search with an insertion!

    If I’d written the tests first (all at once, though, not one at a time, by the book), I might have noticed that this was suboptimal. (Similar remarks apply to the case of duplicate values: my function returns the index of an arbitrary one of the duplicates, rather than the index of the leftmost or of the leftmost and rightmost.)

  17. The single most beneficial technique I know of when writing an algorithm is to write down the code invariants (as comments, or whatever). Don’t just keep them in your head, they will be valuable when you come back to your code later on.

    For binary search, write down that e.g. lower is always smaller than upper, lower < mid <= upper, and so on.

    Thinking about invariants of code is the best technique to really understand an algorithm.

    "2) If the process of writing tests – figuring out how your code should be used and, in essence, trying to break it – *doesn’t* increase your understanding of the problem, then you’re not paying attention."

    It *does* increase your understanding of a problem, but nor your understanding of a solution. You can write tests all day with arrays and elements to find in them, but this will not explain to you whether your test should be "lower < mid" or "lower <= mid". And that is what Mike (I guess) wants to say: tests live in problem scope, not in solution scope. They can describe the expected results, but not the way how to get there. And in particular, it is notoriously hard to test whether an algorithm is O(n) vs O(log n).

  18. Kragen, you said it better than I could. Using tests doesn’t mean you have to hack in the quickest possible solution once the tests fail. If anything the increase in confidence means that once I have a sufficiently good testsuite I can indulge in major refactorings to fix bugs *properly* if I need to, confident in the knowledge that any bugs this introduces will be caught by another testsuite run. It doesn’t mean I cut corners and try to just do the minimum necessary to get the test to pass: that’s a recipe to an unmaintainable pile of jackstraws masquerading as a codebase.

    I think coverage tests are fundamentally different from the kind of regression test you write once users start hitting your code with bugs. Coverage tests are ‘sets of knowns’, invariants that the code must pass. Regression tests are specifically to cover holes in the coverage tests; inasmuch as they are anything they are tests, written later, for the other tests!

  19. Martin Probst, thanks for this: “You can write tests all day with arrays and elements to find in them, but this will not explain to you whether your test should be “lower < mid" or "lower <= mid". And that is what Mike (I guess) wants to say: tests live in problem scope, not in solution scope". You are exactly right: that is what I’ve been trying to say, but it’s taken me three long articles (and counting) to get part way towards saying what you successfully summarised in nine words.

    (You won’t be surprised to hear that invariants are the subject of my next programming article — I am astounded that in 764 comments so far on these three binary-search articles, you are only the second person to mention them.)

  20. Good points Mike and I agree with most of them. One of the points I don’t agree is the way you describe Test Driven Developement (TDD) and the example. Kragen already wrote a great comment that mostly matches my opinion. Here are my two cents: 

    TDD is more about good code design and architecture than the actual testing. If you design your code well and separate the concerns so it’s unit testable you’ll end up with an excellent, extensible architecture. 
    The way Mike described TDD leads to wrong impressions. Come on, who would write a test in that way to find out if an algorithm works?! Of course this is a wrong approach. You would write some test cases first, implement the algorithm after you thought about it (not try & error) and see if it works. If not, think about it again. 
    Real TDD (not what Mike describes) isn’t easy, you have to think about the architecture first and writing a test for an algorithm isn’t TDD. As I already wrote, it’s more about good extensible API design and not try & error.  

    However, I agree that all these techiques and buzzwords you name are over hyped. Most of the techniques are useful, but all these buzz around it and the strong statements from different view points don’t help. Software projects could be totally different and so are the techniques that fit best.

    A rant about the paradigm hype is OK, but not by using a simple algorithm as an example for TDD. Doesn’t work.       

  21. I think you’re missing TDD’s point. It’s not about testing but about design. Not about the inners of algorithms but of the contract a class (or method) has with its context. About what parameters it accepts and which results it generates. It’s about coupling and not O(log n).

  22. Juan Manuel writes: “[TDD is] not about the inners of algorithms but of the contract a class (or method) has with its context. About what parameters it accepts and which results it generates. It’s about coupling and not O(log n).”

    I agree with all that.

    That’s why it’s not an appropriate technique to rely when trying to implement an algorithm that actually does something, whether that be binary search or sudoku solving. It’s about interfaces, not implementation. But at some point, you have to actually implement.

  23. Apologies in advance for the long comment, but I can’t help think that people are talking past each other on this one. It is obvious – at least to me – that TDD is a very useful weapon in a developer’s armoury. I also know that Pair Programming can be an excellent way of discovering issues early, improving implementation and of quickly sharing application knowledge among developers. Likewise, I believe that automated continuous integration and regression testing is probably the single most important technique to ensure the success of any reasonably complex piece of software (and often the hardest to achieve and so the least often implemented). However, despite all these new-fangled ideas, tools and techniques, an uncomfortable fact refuses to go away: software projects continue to fail, continue to be delivered late and full of bugs, and remain as steadfastly unmaintainable as they ever were. So clearly, summat’s up.

    And what’s up is us: the people. We still need people to program computers, think up requirements, write tests, manage projects. Unfortunately people are fallible, and there are many, many more of us in software development today than there were when I started out when God was a boy. And from personal experience the majority of developers I meet wouldn’t know a binary search if it chopped them in the back of the knees (pun intended). But does it matter? I think the answer is both yes and no.

    The answer is no, because software development today must be geared towards quickly solving today’s problem with the expectation that tomorrow’s problem will be different and most likely unforeseen. So frameworks, libraries and toolkits to get the job done in time are crucial in this pre-fab world. Furthermore the complexity of the interactions (AOP anyone?) of all the various functions of a large application which has been mainly constructed from different black boxes cobbled together with some XML glue means that testing rather than inspection is probably the ONLY way to know whether what you’re building meets the acceptance criteria (or will even work properly).

    But the answer is also yes because just being a glue-merchant is not enough when you actually need to solve domain-specific problems. Given a set of 100 images on a web page that needed to be displayed in a random order on each visit, a developer who only knows about about a rnd() method in a library is likely going write some function that creates an empty list and then repeatedly calls rnd(n) on the original list until all the empty slots are filled. Whereas the developer with prior knowledge of this category of problem will implement the Fisher-Yates shuffling algorithm or something similar. Both solutions will produce the right end result, but the former is ugly, crude and clumsy, while the latter is simple, elegant and – dare I use the word – beautiful. Both developers of course need to test their implementation. But perhaps the first developer who claims that it doesn’t matter whether he knows how to implement a Fischer Yates shuffle is pragmatically right: after all, the clumsy, inelegant solution got the job done, as the tests proved. But the question I’d ask is: which one would you want on your team?

    Incidentally, there’s no shame in not knowing a better way exists. A good programmer might now know about Fisher Yates, but she will know that the solution that presents itself in her mind is ugly and clumsy and so will be driven to search for a more elegant one. A poor programmer won’t even recognise the ugliness of his code, or worse yet, will recognise it, but not care.

    And I think that’s what this blog is ultimately all about and why it has attracted so much interest. To quote Oscar Wilde: “We are all of us lying in the gutter – but some of us are looking at the stars”

  24. Nice article, but man all those sushi pictures made me hungry!

  25. Pingback: The Time of Angels (11th Doctor, episode 4) « The Reinvigorated Programmer

  26. Pingback: Top Posts — WordPress.com

  27. Pingback: From “The Reinvigorated Programmer”: Are you one of the 10% of programmers who can write a binary search? « Brendan Graetz

  28. Hi Mike, I have been following this and the previous post. I have recently blogged: http://paddy3118.blogspot.com/2010/04/more-on-binary-search.html a reply.

    In summary I have problems with your rigid division between testing time and development time, and make a point about informed testing which fails one of Darius’ passes. (Darius – I was standing on the shoulders of your great work, thanks).

    – Paddy.

  29. Dennis Decker Jensen

    You mentioned “thinking clear about a problem.” An associate professor of my university uses the word “mindfulness”, i.e. being mindful about what you do, how you do it (and when you do it).

    I agree with the points of your article. Test-driven development (TDD) is useful of course, although mostly for segregating into modules, designing APIs, and alert one to bad dependency management. Similarly there are many other techniques or tools that are also useful. I think a lot of the comments missed your critique not of testing, but of how testing is used, although you stressed several times in your article that you aren’t bashing on testing per se.

    In the example of binary-search I found proof-directed development (on paper) to be the most efficient technique, and it forced me to be mindful about what the code did. Notation is another very powerful tool (hint: Lisp-like macro-systems, and Iverson’s APL).

  30. Alright, you’ve managed to convince me that this sort of test is valuable.

    I was quite frustrated when I did the test because I was busy with other things, but I made the mistake of idly browsing and ran into this post. I can’t just leave a challenge like that unanswered!

    I think years of coding under pressure has made me really value speed-coding techniques, and writing a couple tests and hammering it until it works would definitely have been faster than reasoning about the edge cases. Of course it almost certainly would have edge-case bugs. ‘Technical Debt’, they call it. It worries me that a lot of my code in industry might suffer from such debt. Is it my fault for leaving out such reasoning and testing in order to meet an unreasonable schedule? I wonder.

    I spent a bit of time improving my tests and it handles all cases I’ve tried correctly (including all the common errors you listed in part 2), so I’m still a success (modulo signed overflow, explicitly allowed in your original problem description.) I’m also a lot happier with being successful now that you’ve convinced me that the test is valuable. Hooray!

  31. Pingback: Binary Search, and Testing - Laughing Meme

  32. Pingback: Writing correct code, part 1: invariants (binary search part 4a) « The Reinvigorated Programmer

  33. Winslow Thermain

    The pictures are nauseating and distract from the text.

  34. I’d say your hypothetical example shouldn’t sound familiar to anyone who does TDD correctly. Test failures should cause you to examine your code to find root problems, not just keep adding special cases until the tests pass. In fact, a good set of tests allow you to make large scale changes in confidence.

    As a final thought, look how you found how only 10% of the examples passed. It was via automated testing, and not reasoned analysis of the code!

  35. Pingback: Las pruebas y los programas « Mbpfernand0's Blog

  36. Nice article series. I agree TDD can be misused in this way, however, it is worth pointing out that much of the TDD community has started to view it as Behavior Driven Development.

    In this case, if the algorithm results are part of the requirements, you would actually want to write a test to verify it. The behaviors being tested are then:

    – finds element in array
    – finds it in O(log n) time

    Whether it is useful to test a simple binary search for O(log n) is debatable, but BDD would have made you think about the fact that the performance characteristic was part of the requirements.

  37. Pingback: @Lab « Tech Tock

  38. At some level we’re all ‘monkeys’ – my degree course covered building logic gates from transistors, adders from logic gates, and building a simple CPU. It also covered assembler, and a lot of writing basic algorithms – and absolutely nothing about building applications.

    I think it was a fantastic foundation – but I’m still left with a ‘monkey’ understanding of a lot of modern application work – i.e. my knowledge of HTTP is largely based on what I’ve picked up troubleshooting web apps (‘hmm, it looks like it’s only doing 2 connections at a time’ – Google it – ah, that’s what the standard allows), ditto things like knowing that by default, gzip compressing files in our version of Apache breaks cacheing, that disabling keep-alive to handle a bug in IE5 back whenever it was, means worse performance for modern browsers, and that HTTPS can be a right pain too, if you’re not in control of everything.

    My team complain about the same thing – most of them have been working in IT since the early 90s, but feel increasingly like ‘Google coders’ when they deal with browser and database issues.

    I could go on – a lot – what strikes me is how much ‘acquired knowledge’ rather than skill is actually required.

    I think what’s attractive about programming is similar to what’s attractive about learning a musical instrument or drawing or painting – there isn’t a lot to learn about your tool, most of the effort is in developing and improving technique, towards mastery.

    It’s not that you can’t achieve mastery of something like Apache, browser differences, server virtualisation, etc, it’s just that it gets away from what’s actually interesting, and also that the scope is much larger.

    But equally, you can’t just divide it all up amongst different people, because it’s typically the integration of two layers that’s the problem.

  39. JulesLt points out:

    What strikes me is how much ‘acquired knowledge’ rather than skill is actually required.

    I think that is dead on, and very insightful. It’s why the old recruiting idea, that you don’t care what a candidate knows but how well he learns, is not applicable any more. That was OK in the days when it just meant that a good FORTRAN programmer had to learn COBOL instead; but the sheer amount of stuff we need to know these days is, I estimate, literally two orders of magnitude greater than what the great programmers of the past needed to learn about their environments to be productive.

    I’ve touched on this in a previous article, by the way — Learning a language vs. learning a culture.

  40. Pingback: Another challenge: can you write a correct selection sort? « The Reinvigorated Programmer

  41. Pingback: Tagore Smith on functional programming | The Reinvigorated Programmer

  42. Pingback: Testing « Just Rakudo It

  43. Pingback: Programming Books, part 5: Programming Pearls | The Reinvigorated Programmer

  44. Pingback: Blog da QUALIDATA » Blog Archive » Mantenha a mente aberta. TDD nem sempre é o melhor!

  45. Pingback: Mantenha a mente aberta. TDD nem sempre é o melhor! | F2 - Sistemas

  46. Pingback: More thoughts… | Scali's OpenBlog™

  47. We found programmers, which have a hatred for mathematics, they love their tools. (I am a rubyist and i roaard!), but they are blind to the fact, that a language is just the hammer they use. The real work sits in the actual thinking.

    In interviews, we did, most candidates start to write code directly without pulling out pen and paper to work out the problem. They just blindly start to write. Programming is 90% thinking and 10% coding, so this is probably a very bad approach.

    I am not to fond of TDD either. In most dynamicly typed languages, 90% of TDD comes down on implementing a static type system (to catch what happens if someone puts in a string into a function accepting integers or what happens when you put a negative number into something, which expects naturals). Even the strong ones suffer from this. The other 10% is for testing the algorithm.

    People should write the behavioral 10% before hand, but they often write the whole shebang, including their type checks, which wreack havoc on their implementation. It is much better to move this to contracts or some other mechanism. Static type systems suffer less from this problem.

    I am also quite shocked by most TDD frameworks. Many don’t have automatic parameters generators. How can you expect that a programmer find all possible failing cases by hand? This should be an automatic process, we are only human you know.

    I am fond of a somewhat other approach of testing your code, that is law driven development. You derive some laws your code should adhere too and write towards these.This will keep your code clean, short and easily testable. In combination with a strongly typed static system, you keep your tests short and to the point.

    E.g. say you are writing a parser g and a pretty printer f, then:

    (g.f) = id 

    should hold. That means, if I pretty print an AST, then by parsing it, I should get the same AST back, which seems pretty logical. The otherway around doesn’t have too hold:

    f . g    /= id 

    That is if I parse a text, there is no guarantee I can pretty print it in the same way (think of white space or a syntactical error in the string).

    For the binary search, you can also setup such garantuees. E.g:

    bsearch : Needle -> Haystack -> Maybe Needle

    where maybe can be Just element or Nothing, if nothing is found.

    If the needle is in there:

    bsearch needle haystack =  Just needle

    And if it isn’ the following result is expected:

    bsearch needle haystack = Nothing 

    For all these examples you can generate automatically your testcases, which makes it much easier to find edge cases. See quickcheck (haskell) or hypothesis (python). It is a much more natural way of testing your code and it pushes you to really understand, what your code does.

    There are even ways to test the timing constraint (search should be done in O(Log n) guesses). By using languages with dependent types you can even encode it in the type system and prove your result is correct. Then testing is obsolete.

  48. To echo Duwanis’s point 2: Thinking about how to test something has helped me design it. Maybe the test-driven thoughts are ones i would have had if i was a better programmer.

  49. Pingback: Are you one of the 10% of programmers who can write a binary search? | The Reinvigorated Programmer

  50. Pingback: Common bugs and why exercises matter (binary search part 2) | The Reinvigorated Programmer

  51. Pingback: Writing correct code, part 1: invariants (binary search part 4a) | The Reinvigorated Programmer

  52. Pingback: Writing correct code, part 2: bound functions (binary search part 4b) | The Reinvigorated Programmer

  53. Pingback: Writing correct code, part 3: preconditions and postconditions (binary search part 4c) | The Reinvigorated Programmer

  54. Nice article, however one of the sushi pictures looks like someone has slaughtered and prepared the very hungry caterpillar.

    It still looks appetising though.

  55. Thanks a bundle, Gareth: now I’ve seen it, I can’t unsee it!

  56. “I’m not a great programmer; I’m just a good programmer with great habits.” – Kent Beck – Creator of XP, which in turns include TDD

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.