Writing correct code, part 3: preconditions and postconditions (binary search part 4c)

Having dealt (briefly and informally) with invariants and bound functions in the two previous installments of this series, it’s time to look at preconditions and postconditions.  These will likely be familiar to most of you, even those who’d not heard of invariants before, because these days they are known by the trendy, XP-friendly name design by contract.  (OK, it’s not quite the same thing; but they’re closely related concepts.)

What are preconditions and postconditions?

When you invoke a function — or, all right, a method — you have a sense of what needs to be true when you invoke it, and what it guarantees to be true when it returns.  For example, when you call a function oneMoreThan(val), you undertake to ensure that val is an integer, and the function undertakes to ensure that the value it returns is one more than the one you passed in.  These two promises — the precondition and postcondition — constitute the contract of the function.  So:

  • The precondition is the promise that you make before running a bit of code;
  • The postcondition is the promise that the code makes after it’s been run.

Often, the chunk of code that we’re interested in the pre- and postconditions of is a function — that certainly seems to be the case when we talk about a contract.  But in fact, you can usefully think of evcry statement in a program as having pre- and postconditions.  For example, in the binary-search program that we are all now heartily sick of:

int binary_search(const int a[], const int size, const int val) {
    int lower = 0;
    int upper = size-1;

    /* invariant: if a[i]==val for any i, then lower <= i <= upper */
    /* bound function: upper-lower+1 */
    while (lower <= upper) {
        int i = lower + (upper-lower)/2;
        if (val == a[i]) {
            return i;
        } else if (val < a[i]) {
            upper = i-1;
        } else { /* val > a[i] */
            lower = i+1;
        }
    }
    return -1;
}

You can say that the precondition of the line upper = i-1 is that val < a[i].  More properly, the precondition includes this fact; other facts are also assumed, such as that a is sorted, but if you try to be too exhaustive with these things, you quickly get bogged down and run into diminishing returns.  (This may be why formal treatments of these ideas are often stodgy and hard to read.)

Now consider the precondition for the main WHILE loop in this function.  For the body of the loop to run correctly, one of the things we need to ensure it can rely on at the top of each iteration is that if val is anywhere in a, it’s between the positions lower and upper inclusive; and one of the things we in turn can rely on at the bottom of the loop is that if val is anywhere in a, it’s between the positions lower and upper inclusive.  If these conditions sound hauntingly familiar, they should: they are of course identical not only to each other, but to the invariant from the first article.  That’s not an accident: a loop invariant can be thought of a loop’s precondition and post condition that happen to conveniently be the same.  The postcondition of each iteration guarantees the precondition of the next.

[Irrelevant aside: the feedback loop strikes!  I generally find my sushi photos just by googling for an appropriate term and paging through the results until I see something I like.  But today, for the first time, one of the top ten Google image hits for "sushi" is an image from a Reinvigorated Programmer article.  This happens to me all the time when I'm searching for sauropod vertebra pictures, but it's a first for sushi.]

What is the precondition for binary search?

We need to be a bit careful about this: if we’re slack in specifying these conditions, then all sorts of things can go wrong.  For example, Lizard was one of many people who, attempting the binary search challenge, found that their code failed the tests not because of bugs in the routine itself, but because his test-harness neglected to sort the array before searching in it.

So part of the precondition for binary search is: the array must already be sorted.  But that’s not good enough: it has to be sorted in the right order — for example, if it’s sorted in descending numeric order, the binary search will shoot right to the wrong end (unless it happens to get lucky on the first probe).  So the array must be sorted in ascending order.  Is that good enough?  Not quite — which ascending order?  If an array of numbers is sorted in dictionary order, for example: (1, 10, 11, 12, 2, 3, 4, 5, 6, 7, 8, 9), then the search will also fail.  So do we need to say that the array must be sorted in ascending numeric order?  No, still not good enough — that would make string sorting impossible.  I think (and even now I am not 100% certain that this is good enough), that we need to say: the array must be sorted in ascending order according to the ordering used by the comparisons in the search function.

I think that’s all we need for the precondition, but if I missed anything, do say.

(I am ignoring completely uninteresting preconditions here, such as that the array pointer mustn’t be null, mustn’t point into a string, etc., or that in versions like the C one where the size is passed in separately, size must be less than or equal to the length of the array.  Those are violations of the contract on a more fundamental, physical level.  It’s as though you were telling me the attributes of your Ideal Woman and I suggest you should include “must not be a tomato plant” or “must exist in three physical dimensions”.)

What is the postcondition for binary search?

Again, we need to be careful.  An obvious postcondition for a binary search invocation res = binsearch(a, size, val) would be: either res == -1 or a[res] == val.  Looks good, but consider that the following routine would satisfy this postcondition:

int binary_search(const int a[], const int size, const int val) {
    return -1;
}

Oops!  So we need to be more specific: either (res == -1 and val is not contained in a) or a[res] == val.

But that doesn’t do it either!  Can you see the loophole?  I’ll give you a clue: it still allows me to write a search routine that is O(1) rather than O(log n) but satisfies the postcondition.

Got it?  Here it is:

int binary_search(int a[], const int size, const int val) {
    a[0] = val;
    return 0;
}

(This case is overlooked by the binary search test suite that was helpfully provided by Steve Witham.  In his file of 4096 test-cases, each case is marked simply as either finding or not finding the sought value, and the index that should be returned is not specified.  (Not criticising you, Steve — this is a pretty pedantic point, after all — just using a handy example of how easy this kind of thing is to miss.))

So to be completely rigorous, we need to be more specific.  How about: a is unchanged, and either ((res == -1 and val is not contained in a) or a[res] == val)?  I think that’s sufficient, but again it’s possible that I missed something.  Let me know if I did.

In the absence of the stipulation that a must not be changed, there are other, even dumber, ways to cheat: for example, I could have my routine set each element of a to val+1, and return -1.  I’m not saying I would.  I’m just saying I could.

What’s the point?

Previously we saw how the use of an invariant can help you be sure that if your function terminates, it does so correctly; and how a bound functions can help you be sure that it does in fact terminate.  But none of that reasoning is valid unless we know in advance that the assumptions they make are justified.  For example, when we reasoned that the code fragment

if (val < a[i]) {
    upper = i-1;
}

preserves the invariant if the sought value is present in the array at all, then it is present in the current range, we were implicitly depending on the assumption that the array is sorted.  Stating that as a precondition makes it explicit, and gives you a more solid basis for reasoning about your code.

Similarly, the postcondition is what makes the return-value of the function useful.  If you don’t know the postcondition, then you can’t tell whether a return value of 3, say, means that the element was found at index 3, or that the element was absent but if  inserted at position 3 would keep the array sorted.  Or indeed whether it means that the value was found at index 42.

So, again: the precondition and postcondition together constitute the contract of the function.  It’s knowing these that allows you to treat the function as a black box.

Appendix: Tools to verify preconditions, postconditions, invariants and bound functions

I hope this series of articles has persuaded you, if you weren’t already on board, that formal methods of reasoning about code can be lightweight, practical, and beneficial.  We’ve seen how to use these methods to derive a correct binary search first time out of the gate, so that when we subsequently run tests, we are verifying code that we already believe to be correct, rather than probing to determine whether it is.

But can’t the computer do some of the heavy lifting for us?

Yes it can, in various ways and to various extents.  For example, in the original binary search code (see above), you’ll notice that I declared the array as const int a[] — the const lets the compiler check that my code doesn’t mess with the values in the array, and so verifies at compile-time the part of the postcondition that says a is unchanged.  That is a small part of the whole process, but an important part.  (To make my cheating O(1) version compile, I relaxed the constraint by removing the const from the function signature — did you notice?)

What else can our languages help us with?  Looking at the four properties we’ve discussed in this series — together at last! — we see:

  • Precondition: a is sorted in ascending order according to the ordering used by the comparisons in the search function
  • Invariant: if a[i]==val for any i, then lower <= i <= upper
  • Bound function: upper-lower+1
  • Postcondition: a is unchanged, and either ((res == -1 and val is not contained in a) or a[res] == val)

We can use assertions to get the computer to check at least some of this for us.  For example (assuming that const has taken care of a is unchanged), we can validate most of the rest of the postcondition by adding (in C):

assert(res == -1 || a[res] == val);

But that ignores the val is not contained in a part of the postcondition, because C has no direct way of expressing a predicate like that.  Similarly, it offers no way to assert the precondition that a is sorted; nor can it assert the invariant, because of the for any i part.  And there is of course no way to assert that the bound function decreases by at least one on each iteration — at least, not without introducing a state variable to hold its previous value.

All of this suggests that C is not really expressive enough for this purpose.  Frustratingly, it seems that Ruby is even worse — the core language and libraries, for example, don’t even seem to have a assert function, which is frankly risible.  What we want is a language that lets us state preconditions, invariants, bound functions and postconditions explicitly, not just as comments.

Such systems do exist — for example, Mihai’s entry in the original challenge was written using VCC, an extension to C that allowed him to explicitly state preconditions, postconditions and the loop invariant (though a quick skim of the project page suggests that it doesn’t support bound functions).   This kind of tool is a step in the right direction, but it’s a shame that these ideas aren’t integrated into the core of popular languages.

(I now confidently expect people to explain to me that Haskell does this right.  Folks, it’s great that Haskell is perfect, but it doesn’t satisfy the precondition “popular”.)

Post Script

I close this series with a warning in the form of a quote from arguably the greatest computer scientist ever to walk the earth: both the master of programming theory and the author of many significant programs that prove the practicality and applicability of his wisdom.  Here it is: mark it well.

“Beware: this code has not been tested, only proved correct.”

– Donald Knuth.

About these ads

29 responses to “Writing correct code, part 3: preconditions and postconditions (binary search part 4c)

  1. I gotta say, I’m slightly disappointed that your sushi photos are all from Google. I had this mental image that you go around with a huge SLR photographing (and presumably eating) all sorts of funky, delicious sushi in your non-coding hours.

  2. Preconditions and postconditions (as well as invariants) are built into .Net 4.0 in the System.Diagnostics.Contracts namespace. So if you’re working in the .Net ecosystem you can give them a try.

  3. I had this mental image that you go around with a huge SLR photographing (and presumably eating) all sorts of funky, delicious sushi in your non-coding hours.

    He does. He just didn’t want to make everyone feel bad about how much sushi he eats. He’s probably crying mercury-laden tears now that his secret is out.

    “must exist in three physical dimensions”

    Only three, huh? You should meet my girlfriend, Tessa. She’s a little hyper. Some people tell me that she is only three-dimensional, but I think they’re just projecting. What squares! I haven’t really wrapped my mind around her yet.

  4. Hi guys, hi Mike

    I am one of the developers of VCC, which you gratuously plugged in your article. This is just to let you know that bounds function and termination analysis is something that we have in the works.

    Thanks for this article series! You did a very good job explaining the correctness concepts!

    Cheers

    Stephan

  5. I think you can describe the precondition on the set of numbers being searched in terms of defining a transitive closure on it: e.g. for all x,y in the set a transitive relation exists R such that x < y. (If you allow duplicate elements, the relation must also be reflexive, i.e. x <= y). The closure for this relation is defined as a sequential walk through the elements of the set, from the first to the last. Note that he empty set needs to be handled as a special case.

    If this observation is correct, then it leads me to believe that there are only three initial states we need to handle when searching for an element ‘e’ in the set.

    1. the empty set
    2. a set with one element in it
    3 a set with two elements in it

    Such a limited number of initial states is easily tractable for proving the correctness of the binary search algorithm. And makes me think that 4096 test cases might be overkill. Somebody will no doubt correct me if I’m wrong!

  6. Hi Mike, I have really enjoyed reading this series of articles!

    I am going to throw this out there – You’ve used an iterative algorithm to implement the binary search. Recursive algorithm implementations are quite a popular way to implement a binary search, indeed it was the first thing that came to mind when I attempted it.

    I would like to read what you have to say on invariants, bound functions and pre&post-conditions for recursive algorithms.

  7. What we want is a language that lets us state preconditions, invariants, bound functions and postconditions explicitly, not just as comments.

    Take a look at Alloy. Preconditions are set up as as facts. Though Alloy isn’t a general purpose programming language per se, its a language for modelling algorithms.

  8. @vince:

    You should check out Clojure: the latest versions have optional pre and post-conditions you can add as metadata to any function, and will be checked upon execution (you can also turn condition checking off for performance once you’re sure you’re correct).

    Because it’s functional and immutable, it makes invariants and bound functions pretty explicit as well.

    I don’t mean to come off as a Clojure fanboi – It really does seem relevant to the discussion. And yes, Mike, it does seem more popular among non-academic, human developers than Haskell.

  9. I’ve long wondered why it is that, after 60 odd years of programming as a profession, languages are evolving away from all the good, rigorous concepts we learnt about at university (like the ones Mike reminds us of in this series).

    Instead of languages that prize robustness and correctness by requiring us to tell the compiler what we mean (so that it can laugh at us and tell us all the ways we’ve failed before a user does), we get dynamically-typed languages full of wheezes like reflection. It’s left to the professional to decide whether he can be bothered to wheel out a menagerie of testing frameworks and analysis tools and methodologies and QA monkey Shakespeares before shipping it. Is the code going to be ok when that int overflows? What about that error handling branch that’s never actually been run? I guess the bug report will say…

    It’s not all bad news of course – garbage collection, for example, was a massive leap forwards, and a great example of languages engineering out what was probably then the most common programming fault.

    Also it’s not like static analysis is the alpha and the omega (we know that from the halting problem).

    I suppose it’s no mystery that an industry/market that invariably prefers to defer costs from sales to support (i.e. ship it and sort it out later, get your money up front) would prefer languages that offer rapid development. (I don’t know what happens in industries with other goals – does ADA still rule the roost in defence?)

    Mike’s earlier series about programming languages shows that even we, as coders, prefer languages that are pithy and expressive and would probably not have the patience for an overly pedantic one.

    FWIW, to my mind the biggest “best practice” concept that we fail on is code reuse. Of course, there are 1001 excellent reasons why this happens (language barriers, API fluidity, the framework issues that Mike has touched on previously), and the solution may well lie outside of language design (e.g. a moderated standard library), but therein lies the challenge.

    Wait a second… whose blog is this again? I’ll shut up now.

  10. Sacha, I think it’s because as useful as compile-time safety is, it also implies a lot of “bureaucracy.” It’s directly analogous to bureaucracy in a large corporation or government – it helps guard against corruption and mistakes at the price of agility and worker efficiency.

    Ada failed largely for this reason, if I understand its history directly. Java doesn’t go as far, and that’s still by far the #1 complaint about it.

    Haskell manages to be both expressive and type-safe, but only at the expense of an explosive complexity and a highly academic conceptual model. I’d say that less than 10% of the programmers I’ve worked with would even be capable of grokking it.

    So sometimes, people just choose simplicity and expressiveness over compile-time safety. I’m not saying the two *have* to be at odds, but the fact is, observing the menu of current and historical languages, they do *seem* to be at odds.

  11. I’m by no means an Eiffel guy, but I seem to remember from looking at it a while back that Eiffel was all about design-by-contract, which would at least cover the preconditions and postconditions reasonably well. I’m not sure that it has any special handling of invariants and bound conditions, but certainly these features could be implemented as well there as they are in C.

  12. Hi, Sacha, good to hear from you. There is irony, of course, in that I’ve spent most my recent time on this blog advocating the (admittedly informal) use of formal methods, but much of the rest of its lifetime arguing that verbose, statically typed languages are a waste of our valuable programming time.

    Can we have our cake and eat it? Are there no languages that are dynamic, quick to work, and don’t make you constantly repeat yourself — but which also support the use of correctness mechanisms like constness, side-effect-free functions, automatically checked invariants and bound functions and pre- and postconditions?

    Maybe, maybe. The two languages which seem to float around, orbiting this blog and popping up as the candidate answers to all the questions, are Clojure and Haskell. (OK, Haskell is statically typed, but inference apparently takes care of most of the verbosity.) I have a feeling that once I’ve mastered Scheme, one or other of these two will be my next language. But, but, but. Haskell is so alien. It wigs me out. Don’t tell me you don’t feel the same; I know you do.

    P.S. Alec’s taste in music is now officially worse than yours. Congratulations.

  13. @luke: Well I’m learning Haskell at the moment, spurred on in no small part by this blog. I don’t think I have brainspace for Clojure as well, but I will take a wiki-peek. Maybe Mike’s delved further into Haskell than I have, but at the moment it doesn’t seem too alien to me. However I am aware that I’m only just scratching the surface, starting with the easy stuff like list comprehensions:

    [if x < 10 then "BOOM!" else "BANG!" | x <- [7..13], odd x]
    

    Haven’t got to monoids, monads or any of that stuff yet.

  14. Pingback: Top Posts — WordPress.com

  15. By the way, some of us seeing those pictures embedded within your blogs are vegetarians who do not like to kill animals or see killed animals.

  16. Alok, part 2 of the Writing Correct Code series (bound functions) used cheese photos, just for you and your fellow vegetarians. Go back and read that one if your taking-offence levels are going off the scale.

  17. Phillip Howell

    Frustratingly, it seems that Ruby is even worse — the core language and libraries, for example, don’t even seem to have a assert function, which is frankly risible.

    I think you misspelt “raisable” there. ;-)

    Seriously, though — traditional assert() is evil stuff, driving people toward a solution where they aren’t testing what they fly / flying what they test.

    Permanent exceptions on contract failure are (maybe, probably) better, and typically allow for easier recovery of field-failure information.

  18. Phillip Howell, I don’t understand the distinction you’re making between “traditional assert” and “Permanent exception on contract failure”, except perhaps that the latter is a special case of the former. How can the former be bad and the latter good?

  19. Phillip Howell

    Mike,

    Traditional assertion statements go away in a production environment (see assert.h and the NDEBUG flag, for example).

  20. Phillip, I don’t see how that makes asserts “evil stuff”. In particular, I don’t see how the desirable property of configurability makes them worse than an otherwise equivalent facility that lacks that flexibility; and I see less still how this bears of Ruby’s appalling lack of any kind of assertion support outside of unit-testing modules.

  21. Phillip Howell

    Mike,

    I’m claiming that such configurability is a bug, not a feature. It violates the TWYF/FWYT principle. It is at the very least a source of bugs — the current codebase I work on is scattered with statements like this:


    assert(do_something() == SUCCESS);

    And so were assertions disabled (e.g., via NDEBUG), the software would certainly fail.

    Moreover, in a multiple-context environment, even a comparison or set of comparisons that are or aren’t run could change timing enough to hide or expose a latent bug.

    Even more moreover, if the contract is broken in the production build (because of some untested leg or unexpected timing condition, e.g.), you get undefined behavior instead of a well-defined failure that presents a sane error message to the user.

    I’m not saying to not explicate contracts within the code, I’m just saying that traditional assert() is a particularly crappy way to do it.

    w/r/t Ruby: I’m claiming that lack of assert() in favor of using exceptions for that purpose is a feature, not a bug.

  22. Phillip, I’m afraid I can’t agree. There are assertions one might make in development that would be completely inappropriate in production code for performance reasons — for example, asserting the precondition of a binary search, that the array to be searched is sorted. Leaving this turned on in production would make the search routine O(n+log n), with run time slightly worse that twice that of a simple linear search. I take your point that some assertions are best left in production code (and you won’t be surprised to hear that, yet again, Kernighan and Plauger were arguing for that back in 1974); but this doesn’t make a good religion, and there are times where this isn’t appropriate.

  23. Phillip Howell

    Mike,

    If it’s not appropriate to do for production than it’s not appropriate to do for development and test — because then you aren’t testing what you release.

    There are some statements of precondition that are best left to comments — anything that affects the performance of your code that significantly is one of them.

  24. I find it hard to persuade myself that you really believe that.

    Think it through. We agree that checking that an array is sorted as a precondition of doing a binary search is inappropriate in production code. Are you really going to insist that it is, for that reason, not acceptable during development?

    Do you use Apache::Reload in production mod_perl systems? If not, then do you also refrain from using it in development?

  25. Phillip Howell

    I do mostly embedded work, so my analogy would be a simulator. I would use things like this for prototyping and maybe for early development; but in later stages of development I’d move to real hardware as soon as possible.

  26. Pingback: Another challenge: can you write a correct selection sort? « The Reinvigorated Programmer

  27. Pingback: What does it take to test a sorting routine? « The Reinvigorated Programmer

  28. @Mike: in Ruby, why can’t you simply use:

    Process.exit if (assertion)

    for a full-blown assert-duplicate (.exit! to guarantee nothing gets between it and the process exiting), or:

    throw "failed (assertion)" if (assertion)

    for something capturable and descriptive? Throw in particular will spit out the text and line of code if it’s uncaught.

  29. Pingback: Programming Books, part 5: Programming Pearls | The Reinvigorated Programmer

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s