What is “simplicity” in programming?, redux

This is an immediate followup, about twelve hours after I posted the original What is “simplicity” in programming?, because the excellent comments on that post have pointed me to another insight.  In particular, Chris pointed out that “Ideally, you would not have to read the code of all the methods as the name should tell you by himself what is it doing.”

I think Chris has put his finger on an area where individual temperament, preference and aptitude are very important.  Probably, skipping over the small methods is the right thing for at least some programs: the ones that have been extensively Fowlered.  But it doesn’t come naturally to me at all.  It makes me nervous.  I actually feel physically uncomfortable about working with code that I’ve not read.  So maybe that’s why I am happier with one larger method than several smaller ones sprinkled across various classes and files.  (To be clear, I am not advocating big functions and classes: no-one wants to read 300 lines in a single method, or classes with 50 methods.  I’m in favour of biggER functions and classes than Martin Fowler is, that’s all.)

As I said back in Learning a language vs. learning a culture, I find that the best way for me to learn a new technology (be it a programming language, an MVC framework or whatever) is to get an actual honest-to-goodness old-fashioned book, and read it right through.  I don’t feel good about that, because I have a mental model where True Hackers just plough straight in and learn as they go along, but when I’ve tried that approach I never feel fully master of the situation.  (That’s how I feel about CSS, for example: I can do it, but I am not confident that I understand what I am doing, or why it works: I always have to experiment a bit before I get the right combination of margin, border and padding, because I’ve yet to sit down and properly learn the box model.)

And regular readers will remember that I remember fondly the days when we could know everything there was to know about our Commodore 64s.   I think that may be another manifestation of the same character trait: I want to know things deeply; in fact, I need to know things deeply before I can be properly productive.  (That’s why I’ve been learning Rails for several weeks already, but have yet to write any Rails code of my own beyond working through all the Depot examples in the book.)

Now that I’ve spotted this pattern in my behaviour, I can see it at work in other ways: for example, it’s very, very unusual that, having started to read a novel, I don’t finish it.  (The last such was Mervyn Peake’s Titus Groan, which a lot of people love, but which, so far as I can tell, doesn’t seem to have any actual story.)  And when I become interested in the music of someone I’m not already familiar with, I like to buy a single well-regarded album by that artist and listen to it heavily, rather than getting a Greatest Hits compilation, or listening to random tracks on Grooveshark.

So since I like to learn computer architectures and MVC frameworks depth-first, and since I read novels depth-first and listen to music depth-first, I suppose it’s not surprising that I like to read code depth-first — and that, therefore, extensively Fowlered code, with its long chains of responsibility, its delegates-of-delegates-of-delegates and its light sprinkling of tiny classes, doesn’t suit me.

(Again, understand: I am not saying that Fowler is wrong; just that what works for him doesn’t work for me.)

It’s easy to see that breadth-first code-reading is related to top-down programming, and depth-first to bottom-up programming.  Depth-first/bottom-up is all about understanding the details, and letting the high-level picture emerge from them; breadth-first/top-down is about grasping the overall shape of the system, and letting the details look after themselves (or at least, coming back to them later).  Despite the efforts of Dijkstra and others to impose top-down as The One True Way to write programs, it’s been accepted for many years now that both top-down and bottom-up (and other variants) can work well, and also that different individuals excel in different directions.  So maybe it’s no surprise that the same seems to be true of depth-first vs. breadth-first.
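
To make the two reading orders concrete, here is a minimal Ruby sketch. It assumes a made-up call tree (the `CALLS` table and every method name in it are hypothetical, chosen only for illustration) and lists the order in which a depth-first reader and a breadth-first reader would visit the same methods.

```ruby
# Hypothetical call tree: each method name maps to the methods it calls.
CALLS = {
  "process_order" => ["validate", "charge"],
  "validate"      => ["check_stock"],
  "charge"        => ["call_gateway"],
  "check_stock"   => [],
  "call_gateway"  => []
}.freeze

# Depth-first: follow each call all the way down before moving to its sibling.
def depth_first_order(calls, root)
  [root] + calls.fetch(root).flat_map { |callee| depth_first_order(calls, callee) }
end

# Breadth-first: read everything at one level before descending to the next.
def breadth_first_order(calls, root)
  order = []
  queue = [root]
  until queue.empty?
    name = queue.shift
    order << name
    queue.concat(calls.fetch(name))
  end
  order
end

p depth_first_order(CALLS, "process_order")
# => ["process_order", "validate", "check_stock", "charge", "call_gateway"]
p breadth_first_order(CALLS, "process_order")
# => ["process_order", "validate", "charge", "check_stock", "call_gateway"]
```

The depth-first reader reaches `check_stock` before ever looking at `charge`; the breadth-first reader has seen the whole top level before opening any of the details, and could stop there if the names are trustworthy.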

I truly don’t know whether it’s better, overall, to have a depth-first or a breadth-first mind.  Now that I’ve at least realised that there is this distinction, and now that I’ve seen which side of the dichotomy I fall on, I can and will take deliberate steps to improve my use of breadth-first approaches when that’s the appropriate strategy — for example, when reading the code of a high-level class in a Fowlered program.  Hopefully I can do that without converting completely to the dark^H^H^H^Hother side — I want to retain the ability to go depth-first when Deep Knowing is what’s required.

Of course, this also means I am going to need to develop judgement to know when which approach is optimal.  One candidate rule might be: breadth-first code-reading is better when there are strong naming conventions that tell you with reasonable confidence what the lower levels do.  (This brings us right back to Chris’s comment that kicked off this post.)  If that rule is correct, then it implies I’m going to have to do something I’ve been carefully avoiding for many years: read the Gang Of Four Design Patterns book, if only so that I can instantly recognise and interpret class- and method-names based on their patterns’ names. *sigh*

I’d be interested to hear proposals of other rules that we could usefully use to determine when to take a depth-first or breadth-first approach to learning something, be it a codebase, a technology or something completely different.  Let’s hear them in the comments — I want to learn!


Finally, in defence of Fowler: in another comment on the last article, Martin Probst made the important point that “the problem with books like this and their examples is that the technique is geared at e.g. a hideously complex loan system, but to keep the examples understandable they must be so short that it kind of defeats the point.”  That is very true: all writing about programming suffers from the limitations of small examples, but it’s particularly damaging in discussion of techniques like Fowler’s which are largely directed at keeping software comprehensible as a system grows.

Update (a few hours later)

It only occurred to me after having posted this that there is another excellent example that illustrates my depth-first approach: the way I play Quake.  (Although Quake is old, I still like it more than any of its better-looking successors because it feels so solid and chunky, and because there is such a lively community producing excellent free maps for it.)


When I play Quake, I go slowly and carefully, peeking around corners, trying to pick off monsters one by one as far as possible rather than ploughing in with all guns blazing.  I like to play on Nightmare, and to carefully explore every tiny dead-end of the map.  I don’t consider a level properly beaten until I have 100% kills (although 100% secrets is not always realistic).  When my sons watch me playing Quake, they find it frustrating that I go so slowly: they’re always shouting at me to “push the button” whenever I find one.  But I don’t want to push the button until I’ve finished surveying the available ground, so that when I do push it, I can tell what’s changed.  Only then will I push the button, find the new door, and go on to the next part of the level.

Yes, I am playing depth-first Quake.  I want to understand the level deeply.  Not coincidentally, I tend to finish levels much more slowly than most players.  But then I die less often.

50 responses to “What is “simplicity” in programming?, redux”

  1. Pingback: What is “simplicity” in programming? « The Reinvigorated Programmer

  2. Two points:

    – I wonder if it’d be useful to (more) directly talk about code-complexity/time. The one-large-method-i-can-understand-in-one-sitting seems to me good for the self-explanatory case, but when you get lots of people (devs) looking at it and lots of modifications/time i would say it increases code-complexity/time. Maybe this is also an attention-span thing. Which leads to my second point:
    – side effects. The larger block of code you can read in one sitting seems likely to have fewer side effects that you don’t know about (presuming you can read code) vs many methods for which you have to keep their side effects and references to their exec environments (stacks, etc) in mind. Ideally you’d have small chunks of self-contained code with few if any side effects, interacting with each other through well-understood interfaces, but that’s the ideal (textbook, anyone?) case. So maybe this is a war between the pragmatists and the idealists.

    A curious observation may also be that the functional languages fit more naturally into the latter (esp. the side-effects free) model. One wonders if the simplicity brought about by that is the main appeal.

  3. IMO, the need to know everything in depth will depend a lot on trust. And I was thinking about it yesterday.

    I’m writing a lot of data analysis code in MATLAB, and it is very liberating — I can take functions at their face value, and only delve into their implementations if I’m curious.

    Contrast that to yesterday when I was chasing down an application bug in C#, which had been caused by a shallow copy of an object with an ArrayList member… Needless to say, the place this showed up was much further away. (This was my first time seeing the code and I don’t have a C# compiler, so I was a bit more peevish than usual)

    It seemed to me that the ability to understand the code has a lot to do with the predictability of the language. I was wondering if the bug in the C# program had been partially because of the hand-holding it gives you in other areas (garbage collection, etc). When I’ve worked in C in the past, I had to pay attention to everything that was going on (ie, know the program in depth). But I’m not a programmer, and I’ve always wondered about scalability for these sorts of things.
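
For readers who have not met this kind of bug, here is a minimal sketch of the shallow-copy problem described above, written in Ruby rather than C#. The `Playlist` class and its `deep_copy` helper are hypothetical stand-ins for the object with an ArrayList member; the only library behaviour relied on is that `Object#dup` copies an object without copying the objects its instance variables refer to.

```ruby
# Hypothetical class with an Array member, standing in for the C# object
# with an ArrayList member described in the comment.
class Playlist
  attr_reader :name, :tracks

  def initialize(name, tracks)
    @name = name
    @tracks = tracks
  end

  # Deep-copy helper (an assumption of this sketch): duplicates the
  # tracks array and its strings as well as the object itself.
  def deep_copy
    Playlist.new(name.dup, tracks.map(&:dup))
  end
end

original = Playlist.new("road trip", ["intro", "outro"])

shallow = original.dup      # copies the object, but shares the same Array
shallow.tracks << "bonus"   # also changes original.tracks

deep = original.deep_copy   # independent Array
deep.tracks << "hidden"     # original.tracks is unaffected

p original.tracks           # => ["intro", "outro", "bonus"]
p deep.tracks               # => ["intro", "outro", "bonus", "hidden"]
```

The change made through `shallow` shows up in `original`, possibly far from where the copy was taken, which is the "much further away" effect the comment describes.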

  4. My point of view is quite similar to yours; I’m a top-down guy.
    I’m not able to go ahead without having a theoretical and structured view of the fundamentals of a scientific topic, programming language, code or framework.
    Working with “stuff” which “I suppose” or “it seems to respect this rule” is a terrible pain; I feel sick.

    I think that guys who dedicate completely to the down->top methodology are wrong, the typical sub-product of this “school” are these guys:
    I have seen guys who simply try to produce a quick solution trying to infer “knowledge” from specific examples.(Yes they are lacking the basics about the language or topic)
    Usually the next step is pasting&cutting** code grabbed from blog pages without having any clue about it, and trying to adapt it in a random-like approach. (I have seen them, I feel sick again.. :-( ).

    I think that a wise(and best) way to learn and work is to build a structured knowledge about the basics, with an introduction book or similar and then let’s use down->top when working with particular details or features.
    You are still gambling but you know the boundaries between valid stuff and boo-boo.

    P.s: **We call them cut&paste programmers, Have you met them too? :-D

  5. Hey Mike,

    I found the article this post is following up to via HN, and first I want to say I think you are on to something, and I think our styles are the same.

    Second, I have to say I was flabbergasted when, in the middle of reading this interesting technical post, Grooveshark was mentioned. I’m a developer there. It has always been our goal to be /the/ place people go to when they want to quickly search for some music and listen to it, but it still surprises me that it’s actually happening! If’n you don’t mind, what is your GS username? (Feel free to email me)

    Back on topic, as I said I am also a bottom-up, have-to-read-and-understand-all-code kind of person. Understanding that is a big advantage. I can embrace it and play to my strengths, while I work on my weaknesses. It’s funny, there is one piece of code that I have refactored several times, essentially having a refactoring war with myself, going from large complex methods to lots of smaller methods, and then back again to large complex methods because my own code was pissing me off. Now I know better than to do that unless it’s actually necessary or really does reduce complexity in some tangible way.

    Back to Grooveshark: interestingly, we have the two different types of programmers working side-by-side: I’m the primary developer handling all of the backend API stuff, and our lead Flex developer is someone who prefers abstraction and inheritance, doesn’t mind having tons of objects with single methods in them, etc. She’d have to be that way, or Flex would drive her completely nuts. I hate working on her code because I need to have 50 files open to see how one thing works, and even then I don’t get the whole picture because of all the things Flex and Flash are doing behind the scenes. Meanwhile, she hates having to read my code for the opposite reasons. We are both good programmers, or we wouldn’t have the awesome product that we have, but our styles are pretty incompatible. Since our code interacts in a very limited and well specified way, we are able to get along just fine, otherwise I don’t think we could work on a project together. ;)

  6. typo:
    I think that guys who dedicate ->completelytop methodology:


    I think that guys who dedicate completely to the down->top methodology are wrong, the typical sub-product of this “school” are these guys:

  7. I think another tendency for this style of programming may be a preference for imperative code, as OO’s goal is to abstract you from what is actually going on as much as possible. Whenever I use any system or library I don’t understand intimately it all feels terribly unstable.

    Ken Thompson talked about this in an interview: http://genius.cat-v.org/ken-thompson/interviews/unix-and-beyond

  8. I classify myself as depth-first… I want to know all the tidbits of the thing I am doing/using, but usually do so by tinkering with code first (use some kind of project to learn the basics of the language “with dirty hands”, but keep on reading the 101 course, the 201 course until the advanced, or whenever I want to stop).


  9. Hey Mike, I responded at length here: http://posterous.timocracy.com/simplicity-does-mean-simple-readable-methods. In short, as a partisan of the other camp, I think it breaks down to a couple points:

    You can follow down into the implementation from the top-down program, but there is no equivalent way to get the big picture from a bottom-up one.

    The increased cognitive load required to keep all the implementation in your brain becomes problematic as a program grows.

    This effect makes it hard for new people to grok the code and hit the ground running.

    Scanability of even medium sized code bases becomes very difficult.

    This also means coming back to your own code later requires relearning it, instead of just looking at what the descriptive methods say it does.

    Cyclomatic complexity.

    Single responsibility pattern


  10. Hi, Tim, thanks for the extensive reply. It seems, though, that I didn’t really get my point through to you. In support for Fowler-style code, you write that when a different style is used “The increased cognitive load required to keep all the implementation in your brain becomes problematic as a program grows. This effect makes it hard for new people to grok the code and hit the ground running.” But that is precisely what I experience with Fowler-style code. Your comment suggests that you are generalising your own response to the two styles and considering as though it were universal. It’s not.

  11. I am admitting I am biased, while *trying* to consider how that affects my objectivity. If you get a chance, watch Rein’s presentation; I think a couple of the early examples really showcase better what I am talking about.

    Do you agree, that *scanability* is hurt by longer more complex methods, even if you disagree with readability?

    Any comment on the other points? (you did hit the most important one).

  12. I’m glad I’m not alone in my tendency to want to have a deeper understanding before working on something. Of course, I have to balance that with available time, and as I’ve gotten older, I’ve become much more willing to not understand everything down to the smallest detail. But I still like to do some upfront studying of unknown areas.

    As far as actual writing of code (assuming a new project), I think I work both ways. If there are things at the “bottom” I’ve identified as necessary and if they are well defined enough, I tend to tackle those first. The bottom things are often simple and small in scope which makes it easy to knock ’em out and get the ball rolling.

    Then I’ll pull back, go to the “top”, and start working from there until I hit something that requires more work on the bottom. I guess I actually pull the ends together and tie it up in the middle. I hope that doesn’t seem random. I think my approach has a lot to do with problem discovery. Trying to find the things I don’t know that I don’t know.

    For reading code, I’ll do a breadth first approach to understanding the code. Then I’ll dive deeper as necessary. I try to only go as deep as necessary. Usually I’m reading code during a debugging cycle.

    P.S. I’m also glad I’m not alone for not having committed the GOF book to memory. I actually *did* read it long ago, but I didn’t find anything new or particularly useful (it is very general). I merely nodded my head at all the patterns I recognized and thought back to when I had used them. Unbeknownst to me, there was a whole subculture who started using the pattern names as a shorthand way to discuss software development. I’ve been meaning to reread it with an eye on memorizing the pattern names, but that’s not exactly good times…

  13. I’ve never read the GoF, either. Does that make me a hack? Friends reference it, and I have a working understanding of the patterns I think I need, but I almost feel like the low ceremony of ruby makes some of the patterns so natural you almost don’t notice them, others unnecessary, and the rest trivial to implement, comparatively.

    Fyi, I added some examples for debate, in response on my blog, but I’ll just link straight to the gist here: http://gist.github.com/346989

    While the second method has got the nitty gritty you can see at once, the first is *scanable*, imo, and certainly provides more integration points for testing, DRYing, and reusing via different compositions of its sub-parts:

    def can_checkout_movie?
      account_is_current? &&
        available_rental_slots? &&
        late_fees_within_limit?
    end

    def can_checkout_movie?
      # Account is current: time since the last renewal is within the membership period.
      (Date.today - account.last_membership_renewal_date) < vendor.membership_period &&
        movies.currently_checked_out < max_movies_rentable &&
        account.late_fees < (account.is_premium? ? vendor.max_late_fees_for_premium_account : vendor.max_late_fees_for_regular_account)
    end
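
To show what "more integration points for testing" could look like in practice, here is a self-contained sketch of the decomposed version. Everything beyond the method names quoted in this thread is an assumption made for illustration: the `Rental` class, the `Account` and `Vendor` structs, the injected `today` date, and the reading that an account is current when the time since its last renewal is shorter than the membership period (measured in days).

```ruby
require "date"

# Hypothetical stand-ins for the account and vendor objects in the gist.
Account = Struct.new(:last_membership_renewal_date, :late_fees, :premium,
                     keyword_init: true)
Vendor  = Struct.new(:membership_period, :max_late_fees_for_premium_account,
                     :max_late_fees_for_regular_account, keyword_init: true)

class Rental
  def initialize(account:, vendor:, checked_out:, max_movies_rentable:, today: Date.today)
    @account = account
    @vendor = vendor
    @checked_out = checked_out
    @max_movies_rentable = max_movies_rentable
    @today = today # injected so the date check can be tested deterministically
  end

  # The high-level rule reads as a list of named conditions.
  def can_checkout_movie?
    account_is_current? && available_rental_slots? && late_fees_within_limit?
  end

  # Assumption: membership_period is a number of days.
  def account_is_current?
    (@today - @account.last_membership_renewal_date) < @vendor.membership_period
  end

  def available_rental_slots?
    @checked_out < @max_movies_rentable
  end

  def late_fees_within_limit?
    limit = if @account.premium
              @vendor.max_late_fees_for_premium_account
            else
              @vendor.max_late_fees_for_regular_account
            end
    @account.late_fees < limit
  end
end
```

With this shape, a failing `can_checkout_movie?` can be narrowed down by calling each predicate separately, which is the testing argument made above. The cost Mike describes remains: a reader who wants the exact rules still has to open all three helpers.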

  14. As in, let’s drop the abstract and compare concrete and see if we really do like it that different and why. Please feel free to fork that gist and modify with what you think is a fairer comparison (because I probably wrote it based on a biased understanding of your position)

  15. Yes, thank you for writing this post. Awesomely done. I too avoid the long chains of def^H^H^Hdelegation that is present so often in code that conforms to the methods advocated/demanded by the cults of programming — that is, the multiple-one-true-ways that exist in our best-practices-laden egosystems.

    An interesting case study I love to revisit from time to time is the Enterprise programmer vs. Peter Norvig in their respective attempts to write a Sudoku solving program. The Enterprise guy is laudable in his adherence to best practices, TDD, scrumming himself, and leaving the door open for constant refucktoring, but P.N. actually understands the problem at hand. Not that P.N. is breadth first, no….. the central methodology to his methods is constraint propagation, and he writes in a “solve the problem first” way, and worries about Fowlering his code up later.

    As you say, both breadth and depth first programmers can work and have their places. I think the disaster comes when each is forced to work in an opposite fashion simply because “That guy with the beard said it was the right way to reduce dependencies”.

    A better rule would be “Hey, let’s figure out the way someone programs and use that to make something that actually works”.

  16. (Briefly because I have a deadline to meet tonight …)

    Tim, of course your decomposed_movie_example.rb is simpler and clearer than your implementation_movie_example.rb.

    But that’s not the question. I raise two questions that I think are more important (and more open to debate):

    1. Is decomposed_movie_example.rb still clearer once you admit that you also have to read and understand account_is_current? available_rental_slots? and late_fees_within_limit? as well as can_checkout_movie?? Possibly true, but not obviously true.

    2. Which version is clearer when you’re reading the code because you need to know the detail of the circumstances under which someone can check out a movie? Again, I’d say that the answer to this is not clear, and quite possibly depends on the habits, experience and aptitude of the individuals involved.

    (Finally, I want to mention that either way, this is super-much clearer in Ruby than it would be if it had to be written in some scaffolding-rich language such as C++ or C#.)

  17. Mike, okay, another time, then. ;) I would like to say the only reason I replied in the first place is it seemed like a well reasoned and written post that I happened to disagree with, which creates a good interface for knowledge expanding debate.

    If those methods are well tested, and written by yourself, or a team member you trust, you don’t need to dive into them every time. Once you have maybe read down through the call chain once, can’t you just rely on those methods being what they say they are?

    Yes, when you want to see the details you have to check the sub-methods. My point is, I guess, don’t force me to *always* see the details by not breaking the method down in the first place. When needed I’ll look at them, the rest of the time I can skim over the high-level. Isn’t that always the drive behind the move from lower level to higher level languages?

    Also if you can’t trust your code/team, there are bigger problems, either way. If somebody is lying in their intention-revealing named methods, there need to be more/better tests and they need to have their code reviewed more thoroughly.

    Those methods are 100-fold easier to test, too, imo. That means I can have 3 more easy places to diagnose and catch the bug I am trying to track down.

    (Yeah, and if I did it in Java I’d have to make 6 classes in between to handle the delegation. I agree that can get out of hand.)

  18. Your site makes me hungry.
    Good read, to be honest didn’t finish it (see previous statement).

  19. Phillip Howell

    Jumping into the conversation between Mike and Tim… (mostly on the side of more, smaller methods):

    The ability to successfully work with a codebase of decomposed methods is directly dependent on your trust of the implementors of that codebase.

    As an example: Mike, you asked “Is decomposed_movie_example.rb still clearer once you admit that you also have to read and understand [a bunch of other methods]?”

    Well, if I trust the implementers of those methods, then (unless I’m working on one of the decomposed areas) I don’t need to know the details of those methods. I can look at ‘can_checkout_movie?’ and know what I’m going to get from it.

    The kicker is that this is just another level of the same kind of trust you display every single time you fire up your compiler or interpreter, or every time you use library code. (Or by not building your own processor or… you’ve read ‘Reflections on Trusting Trust?’)

    Do I need to be able to dig down past that level of abstraction if something’s going wrong (or if I need to change the rules about whether someone can check out a movie)? Of course. Just like I need to be able to look at and understand assembly when I’m debugging embedded C++ code — but I’m not going to write in assembly, because it isn’t scalable.

    What I hear when someone advocates a strong bottom-up approach with long(er), detailed methods that have fiddly logic is: they either don’t trust other developers working on the codebase, or they don’t trust themselves.

    (All of this is not to say that those longer methods aren’t *sometimes* the right thing to do even in an environment of trust. They just usually aren’t.)


  20. And, yes, Phillip: you may be right that I should just trust other programmers more than I naturally tend to. I trust printf() and the Perl interpreter and the Linux kernel because they’re battle-tested by millions of other people as well as myself (and also because the interfaces to them are well documented); but perhaps I would win by extending that trust more readily into areas where it’s not (yet) been earned.

    (Side-comment: I love that Phillip can describe himself as “mostly on the side of more, smaller methods”. That the conversation is taking place with that spirit of moderation is an indication that we’re all Getting It. That makes me optimistic that we might even come out of this with some clearer ideas — which is what I went into it for. Keep it up, please!)

  21. Do you use a folding text editor, or not?

  22. I’m not sure what you mean by a “folding” text editor. I use GNU Emacs, which is happy to narrow its buffers down in whatever way you specify — is that it?

  23. Michael Kohne

    Folding editors can hide segments of code, usually based on the language settings. So if you were dealing with C++, a folding editor could hide everything between a particular pair of braces, making it easier to see the overall structure of the code.

  24. Resulting in neither being able to see the details of what is going on nor a higher level descriptively named version. ;)

  25. I must be a breadth-first person by nature, as I often find myself stripping out chunks of code from a long complex method and placing the chunks in their own small, single-responsibility methods. It really does help me to get the bigger picture – what the intention of the original programmer was. It also helps when finding bugs :)

    There are times when I deliberately break this habit, but almost always for performance reasons. For example, in my Java chess engine, the main search method is over 150 lines long. I deal with the added complexity by visually structuring the code very carefully, and commenting everything much more than I ordinarily would, so I don’t forget in 6 months time why I wrote it that long convoluted way in the first place…

    That said, my own view is that for problems of any reasonable complexity, top-down is generally the way to go. I think individuals who can hold all the various bits of a complex process expressed in one solid 300-line block of code are much rarer than those who can follow the outline of the same process in 50 lines. The latter group can always drill down into the sub-functions as and when they need to understand the implementation details. And certainly, the top-down approach aligns better with our information processing capabilities, from the perspective of chunking theory.

  26. bey0ndy0nder

    Ha that’s how I play video games also–depth-first.

  27. Sticking to just one of breadth-first and depth-first is better left to computers. Humans are generally smart enough to follow a mix of breadth-first and depth-first, and of top-down and bottom-up, as the specific problem needs.

  28. Michael: thanks to hideshow.el, Emacs is indeed a folding text editor. It’s amazing how often I forget it’s there, then jump on it again with cries of joy… and then forget about it. I suspect hiding is something that’s mostly useful only if you like huge functions, and I’m a tiny-functions man.

  29. it’s very, very unusual that, having started to read a novel, I don’t finish it. (The last such was Mervyn Peake’s Titus Groan, which a lot of people love, but which, so far as I can tell, doesn’t seem to have any actual story.)

    Amen, brother. And people complain that Tolkien has too much description and not enough Stuff Happening.

  30. People here seem to believe that ‘bottom up’ means ‘holding the entire program in your head at the same time’. This is not true. I’m a bottom-up programmer, I should know. Bottom-up just means you write the low-level functions before the high-level functions. Top-down means you write the high-level functions before the low-level functions. In neither case do you have to keep every detail of the entire program in your head at the same time.

    It also has nothing to do with whether you use a few long functions or many small functions. My personal philosophy is to keep programs as simple as possible, and in my view that means not creating functions unless the logic of the problem I’m trying to solve requires it. (“Requires” in the sense that that’s the approach that will lead to the smallest, simplest program). However, some people use a different definition of “complex” and consider long functions to be inherently more complex than small ones, and therefore insist on breaking up long functions. I disagree but I suppose it’s a matter of opinion and again it has nothing to do with bottom-up vs top-down, you can write a program either way either way.

  31. Wayne Radinsky, of course you’re right that bottom-up doesn’t mean that you have to hold the whole program in your head at once. It does tend to mean, though, that you end up reading the whole program; whereas someone who reads top-down and breadth-first might break off reading much earlier, feeling that they already know all they need to know, and can trust the lower-level code to do what is suggested by the names of its classes and methods.

    Whether it’s better to finish code-reading quickly, or to take longer over it and end up having seen it all, is left as an exercise to the reader.

  32. Wayne, some of us were just choosing to use Mike’s semantic choices so we could get on to discussing the issue, even if we thought there might be some orthogonality lost in doing so. ;)

  33. There’s also depth-first vs. breadth-first programming, which I wrote about here:


    Also, the book Code Complete (“How Long Should A Routine Be”, page 175) cites six studies that all find that longer functions correlate with lower bug rates, are cheaper to develop, and are easier to understand. I don’t have the book, I’m reporting what I read elsewhere online. Does anyone here have it?

  34. http://github.com/raganwald/homoiconic/blob/master/2010/03/significant_whitespace.md?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+raganwald+%28homoiconic%29&utm_content=Google+Feedfetcher#readme
    “It may seem like perhaps I should break this up into smaller methods to make it easier to understand. I am thinking that we want to break long functions up into short functions because our languages don’t give us a good way to express a verbose idea clearly in a single function. We are not talking about 20+ lines of conditional execution here, we are talking about manipulating something that can naturally be expressed as a tree. Why not express the idea in code that looks like a tree?”

    An interesting comment on a particular situation where many would reach for smaller functions.

  35. Lawrence, that is a very interesting statistic. As part of my ongoing reinvigoration drive, I ordered a second-hand copy of Code Complete a couple of days ago, so I’ll check page 175 as soon as it arrives!

    Nice essay, by the way — and a twist in the tale that I’d not anticipated.

  36. I’m definitely a Fowler-ite. Part of it, I think, has much to do with how you decompose a problem. Think about the different strategies for solving a large puzzle: some people build the border and gradually connect pieces to the border until they finish. Others take entities from the puzzle, build them separately, and connect them all later.

    I find it much easier to decompose a problem in terms of its component parts. As an example, if I know I’m going to be doing some basic linear algebra, I will almost immediately work up basic Matrix and Vector classes. Yes, I could use 1-d and 2-d arrays, but why duplicate the effort when I can pick out these entities from the start?

    Now, I think you’re probably not against that, but there is a happy medium between picking out obvious entities into classes, and making every statement in a method a call to some other class. I think Fowler definitely goes way too far in his class count (which may be why he loves UML so much, to keep all the classes straight). But I think it’s ok to keep the methods as short as possible, without losing meaning. For me, the boundary would be “would someone know what this class is supposed to do without having to look at it?” A secondary boundary would be “am I actually going to reuse this in a generic sense?” I think these questions can prevent a lot of over-classing.

  37. If you do decide to read about Design Patterns, I recommend O’Reilly’s “Head First Design Patterns” book. The Head First book does a very nice job explaining when each of the patterns is appropriate to use, and explains the finer distinctions of some of the similar-looking patterns.

  38. You’ve seen how quick some people play Quake 1, haven’t you? :)

  39. Yes, I am in awe of the people who made Quake Done Quick With A Vengeance. I’ve watched it right through half a dozen times, mostly with the boys, and there are still places where it makes me laugh with its sheer audacity — not least, the grenade-jump up onto the bridge that cuts out 80% of E1M4 (The Grisly Grotto).

    Here’s the best video I can find of it on YouTube — sorry it’s so dark.

  40. Henrik Warne

    In Code Complete, “How Long Can a Routine Be” is actually on page 173. I wouldn’t say the text supports the notion that longer routines are better. Instead, it says that you should not be afraid of using longer routines (up to 100 – 200 lines) when necessary. Here’s a quote: “Decades of evidence say that routines of such length are no more error prone than shorter routines”.

    Chapter 7.1 “Valid Reasons to Create a Routine” is also interesting to read. The first reason listed is “Introduce an intermediate, understandable abstraction. Putting a section of code into a well-named routine is one of the best ways to document its purpose”.

    I like to use many short routines with descriptive names, but I also totally understand Mike’s desire to understand what the code does, not just relying on a name. However, as both Tim Conner and Phillip Howell have pointed out, there is no conflict between these two goals. The first time I read through new code, I will usually go down into every method to see what it does, and to convince myself that it works correctly. But once I have done that, I am perfectly happy to trust the name of the method (as a short hand for what it does). And if in doubt, I can just read it again.

    I also think that the ease with which you can navigate the code base influences the coding style. I am currently using Java and Intellij IDEA, and prior to that I was using C++ and XEmacs for 7 years. When using C++ and XEmacs there were fewer classes and longer methods, because it was comparatively harder to navigate around. In Intellij IDEA it is very easy and quick to navigate through method call chains, so there is less of a reason to create larger methods.

    Regarding top-down or bottom-up: in the book “Lean Software Development” by Mary and Tom Poppendieck, there is a diagram on page 18 that shows how experienced designers work out a solution to an ill-defined design problem by repeatedly moving between high level and low level design activities. In other words, it is not either-or, it is both!

    Also, my vote for the “Head First Design Patterns” book – much easier to read than the GoF one.

  41. Thanks, Henrik, lots of interesting thoughts there. As it happens, my copy of Code Complete has just arrived — as in, literally, in the last ten minutes — and of course I turned straight to the section How long can a routine be? (which in my book is on page 93). Very interesting stuff there, which I will blog about separately.

    I wonder whether you might just have a point about IDEs. I have to admit I hate ’em, with their take-over-the-world attitude, and (OK, paradoxically) feel much happier in an Emacs buffer. But maybe highly Fowlerised code is where they come into their own.

    Interesting that two commenters have now recommended Head First over GoF. I think I’ll still go with the original, though — it’s part of the same impulse that makes me want to fully understand all the code: I want to see how Patterns came into the world, not just how they’re used now.

  42. Andrew Raybould

    Mike, when you read depth-first, do you follow the system calls into the kernel and down through the device drivers? Do you need to see how the interpreter handles each statement or read the assembler your compiler generates? Do you need to read the JVM source to use Java? No, because these all have well-defined and accessible semantics, and computing as we know it would be impossible if this were not so.

    Whenever we write an application program, we are building on deep layers of abstraction (especially if we consider the multiple abstractions of the hardware layers), and without abstraction and separation-of-concerns, we would never get anything done. The goal of Fowler et al. is to extend this principle into applications, so that we can build large systems without getting lost in them (not having read Fowler’s book, I cannot say to what extent he has succeeded.)

    The key is ‘well-defined semantics’, and there are a number of myths around this issue that are always being invoked, but which only create confusion and distraction:

    It is not an issue of method size, number of methods and depth of call chains. It is only loosely related to coupling, cohesion and cyclomatic complexity. It does not favor any particular side in the OO/functional/procedural tussle. It is largely independent of these concerns because they are implementation issues, and semantics are primarily about intent: what is this code for?

    One of the most pernicious of these myths is that of ‘self-documenting code’. In general, you cannot explain the semantics of a body of code simply through naming the elements of its interface. This is an issue that I have to deal with at least once a week – here’s a method for X, can I call it multiple times, and what happens if I do? What’s the protocol for Y? What constraints must I observe in a concurrent environment? As developers, don’t we often face design choices where there are multiple, equally valid choices about how to do things, where the only thing that matters is that once we make a choice, everything else must be consistent with that choice? In cases like these, the consistency requirement is not a property of any single method, it is a constraint on the work as a whole.
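
    A minimal sketch of the "can I call it multiple times?" question, in Python. The Connection class, its fields and its contract are all hypothetical, invented only for illustration. The method name close says nothing about repeated or concurrent calls, so that part of the semantics has to be written down somewhere, here in the docstring.

```python
class Connection:
    """Hypothetical resource, used only to illustrate the point."""

    def __init__(self) -> None:
        self.is_open = True
        self.releases = 0  # how many times the underlying resource was freed

    def close(self) -> None:
        """Release the underlying resource.

        Contract that the name alone cannot express:
        - Safe to call more than once; calls after the first are no-ops.
        - Not safe to call concurrently from several threads; callers
          must serialize access themselves.
        """
        if not self.is_open:
            return  # already closed: nothing left to release
        self.is_open = False
        self.releases += 1
```

    A different author could just as reasonably have chosen to raise an error on the second call. Either choice is valid, but callers can only stay consistent with it if it is stated.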

    Furthermore, as the original developer wrote each statement of a program, she had in mind some definite purpose for it, as another step towards satisfying a goal (in fact, there is always a hierarchy of goals, coming primarily from the requirements and secondarily from design choices about how to satisfy those requirements). We also hope she had a reason for believing that the statement was a good design choice in that context. As a reader, you need to understand the purpose of that statement, and you cannot really do that until you know what the author’s goals were. Given the limits on what you can explain through identifier names, an otherwise undocumented program is a puzzle you have to solve, and I submit that this is what drives depth-first reading: you often cannot move on to the next statement until you have found out the implications of the current one, and that can only be done by digging down through the function calls.

    To anyone who still believes you can adequately describe the semantics of a module through the names of its interface, consider a particular module: the compiler or interpreter of the first programming language you learned (or the functional language you are starting on, Mike). Would a list of its keywords and the names of its built-in and standard functions be sufficient for you to be able to learn how to use it?

    My conclusion is that simplicity can only be achieved where you have clear, consistent semantics, and these need to be made known to its users. One implication is that there is no simplicity without documentation (this is not an original idea: for Knuth’s view on it, look up ‘literate programming’.) The other is that until the preponderance of developers look at software in this light, we won’t see much simplicity.

  43. Andrew, your thoughtful and insightful comment deserves a proper response, and I’m going to give it one in a forthcoming blog entry rather than letting it get buried down here. You raise good and important points.

  44. I will have to find a copy of Code Complete and read what he says. As I mentioned before my approach is to create functions as the program logic requires it, and not to try to conform to any notion of how long or short any function ought to be. I feel this has improved the quality of my code. I’m interested in knowing how my intuition fits (or not) with Steve McConnell’s detailed research.

  45. Great insight on breadth- vs. depth-first personality. After some thought I realized that a depth-first person may make a good programmer (or a specialist), but an architect, a manager, or a leader has to belong to the breadth-first type.

    At some point you just can’t understand and control every tiny piece in detail. You need to trust others and communicate.

  46. Man, that was insightful! I’m a depth first person, spot on!

  47. Pingback: Frameworks and leaky abstractions « The Reinvigorated Programmer

  48. Pingback: Saying goodbye to Twitter | The Reinvigorated Programmer

  49. > At some point you just can’t understand and control every tiny piece in detail. You need to trust others and communicate.

    If my colleagues do not communicate with me, in the form of e.g. comments telling me about the goals, requirements, guarantees and invariants of a method, I need to read it and find out what it does to reconstruct that information.

    The reconstruction is always at best partial: while I might be able to figure out some set of pre- and postconditions, the purpose is something that exists in our minds (and maybe the issue tracker) but not the code. Furthermore, any global or system invariants (don’t be multi-threaded here, make sure you temporarily monopolize that resource, here’s where you need transactions) are likely to be important for correctness and very un-evident from the code.

    Furthermore, there’s the issue of future versions. Let us consider functions “may_rent_movie” which calls “is_account_current”, and let’s say both are bug-free in version n. If we change both the specification and implementation (in an externally visible way) of “is_account_current” in version n+1, is “may_rent_movie” still correct? Does it depend on *how* the behavior of “is_account_current” changed?

    Every time you move some block of code into a new method you raise these questions. Do you habitually answer them? If so, where? Even if you do, more methods means more questions.

    To show that the answers are not trivial: the specification of “may_rent_movie” may very well be that it should do the AND of three bits of business logic, whatever those are at the moment, and it is correct if it calls the three helpers no matter what they do.

    However, if you call a function named “map” (with a function f:X->bool and a list-of-X) which does the obvious thing, but one day some freak replaces the implementation and specification of “map” to make it act like filter, most likely your code no longer does what you want.

    Will the name of a method tell you whether it does policy (the first kind) or mechanism (the second kind)? Will the implementation? Will the (often ninja) comments?
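
    The policy/mechanism contrast above can be sketched in Python. This is an illustration under assumptions, not anyone's real code: only may_rent_movie, is_account_current and map come from the discussion. The other two business rules, the dictionary fields, and the names map_list and positive_flags are hypothetical.

```python
from typing import Callable, List, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")


# Policy helpers. Only is_account_current is named in the comment above;
# the other two rules and all field names are hypothetical.
def is_account_current(account: dict) -> bool:
    return account["balance_due"] == 0


def is_old_enough(account: dict, movie: dict) -> bool:
    return account["age"] >= movie["min_age"]


def is_in_stock(movie: dict) -> bool:
    return movie["copies"] > 0


def may_rent_movie(account: dict, movie: dict) -> bool:
    """Policy: the AND of whatever the three business rules currently are.

    If a rule is redefined in version n+1, this function is still correct,
    because its specification refers to the rules, not to their behaviour.
    """
    return (
        is_account_current(account)
        and is_old_enough(account, movie)
        and is_in_stock(movie)
    )


def map_list(f: Callable[[X], Y], xs: List[X]) -> List[Y]:
    """Mechanism: exactly one result per input, in the same order."""
    return [f(x) for x in xs]


def positive_flags(xs: List[int]) -> List[bool]:
    """Depends on map_list's exact behaviour, not just on its name.

    If map_list were changed to act like filter, the result would no longer
    line up index-for-index with xs, and this function would be wrong.
    """
    return map_list(lambda x: x > 0, xs)
```

    Note that the two docstrings carry the distinction. Nothing in the names or the bodies says which caller survives a change to its helpers.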

    In addition, one benefit of inlining your helper methods instead of factoring them out is that you know more of the context around each line of code. For example, the shape of the control flow graph is not evident from the source code at function boundaries, but it is source-evident inside functions.

    One axis along which you can divide code is policy vs. mechanism, which is more or less the same as domain concepts (and their complexity) vs. implementation concepts (and its complexity).

    Another dimension is calculation vs. state vs. interaction. I apologize for the suboptimal names: calculation is pure functions, interaction is any effect on your program or visible from outside your program: talking to the server, the database, the file system, other systems, and so on, plus receiving UI events. State is any (side) effect only visible from inside your program, most notably (only?) mutating in-memory data, but state may change future responses to interaction.

    Since pure functions often compose reasonably well, it makes sense to me to make them small and general, such that you can express larger constructs by combining them.

    Since interactions depend on context, in particular previous interactions, it makes sense to me to bring as much context together as reasonable, since each step is best understood by reference to the other steps. Of course, put “try_rent_video” and “show_first_20_rentable_videos” in different methods. Maybe if they have some common substeps you factor them out. Perhaps you should mark each method (by containing module name?) whether it’s a small sub-step or some top-level business/domain logic step.
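
    A rough Python sketch of that split, with assumptions stated up front: the Video type, the inventory dictionary and the log list are hypothetical stand-ins for a real database and real I/O, and first_20_rentable_videos is a pure counterpart to the “show_first_20_rentable_videos” mentioned above. The calculations are small and composable, while the interaction keeps its steps side by side in one function.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Video:
    title: str
    copies_available: int


# Calculation: small, general, pure functions that compose.
def is_rentable(video: Video) -> bool:
    return video.copies_available > 0


def first_n(items: list, n: int) -> list:
    return items[:n]


def first_20_rentable_videos(videos: List[Video]) -> List[Video]:
    return first_n([v for v in videos if is_rentable(v)], 20)


# Interaction: each step only makes sense in the context of the others,
# so lookup, refusal and update are kept together in one function.
def try_rent_video(inventory: Dict[str, int], title: str, log: List[str]) -> bool:
    log.append(f"lookup {title}")
    copies = inventory.get(title, 0)
    if copies == 0:
        log.append(f"refused {title}")
        return False
    inventory[title] = copies - 1  # state change visible to later calls
    log.append(f"rented {title}")
    return True
```

    Renting the last copy and then trying again shows why context matters: the second call is refused only because of what the first one did.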

    I have not made up my mind when it comes to state. Where two chunks of state are orthogonal, meaning the whole system state is a cartesian product of them, it’s probably fine to track those bits of state independently. Where states are mutually dependent you probably want to bring them together if it doesn’t contort your code too much.

    Of the views stated so far, I think mine are closest to Andrew’s.

  50. Thanks, Jonas, lots of good insights there!
