All you need to know about version numbers in one page

A colleague asked me a couple of days ago: “So we roll version numbers forward only with breaking changes, right?”

Well, the best approach for any sane project in 2017 is to follow Semantic Versioning. That is not a long document to read, but here is a summary. In a nutshell, version numbers have three facets, major.minor.patch.

  • If your new release breaks something that used to work, increment major.
  • If your release adds new functionality that clients might want to rely on, increment minor.
  • If your release only fixes a bug, increment patch.

Then dependencies of the form “^3.4.2” (for example, in package.json for a JavaScript project) mean “that version, or anything backwards-compatible with it”. Which means the same major version number (3 in this case) and the same or better minor number (4 or higher); or, if the minor version is the same, then the same or better patch level (2 or higher).
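The caret rule can be sketched in a few lines. This is an illustrative model only, not npm's actual implementation (the real `semver` library also handles pre-release tags and treats 0.x versions specially):

```scala
// A minimal model of "^3.4.2"-style matching: same major version,
// and (minor, patch) at least as new, compared lexicographically.
// The names here are illustrative, not from any real library.
case class Version(major: Int, minor: Int, patch: Int)

def caretSatisfies(base: Version, candidate: Version): Boolean =
  candidate.major == base.major &&
    (candidate.minor > base.minor ||
      (candidate.minor == base.minor && candidate.patch >= base.patch))
```

So `caretSatisfies(Version(3, 4, 2), Version(3, 5, 0))` holds, while a candidate of 4.0.0 (breaking change) or 3.4.1 (older than the baseline) does not.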

This is an excellent, simple and battle-proven system.

However.

It’s not used as universally as it should be, due to widespread confusion between software version numbers (which are part of a project’s engineering) and marketing version numbers — where going to version 1.0 is a Whole Big Deal. As a result, some engineers are scared of moving their packages to a non-zero major number. And as a result of that, you sometimes see projects releasing both breaking and non-breaking changes within the 0.x series.

Folks: don’t do that. Use semantic versioning. No ifs, no buts. Let the marketing people bundle the whole thing up periodically and call it “Release 3, ‘Kaylee’” if they want to. Doesn’t matter to us.


20 responses to “All you need to know about version numbers in one page”

  1. I think that a lot of projects have stayed at 0.x releases for way too long, but I think there’s space for staying at 0.x and having releases where the programmers didn’t have to worry about breaking something but can release copies to people willing to work with beta software.

    Also, as https://xkcd.com/1172/ points out, there’s no such thing as a non-breaking release. Adding a method to a class could break anything that inherited from that class. Adding almost anything to a library package or header could conflict by name with existing code. Speeding up a function could break cryptographic code that depended on that function running at constant speed independent of the input, or cause race conditions in (admittedly already buggy) threaded code. Automation code may depend on the exact setting of a menu.

  2. The XKCD is funny (as it often is), but of course not really true. A release that adds a new function to an API but doesn’t change or remove any of the existing ones is non-breaking in every sense that matters. (A developer who actively depended on the unavailability of a facility gets exactly what she deserves.) I don’t at all see how adding a method to a class can break anything — can you explain?

    On 0.x releases: I agree there is a place for them, and indeed the SemVer document explicitly allows for them. But as soon as someone else is depending on your code, you do them a disservice to stay at 0.x; especially since nine times out of ten it’s done from sheer superstition — fear of the Magic Number 1.0.

  3. class A {def plus (a : A) = ???}
    class B extends A {def minus (a : A) = ???} 
    

    is valid compiling Scala code.

    class A {def plus (a : A) = ???; def minus (a : A) = ???}
    class B extends A {def minus (a : A) = ???} 
    

    doesn’t compile. It’s a trivial fix (adding “override” to the code), but if some method of A depends on minus, and ends up calling B’s minus, which works differently, it can cause bugs; if A’s new plus calls minus and B’s minus calls A’s plus, an infinite loop can occur.
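    For reference, here is the second snippet with that “override” fix applied. This compiles, though the dispatch hazards described above remain:

    ```scala
    // Adding "override" resolves the compile error when B redefines
    // a method that A now also defines.
    class A { def plus(a: A) = ???; def minus(a: A) = ??? }
    class B extends A { override def minus(a: A) = ??? }
    ```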

    You can keep code under wraps until it’s stable, but that doesn’t help people who need it now and can give useful feedback. You’re basically saying there’s no case where you would make code available and yet want the freedom to make incompatible changes whenever needed. I think a lot of open source software not only has that as part of its development model, but can benefit from having people use it before long-term compatibility locks things in stone. (E.g. & kept its surprisingly low precedence relative to == in C because when && was split from &, they felt they couldn’t afford to make the change that much bigger for the 60+ users of C, thus making C suboptimal for the next few million users.)

  4. Interesting example!

    You’re basically saying there’s no case where you would make code available and yet want the freedom to make incompatible changes whenever needed.

    Not at all! You are free to make incompatible changes whenever you wish. The only obligation imposed by SemVer is that you signal that you’ve done so, by incrementing the major version number.

  5. The main problem with semantic versioning is backwards compatibility, “continuous” delivery and numbers. I’m using quotes there, because REAL CD is very difficult, but even with lax definitions of them, you’ll have the problem.

    The problem is that, once your project is stable, you should never change the major version, even when releasing many times. And that’s fine, but at some point keeping version 2.157.3 is silly. The Linux kernel had this problem, moving from version 2.6.39 to 3.0.0 (and to 4.0.0 after 3.19).

    I think that, while semantic versioning makes sense and as a general rule should be used, keeping the numbers stale while development continues is also silly, and sometimes an extra bump is fine.

    And I 100% agree that using external marketing names is a great idea to detach both concepts.

  6. I disagree on two counts, Jaime.

    First, there’s no problem with a stable project moving to a new major version. All that needs to be done to make this safe is to ensure that automatic upgrades never cross a major-version transition — only manual upgrades. (In practice, lots of software maintains packages for several different major versions, so they can co-exist — for example, our own YAZ Toolkit comes in Debian packages libyaz4 and libyaz5 so you can choose which version or versions you want.)

    Second, the idea that there’s anything wrong with a version number like 2.157.3 is just superstition. Version numbers are for machines, not humans. (Except marketing version numbers, of course.)

  7. thomasrushton

    2019?

  8. Ha! What a strange mistake to have made. Thanks for spotting it, thomasrushton — now fixed.

  9. Version numbers are not for machines. Only library versions actually matter, and they may or may not match the package version number. A kernel could number itself with a single integer or simply offer feature tags and no internal numbering system. Without parsing the human-readable --version output, there’s no way to tell what version of cat is installed.

    In the case of Windows, or Debian or any sufficiently large system, an external marketing version identifier can exist separate from the internal. But in many of those cases, there is no internal; Debian, for the most part, is merely a pile of package version numbers for machines or one of “oldstable”, “stable”, “testing” or “unstable”; and whatever the label is on your Intel CPU, you can check features and MHz but no version number at the machine level. A Windows or MacOS internal number is more useful than a Linux distribution internal number could be, given the flexibility in configuring most Linux distributions.

    For other programs, all multiple versions would do would confuse people. I’m running Scala 2.12, and as that’s currently my primary language, I keep up with the mailing lists. All a difference between the marketing versions and external versions would do is confuse me, or confuse other people when I try to explain what’s going on in Scalaland.

  10. David, I can’t make out your comment at all. You claim “version numbers are not for machines”; but SemVer is explicitly about making version numbers that do convey information for machines. You claim software could number itself with a single integer, but that gives no way to communicate the distinction between breaking and non-breaking changes.

  11. A program doesn’t need to know the difference between a breaking change and a non-breaking change going backward; it just needs to know the first version it will run on, which is likely to be a non-breaking change or even a bugfix. Going forward, it has no idea when a breaking change matters to it; generally, if it can run, it should bull ahead and let the user decide whether it’s okay. The fact that sound doesn’t work on Lotus 1-2-3 is likely irrelevant to the user who just wants to retrieve their data, and may not matter to someone who just wants to play an old copy of Pool of Radiance.

    Major operating system kernels don’t have breaking changes that need to be signaled through major versions. Your Linux a.out program may run on the most recent version, or may not run on a much older version that dropped a.out support in compiling. A 16-bit Windows program written for version 3.10 in 1992 stopped running on version NT 10.0, released in 2015. I’d like to see the betting pool on that in 1992. A 32-bit program written for NT 3.10 in 1993 still runs on NT 10.0; we could have a betting pool on when that support will run out, but even 24 years later I’m not sure any of us could hit the right number.

    Programs don’t use Linux distribution version numbers; they don’t use Intel CPU version numbers. They don’t use the “cat” version number, and would have to extract it from text written for humans. And in the main cases where programs use version numbers, they use the same version numbers humans use. The main counterexample in the open source world is OpenJDK, which internally is 1.8.0 and externally is 8, which is a marketing decision that other open source programming languages are not going to make, and is basically just dropping the 1 and saying they never make breaking changes. Trying to make up some internal number that differs from the version identifier people who aren’t developers of the program use, in the open source world, is basically a waste of time.

    Note that even Windows, with all sorts of confusing external versions, jumped from 6.3 to 10.0 internally with the release of Windows 10. But as of Windows 8.1 (internally 6.3), programs must declare the version of Windows they run on or get told the version number is 6.2, and even then I believe they get the version number they declared for. It’s a hack to work around programs that care about the version number.

  12. David, you seem to be considering only the most primitive version-numbering system (incrementing a single facet), then complaining that such a system doesn’t allow client software to express dependencies well. This seems circular to me.

    Here are a few circumstances where faceted version numbers are not merely a nice academic idea but battle-tested and relied on.

    * Installation-package version numbers in Linux distributions. Metaproxy v1.11.7-1 relies on libxml2 (>= 2.7.4) and libyaz5 (>= 5.0.0).
    * Node package version numbers in NPM configuration. React-router v4.0.0 relies on React (>= 15.4.2).
    * Maven artifact version numbers in POM files. Mod-users v5.0.0-SNAPSHOT relies on domain-models-runtime (>= 10.0.1).

    In the case of a major and very widely used distribution such as Windows, some care is rightly taken to provide even more backwards compatibility than the version-number promises, where this can be done. It would have been a legitimate engineering decision to say that Windows 3 binaries need not be supported on Windows 95, let alone NT 10, but human (and commercial) priorities rightly came into play as well.

  13. Which is better for the machine:
    Metaproxy v1.11.7-1 relies on libxml2 (>= 2.7.4) and libyaz5 (>= 5.0.0) or
    Metaproxy 11445 relies on libxml2 (>= 5883) and libyaz5 (>= 1)?

    The latter is trivial; it involves parsing and comparing integers. The former is relatively complex, and even more so once you start tossing stuff like “-SNAPSHOT” into the mix. Thus the three-part versioning system is useful for humans, not machines.

    https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Version shows how complex this can get in real life; I think writing code that could compare two versions in that format without error would be much harder than a correct binary sort. Most of that complexity is not for the machine; it’s about letting the Debian version reflect the upstream version for the end user’s sake: the user needs to be able to know that it is version 2.157.3 of the kernel, which is why it matters whether or not 2.157.3 works for people.

  14. The former is better, obviously, since it allows the machine to make a decision about which versions are and are not compatible. If it happens that we have version 2.8.1 of libxml2 available, that’s fine; whereas if we only have version 3.1.4, that is incompatible. In the second scenario, if we have version 6013 of libxml2, there is no way to tell whether that is compatible with the requirements of metaproxy. I don’t understand how this is even a question.

    And really — if your build software is so dumb that it can’t handle a faceted comparison, then you are likely to have much worse problems elsewhere in its code.
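    The faceted comparison itself is only a few lines of code. Here is a sketch (assuming plain dotted-integer versions, ignoring suffixes like “-SNAPSHOT”):

    ```scala
    // Compare two dotted version strings facet by facet, numerically,
    // left to right; a shorter version that is a prefix sorts earlier.
    def compareVersions(a: String, b: String): Int = {
      val as = a.split('.').map(_.toInt)
      val bs = b.split('.').map(_.toInt)
      as.zip(bs)
        .map { case (x, y) => x.compare(y) }
        .find(_ != 0)
        .getOrElse(as.length.compare(bs.length))
    }
    ```

    Note that this correctly ranks 2.157.3 above 2.9.9, where a naive string comparison would not.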

  15. That’s not what it says, at least not in Debian and Debian derivatives. 3.1.4 >= 2.7.4, and apt will happily install metaproxy if libxml2 version 3.1.4 is installed and metaproxy depends on libxml2 (>= 2.7.4). Not to mention that library packages in Debian may not have breaking changes without changing the soname and thus package name.

    If it works differently in Red Hat, that’s fine, but my experience is as a former Debian developer.

  16. It is true that Debian muffs this, which is why one often finds the workaround of Debian package names with the major version number baked right in — as in the examples of libxml2 and libyaz5 that I mentioned above. All this shows is that the Debian developers didn’t properly understand the principles that are now known as semantic versioning when they designed their system. Doesn’t it seem flawed to you to say “Look, Debian doesn’t implement semantic versioning; that proves that semantic versioning doesn’t work”?

  17. I’m not saying that semantic versioning doesn’t work, I’m saying that it’s for the humans. And I don’t know why you’re saying Debian muffs this; I don’t see anything in the Semantic Versioning document that says anything about this.

    I do not appreciate any system that says 3.1.4 >= 2.7.4 is false. It is completely counter-intuitive. Rule #11 of the Semantic Versioning document you linked to does say that 3.1.4 > 2.7.4. And again, I go back to Windows, which completely crippled the ability of programs to tell what version of Windows they were running on because otherwise they’d assume that they couldn’t run on 10.0.

    Debian doesn’t let you have the same package installed with multiple versions. This means that, except for bugs and rare willful conflicts, no two installed packages will try to install the same file, and installed packages can be identified unambiguously by their names. It also means, however, that if you want libxml1 and libxml2, or python2 and python3, installed at the same time (and you’ll need that in many cases), the packages need to have different names. This has little to do with semantic versioning, and everything to do with a design decision.

  18. I don’t know why you’re saying Debian muffs this; I don’t see anything in the Semantic Versioning document that says anything about this.

    “Backwards incompatible API changes increment the major version.”

    I do not appreciate any system that says 3.1.4 >= 2.7.4 is false.

    It is false when, as here, the symbol “>=” is being used to mean “is backwards compatible with” rather than “is numerically larger than”. (You could argue that “>=” is not a good symbol to use for this. The developers of NPM evidently agreed, which is why they went with “^” instead.)

  19. Hi Mike. Long-time reader, rare commenter (maybe not since the selection sort challenge).

    I was a bit puzzled by this post so I re-read the semantic versioning docs.
    I think you are forgetting, or choosing to ignore, the parts where they talk about the clear distinction between pre-1.0 and post-1.0 versions:
    http://semver.org/#spec-item-4
    http://semver.org/#spec-item-5

    To quote: “Major version zero (0.y.z) is for initial development. Anything may change at any time. The public API should not be considered stable. … Version 1.0.0 defines the public API. The way in which the version number is incremented after this release is dependent on this public API and how it changes.”

    This is quite clear: the rules you describe apply *only* starting at 1.0, and not before then.

    So the meaning of 1.0 is not just marketing at all. It is very explicitly declared in semver itself as being the first time the public API is considered stable. And that, I think, is why so many projects linger at 0.x: reluctance to think hard about breaking compatibility, fear of lost agility, fear of commitment.

  20. Hi, slinkp, glad you stayed around :-)

    You are dead right that semantic versioning leaves an escape hatch in the 0.x space. (I did mention this in the article.) The problem is that this provides lazy programmers with an excuse not to bother doing SemVer at all, and to stay on v0.x forever.

    Going to v1.x doesn’t in any way impede your project’s ability to make breaking changes; it just means you’re committing to being explicit and honest about it from that point onwards.
