A deep dive into the strange world of Python versioning

17 Feb 2024
15 min read
Tags: dev, explanations, python

Have you ever looked at version numbers for software and thought, "What the hell were they thinking? I just want to know which version of the software I have and which version I should update to! This gibberish means nothing to me!"? I mean, how hard can it be, right? Well, a lot harder than you'd think.

Recently at work we had some real headaches with versioning. So much trouble, in fact, that I dived into the actual language specification for Python versions, called PEP 440. After having figured all of it out, I thought I might as well write it down, since it's a lot more subtle than you'd expect. This blog post is the post I had hoped to find while dealing with this problem. I'll give a bit more context where necessary, but I also won't cover every corner case. The PEP is surprisingly readable, even though it is the coding equivalent of legalese. So, if you want all the details, I'd encourage you to have a look at it yourself. For those who are content with the broad strokes, I'll try to give a summary here of the main points. Let's get started!

A primer on semantic versioning

Before we get into Python versioning specifically, we'll have to cover some basics. The Python versioning specification is heavily based on semantic versioning, so I'll give an intro to that first. Instead of just telling you the rules, I'll take a page out of 3blue1brown's book (one of my favourite YouTubers) and see if we can discover them ourselves.

Say, for the sake of argument, that we have a piece of software that we release updates for regularly. Because people want to know which version they have, we start by calling our first version "version 1", and every time we release an update we increment that number. So the next update becomes version 2, then version 3, and so on.

However, now we have a problem. People want to know, will version 2 be able to read documents made by old versions? Or is version 2 backwards compatible with version 1? And if we then release version 3, we get even more questions about which versions are still usable. Boy, this is a big hassle already, isn't it? Wouldn't it be amazing if you could tell whether your documents are still usable just by looking at the version number? Enter Semantic Versioning (stage left), or SemVer for short.

A semantic version is a way of numbering versions so that the numbers tell you which versions are compatible with each other. A semantic version has three components, which are, in order: the major version, the minor version, and the patch number, separated by periods. So, for example, 1.2.3 means major version 1, minor version 2, and patch number 3. Sometimes you'll see it written as v1.2.3, but that means the same thing.

Any time you release a new version that can't do everything the previous version could, you increment the major version number. Any time you release something that can do everything the previous version can, but can do new things as well, you increment the minor version number. Finally, if you release a version that has exactly the same features as the last, for example, because you only fixed a bug, then you increment the patch number. These are referred to as major releases, minor releases, and patch releases, respectively.

In semantic versioning, any time you increment a number, you reset all the numbers after it back to 0. For example, imagine the current version of your software is 1.2.3. In that case, the next major version would be 2.0.0. However, if instead you do a minor release, it would be 1.3.0. Finally, if you're just releasing the next patch, it would be 1.2.4.
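The reset rule is easy to express in code. Below is a minimal sketch, assuming plain MAJOR.MINOR.PATCH strings without any suffixes; the function name bump is made up for this illustration and does not come from any library.

```python
def bump(version: str, part: str) -> str:
    """Return the next version after a major, minor, or patch release."""
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":
        # Everything after the major number resets to 0.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Only the patch number resets to 0.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.2.3", "major"))  # 2.0.0
print(bump("1.2.3", "minor"))  # 1.3.0
print(bump("1.2.3", "patch"))  # 1.2.4
```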

As you can probably see by now, the versioning of software is intimately intertwined with its lifecycle. So far we've only talked about versions of released software, but what happens before it's released? Someone has to build the stuff first, right? This is where what I'll call "suffix versions" enter the picture. They signal how confident you can be that what you're looking at is what will actually be released.

When a piece of software is still being built but not released to the public yet, it is usually called a dev version, usually with some identifier after it. So, for example, you could read a version number like 1.2.3.dev8 as "the 8th iteration of the software that is going to become version 1.2.3".

Then there are alpha and beta versions, written as 1.2.3.alpha4 and 1.2.3.beta5. If you've ever heard of a company or product having a "closed alpha" or a "public beta", this is what they are referring to. Alphas and betas are nearly complete versions of a product that are released to early adopters for testing and feedback.

There are also release candidates (RC for short), which would be written as 1.2.3.rc. This is the "speak now or forever hold your peace" version. It is the final stage of testing before something goes out to the public. Depending on the size of the project, you might have all of these "testing releases", several of each, or none at all. Whatever works for you.

Finally, there are so-called post releases. These are updated versions of the software, fixing minor issues that arose (usually) during its release. An example would be if the official release of the software was signed with expired credentials.
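To make the ordering of these suffix versions concrete, here is a small sketch that sorts them from earliest to latest in the lifecycle. It is a simplification under stated assumptions: it only understands the dotted spellings used in this post (dev, alpha, beta, rc, post), allows at most one suffix per version, and the names PHASE_RANK and sort_key are invented for this example.

```python
import re

# Lower rank sorts earlier: dev builds come first, post releases last.
# None stands for a plain final release without any suffix.
PHASE_RANK = {"dev": 0, "alpha": 1, "beta": 2, "rc": 3, None: 4, "post": 5}
PATTERN = re.compile(r"^(\d+(?:\.\d+)*)(?:\.(dev|alpha|beta|rc|post)(\d*))?$")


def sort_key(version: str):
    """Build a tuple that sorts versions in lifecycle order."""
    match = PATTERN.match(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    release, phase, number = match.groups()
    release_tuple = tuple(int(part) for part in release.split("."))
    # A missing suffix number (as in "1.2.3.rc") is treated as 0.
    return (release_tuple, PHASE_RANK[phase], int(number or 0))


versions = ["1.2.3", "1.2.3.rc1", "1.2.3.post1",
            "1.2.3.dev8", "1.2.3.beta5", "1.2.3.alpha4"]
print(sorted(versions, key=sort_key))
# ['1.2.3.dev8', '1.2.3.alpha4', '1.2.3.beta5', '1.2.3.rc1', '1.2.3', '1.2.3.post1']
```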

Honourable mentions

Before I dive into a proper example, I think it's good to mention an exception to the rules and a few simpler versioning schemes that you might see in the wild, but that are much less popular.

One notable exception to the SemVer rules is when the major version is 0. Before version 1.0.0 is released, none of the rules, and therefore none of the guarantees, apply. At least, that's the theory. At the inception, major version 0 was a no-promises-use-at-your-own-risk version. However, many software products that have been used in production for years are still pre version 1, so the distinction is less harsh these days. There is even a fantastic satire page about this: https://0ver.org/

You also have Calendar Versioning or CalVer for short. This one is much simpler than SemVer. It has a few variations, but it comes down to "the version of software is the date it was released". The variations dictate exactly how much information you include. Most often you only see it formatted as YYYY.MM, YYYY.WW or YY.minor.patch, but they all come down to much the same.
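As a tiny illustration, a YYYY.MM style version can be derived straight from the release date. This is only a sketch: the helper name calver is made up, and it assumes the month is written without a leading zero.

```python
from datetime import date


def calver(release_date: date) -> str:
    """Format a release date as a YYYY.MM calendar version."""
    return f"{release_date.year}.{release_date.month}"


print(calver(date(2024, 2, 17)))  # 2024.2
```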

Finally, there is also unary versioning which we've actually already seen. It's the first versioning system I used as an example. In unary versioning, you just have a single increasing version number that you increment any time you do any kind of release.

A fictional case study

That was probably fairly abstract, so let's make an example: a simple calculator app. When we start work on our calculator, it is versioned 0.1.0.dev0. Currently, the app is little more than a sketch on a bar napkin. We release new versions with every new feature. Adding a display is version 0.2.0. Including buttons that actually do something is version 0.3.0. Implementing addition and subtraction is version 0.4.0, shortly followed by multiplication in version 0.5.0. Finally, we add in a history in version 0.6.0, after which we go to our first alpha version: 1.0.0-alpha1. After thorough testing and feedback, we decide we're happy with the product and eventually release version 1.0.0. Our app is put out into the world and people love it.

However, after some time, people want to calculate roots as well, so we add that in. Calculations that people made before still work the same, but now they have more options, so this would be version 1.1.0. Some time after that, we release a few patches that improve performance (1.1.1) and fix a security problem (1.1.2).

For a while everything seems hunky dory, but eventually a problem surfaces. Back when we implemented addition and subtraction, we found out that dealing with negative numbers is really complicated. Since having a negative number of something makes no sense, we used a shortcut. We just decided that it's enough for our calculator to just return 0 as the answer any time you tried to subtract a larger number from a smaller number. So, in our calculator 5 - 8 = 5 - 13 = 8 - 100000 = 0. For our early customers, this worked fine, but now that our calculator has become so popular, accountants have started using it. They are very grumpy, because now they can't track debts.

Now we're at an impasse. Until now, you could check whether one number was smaller than or equal to another by just subtracting the second from the first and seeing if the answer was 0.

However, if we implement negative numbers, this no longer works! Eventually, we decide to make the accountants happy (our own accountant insisted), so we implement negative numbers. Since calculations that gave one answer in version 1 might now give a different answer, we release this as version 2.0.0.
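To see why this is a breaking change, here is a sketch of both behaviours side by side. The function names are invented for this illustration; the calculator itself is fictional, after all.

```python
def subtract_v1(a, b):
    """Version 1 behaviour: results below zero are clamped to 0."""
    return max(a - b, 0)


def subtract_v2(a, b):
    """Version 2 behaviour: negative results are returned as they are."""
    return a - b


def is_at_most(a, b, subtract):
    """The trick users relied on: a <= b exactly when a - b comes out as 0."""
    return subtract(a, b) == 0


print(subtract_v1(5, 8), subtract_v2(5, 8))  # 0 -3
print(is_at_most(5, 8, subtract_v1))         # True
print(is_at_most(5, 8, subtract_v2))         # False, the old trick is broken
```

Nothing crashes in version 2; code that worked before simply starts giving a different answer, which is exactly what a major version bump is meant to warn about.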

Python versioning

Okay, I have a confession to make. Technically, we didn't have to cover anything to do with semantic versioning. The Python version specification dictates nothing about the meaning of the numbers or even how many there should be. The only requirements are that there is at least one number, and that the numbers are non-negative integers that increase over time.

So technically, some weird versioning schemes, such as the TeX versioning scheme, are actually Python compliant:

Since version 3.1, updates have been indicated by adding an extra digit at the end, so that the version number asymptotically approaches the number π. […] As of February 2021, the version number is 3.141592653. […] TeX developer Donald Knuth has stated that the "absolutely final change (to be made after [his] death)" will be to change the version number to π, at which point all remaining bugs will become permanent features.
— Knuth, Donald E. (December 1990) "The Future of TeX and Metafont"
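A quick way to convince yourself that this scheme keeps increasing is to compare the versions as tuples of integers, segment by segment. The sketch below does just that; release_tuple is a made-up helper, and the list shows only the first few versions implied by the "add one digit" rule.

```python
def release_tuple(version: str) -> tuple[int, ...]:
    """Split a dotted version into integers so it compares numerically."""
    return tuple(int(part) for part in version.split("."))


tex_versions = ["3.1", "3.14", "3.141", "3.1415"]
keys = [release_tuple(version) for version in tex_versions]

print(keys)                  # [(3, 1), (3, 14), (3, 141), (3, 1415)]
print(keys == sorted(keys))  # True: every extra digit makes the version larger
```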

So why then, you ask, did I make you go through all that theory? First of all, because I can (muahahaha). But more importantly, because it is a really solid foundation. Almost every serious versioning scheme used in software today is at least loosely based on semver and its syntax. How much it actually matters varies across communities. For example, communities such as JavaScript, or the Linux kernel (yeah, I know) play especially fast and loose with the rules. The precise distinction between major, minor, and patch versions isn't that critical. However, people can get very, very angry if your versioning doesn't communicate what they think it does. Just do yourself and your users a favour and use SemVer if you can, use CalVer if you can't.

Version specifiers

So we've covered how to compare different versions, but what if you need to tell someone which version you actually want? This is where version specifiers enter the fray. A version specifier is a way to describe a range of versions that are usable to you.

To quote PEP 440:

A version specifier comprises a series of version clauses, separated by commas.

What is a clause, you ask? Well, it's one of these 6:

~=: Compatible release clause
==: Version matching clause
!=: Version exclusion clause
<=, >=: Inclusive ordered comparison clause
<, >: Exclusive ordered comparison clause
===: Arbitrary equality clause.
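To make "a series of clauses separated by commas" concrete, here is a rough sketch that splits a specifier into (operator, version) pairs. It is not the parser that pip uses; the regular expression and the name parse_specifier are assumptions made for this illustration, and the version part is not validated.

```python
import re

# Longer operators come first so that "===" is not read as "==" plus "=".
CLAUSE = re.compile(r"^\s*(===|~=|==|!=|<=|>=|<|>)\s*(\S+)\s*$")


def parse_specifier(specifier: str) -> list[tuple[str, str]]:
    """Split a version specifier into a list of (operator, version) clauses."""
    clauses = []
    for raw_clause in specifier.split(","):
        match = CLAUSE.match(raw_clause)
        if match is None:
            raise ValueError(f"not a valid clause: {raw_clause!r}")
        clauses.append((match.group(1), match.group(2)))
    return clauses


print(parse_specifier(">=2.2, ==2.*"))  # [('>=', '2.2'), ('==', '2.*')]
print(parse_specifier("~=3.9"))         # [('~=', '3.9')]
```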

What exactly these clauses do is where all the subtlety comes in (and where my mistakes came from). Python does also allow for unary versioning or CalVer, but as those are quite straightforward I won't spend more time on them.

Exactly how equality works is where things get more tricky. So I first want to cover that in more depth before I explain any of the other components. It's gonna get a little in the weeds, but bear with me because this is where our woes started.

As I am writing this, I am developing on Python 3.9.18. Python 3.9 is the oldest version we still support. To make sure we don't use functionality that won't work for all our versions, I try to work with the most restrictive set. But here's the kicker: if I write python==3.9 as a dependency, what do you think I will get? Do I get 3.9.0 or 3.9.whatever or 3.9.latest? Yeah, I assumed I'd get the latest too, but no. Sadly, it's not 3.9.18.

To understand why, we first have to talk about two more concepts: zero padding and prefix matching. In zero padding, when comparing two version numbers, you make the shorter one longer by adding 0 segments until it's the same length as the other. Prefix matching is the opposite in a way. If the specifier ends in a wildcard segment, like 1.2.*, you compare only the segments before the wildcard and do not consider any additional segments of the other version. You can think of it as making the versions the same length by cutting off the longer one instead of padding the shorter one, even though the documentation doesn't explain it in this way.

Let's look at a few examples again. Is 1.2 the same as 1.2.3? Well, no, because when you apply zero padding (the default if there is no wildcard) you get 1.2.0, which is not the same as 1.2.3. What about 1.3 and 1.1.8.4.32 (which is a valid Python version)? This one is a bit more straightforward, but again we expand 1.3 to 1.3.0.0.0, which is not the same.

Conversely, is 1.2 the same as 1.2.*? Yes, because they are the same until we encounter the wildcard, and prefix matching tells us that is all we have to consider. What about 1.* and 1.2.3? Again, yes, because when we see the * we stop considering segments. Okay, how about 1.* and 1.0.0.alpha1-dev0? Yup, you got it; they are. Finally, what about 0.* and 1.2.*? Thankfully, this time we are safe, because even though both contain a wildcard, they don't have any common prefix, so they are not the same.
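The two rules can be sketched in a few lines of Python. This is a simplified illustration rather than the real implementation: it assumes the clause itself is a plain release number (optionally ending in .*), it treats any candidate with a suffix as different from a plain release under strict matching, and the names split_release and matches_equal are made up.

```python
import re
from itertools import zip_longest

RELEASE = re.compile(r"^\d+(?:\.\d+)*")


def split_release(version: str):
    """Return the numeric release segments and whatever suffix follows them."""
    match = RELEASE.match(version)
    if match is None:
        raise ValueError(f"no release segment in {version!r}")
    release = tuple(int(part) for part in match.group().split("."))
    return release, version[match.end():]


def matches_equal(spec: str, candidate: str) -> bool:
    """Sketch of the == clause: prefix matching with .*, zero padding without."""
    release, suffix = split_release(candidate)
    if spec.endswith(".*"):
        prefix, _ = split_release(spec[:-2])
        # Pad a short candidate with zeros, then ignore everything after the prefix.
        padded = release + (0,) * (len(prefix) - len(release))
        return padded[:len(prefix)] == prefix
    wanted, spec_suffix = split_release(spec)
    if spec_suffix:
        raise ValueError("this sketch only handles plain release numbers in the clause")
    if suffix:
        # A dev, pre or post release is never equal to a plain release number.
        return False
    # Zero padding: missing segments on either side count as 0.
    return all(a == b for a, b in zip_longest(wanted, release, fillvalue=0))


print(matches_equal("1.2", "1.2.3"))                # False
print(matches_equal("1.2", "1.2.0"))                # True
print(matches_equal("1.2.*", "1.2.3"))              # True
print(matches_equal("1.*", "1.0.0.alpha1-dev0"))    # True
```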

I hope you can see how zero padding and prefix matching represent the opposite sides of the permissiveness spectrum. With zero padding you miss out on potentially useful stuff like patch releases. With prefix matching, you get a lot more stuff that you might not expect.

Now that we have a deeper understanding of how equality works, we can look at the others. We just discussed ==. When you use exclusion (!=) you do the same as with == but flip the decision you come to. What about ===? With arbitrary equality, the versions are just string matched and have to be exactly the same. This guarantees exactly which version you will get. However, it has the unfortunate side effect that, if you specify an equivalent spelling such as 1.2.3-dev0 instead of 1.2.3.dev0, you will get told that the version does not exist.

Okay, 3 down, 3 to go. What about >? Well, here the same zero padding applies, but there is no prefix to hold you back: anything that sorts higher matches. (PEP 440 only allows the .* wildcard with == and !=, so write >0.8 rather than >0.8.*.) So >0.8 will match 0.9.0 as well as 1.2.3 or even 9999999.999999.99999.rc4 if pre-releases are allowed, so beware. The same gotcha goes for < of course.
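Here is a minimal sketch of the exclusive "greater than" comparison, assuming plain numeric versions only (no dev, pre or post suffixes, which have extra rules of their own). The helper names padded and is_greater are invented for this example.

```python
def padded(version: str, length: int) -> list[int]:
    """Split a version into integers and zero pad it to the given length."""
    parts = [int(part) for part in version.split(".")]
    return parts + [0] * (length - len(parts))


def is_greater(candidate: str, bound: str) -> bool:
    """Sketch of the > clause: compare zero padded segments from left to right."""
    length = max(candidate.count("."), bound.count(".")) + 1
    return padded(candidate, length) > padded(bound, length)


print(is_greater("0.9.0", "0.8"))  # True
print(is_greater("1.2.3", "0.8"))  # True, there is no upper bound
print(is_greater("0.8.0", "0.8"))  # False, equal after zero padding
```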

Finally, there is the "compatibility clause". This one is actually just a shorthand: ~= 2.2 means >= 2.2, == 2.*. Now let's return to my original question. Which Python version do you get if you specify python==3.9? You get Python 3.9.0. Okay, what about the "compatibility clause"? If you specify python~=3.9, what do you get? Since that expands to >=3.9, ==3.*, you get the newest 3.x release available. Why, it's Python 3.12, of course, and not 3.9.18, which is why part of our CI testing actually did nothing for quite a few months.

I think the confusion is that when we say something like 3.9 as humans, what we usually mean is 3.9.*, not 3.* or 3.9.0 as the computer interprets it. So actually that means that ==3.9.* and ~=3.9.0 are equivalent to each other, but they are not equivalent to ==3.9, and none of those is equivalent to ~=3.9. You see why I got confused?
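Writing out the expansion of ~= makes the difference visible. The sketch below assumes a plain release number without suffixes, and expand_compatible is a made-up name; it simply replaces the last segment with a wildcard, which is how PEP 440 describes the shorthand.

```python
def expand_compatible(version: str) -> str:
    """Rewrite ~=version as the equivalent pair of >= and == clauses."""
    parts = version.split(".")
    if len(parts) < 2:
        # PEP 440 does not allow ~= with a single segment such as ~=1.
        raise ValueError("~= needs at least two release segments")
    prefix = ".".join(parts[:-1])
    return f">={version}, =={prefix}.*"


print(expand_compatible("2.2"))    # >=2.2, ==2.*
print(expand_compatible("3.9"))    # >=3.9, ==3.*
print(expand_compatible("3.9.0"))  # >=3.9.0, ==3.9.*
```

The last segment is the one that gets dropped, so ~=3.9 allows any 3.x from 3.9 upwards, while ~=3.9.0 stays within 3.9.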

So, if at the end of all this you're still confused, I get you. It feels much more complicated than it actually ought to be. But I've come away from this adventure with two rules of thumb:

if you're a developer: use SemVer if you can, use CalVer if you can't
if you're requesting software: use ==X.Y.*; in 90% of cases it will be what you want.

I hope this has been useful to you! I'd be lying if I said that it wasn't an enormous pain to figure out, but hopefully by writing this I'll have spared you some of that pain. Take care, and good luck!