Thanks to SurveyMonkey for encouraging me to do the research this post is based on.
It is possible to work with Python quite a bit and not be aware of some of the subtler details of package management. Since Python is a popular “glue” language, one of its core strengths is integrating with libraries written in other languages: from database drivers written in C, through numerical algorithms written in Fortran, to cryptographic algorithms written in Rust. In all these cases, one way to avoid frustrating, error-prone builds in the target environment is to distribute pre-built code. However, while source code can be made portable, making the build output portable is a lot more complicated.
The manylinux project,
composed of three PEPs,
two software repositories,
and support in pip,
addresses how to accomplish that.
These problems are hard,
and few other ecosystems solve them as well as Python.
The solution has many moving parts,
developed over the course of ten years.
This means that understanding all of them is not easy.
While this post cannot make it easy, it can at least make it easier, by making sure all the details are in one place.
Python packages come in two main forms: source distributions (sdists) and wheels.
Wheels are "pre-built" packages that are easier and faster to install. The name comes originally from a bad joke: the Monty Python Cheese Shop sketch, since PyPI used to be called "Cheese Shop" and cheese is sometimes sold in wheels. The name has been retconned for another bad joke, as a reference to the phrase "reinventing the wheel", allowing Python packaging talks to make cheap puns. For the kind of people who give packaging talks, or write explainers about packaging formats, these cheap jokes fill the void in what would otherwise be their soul.
Even for packages that include no native code, only pure Python, wheels have some advantages. They do not execute any potentially-fragile code on installation, and querying their dependencies can be done without a Python interpreter.
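To make the dependency point concrete, here is a minimal sketch. It assumes a wheel laid out according to the wheel specification, with a single top-level `*.dist-info/METADATA` file. The code unzips that one file and reads its `Requires-Dist` headers, and nothing from the package is imported or run. The name `wheel_requirements` is illustrative and is not a pip or packaging API.

```python
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_file):
    """Return the Requires-Dist entries from a wheel's METADATA.

    `wheel_file` may be a path or a binary file-like object.
    Nothing from the wheel is imported or executed: we only
    unzip one text file and parse its email-style headers.
    """
    with zipfile.ZipFile(wheel_file) as archive:
        # Metadata lives in a top-level <name>-<version>.dist-info/METADATA.
        metadata_names = [
            name
            for name in archive.namelist()
            if name.endswith(".dist-info/METADATA") and name.count("/") == 1
        ]
        if len(metadata_names) != 1:
            raise ValueError("expected exactly one top-level .dist-info/METADATA")
        text = archive.read(metadata_names[0]).decode("utf-8")
    headers = Parser().parsestr(text)
    # get_all returns None when the header is absent (no dependencies).
    return headers.get_all("Requires-Dist") or []
```

The sketch happens to be written in Python, but the same steps (open a zip file, read one text file, parse its headers) work from any language. That is what makes wheel metadata queryable without running a Python interpreter.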
However, when packages do include native code the story is more complicated.
Let's start with the relatively straightforward part:
portable binary wheels for Linux are called manylinux wheels:
they target many Linux distributions, not all of them.
This is because they rely on the GNU C library,
and on specific features of it.
There is another popular libc for Linux: musl.
There is absolutely no attempt to be compatible with
musl-based Linux distributions,
the most famous of which is Alpine Linux.
However, most other distributions derive from either Debian (for example, Ubuntu) or from Fedora (CentOS, RHEL, and more). Those all use the GNU C library.
GNU C library
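GNU libc has an official
"infinite backwards compatibility" policy:
a binary built against libc6 version 2.X
is compatible with libc6 version 2.Y for any Y greater than or equal to X.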
Aside: the 6 in libc6 does not refer to the version of the GNU C Library: Linux only moved to adopt the GNU C Library in libc6. The libc4 library was written from scratch, while libc5 combined code from GNU C Library version 1 and some bits from BSD C library. In libc6, Linux moved to rely on GNU C Library version 2.x, first released in January 1997. The GNU C Library is still, over twenty years later, on major version 2. We will ignore some nuances, and just treat all GNU C Library versions as 2.X.
The infinite compatibility policy means that binaries built against libc6 version 2.17, for example, are compatible with libc6 version 2.32.
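The compatibility rule boils down to a version comparison. The following is a minimal sketch that assumes glibc 2.x version strings only, as in the aside above. The helpers `parse_glibc_version` and `can_run` are hypothetical names. `platform.libc_ver()` is a standard library function that reports the running libc where it can detect one.

```python
import platform


def parse_glibc_version(version):
    """Turn a glibc version string like "2.17" into a comparable tuple (2, 17)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_run(built_against, running):
    """True if a binary built against glibc `built_against` should work on
    a system running glibc `running`, under glibc's backwards-compatibility
    policy. Only (major, minor) is compared; both are assumed to be 2.x.
    """
    return parse_glibc_version(built_against) <= parse_glibc_version(running)


if __name__ == "__main__":
    # On glibc systems this returns e.g. ("glibc", "2.31");
    # elsewhere (musl, macOS) it may return empty strings.
    lib, version = platform.libc_ver()
    if lib == "glibc":
        print("running glibc", version, "; 2.17 binaries OK:", can_run("2.17", version))
    else:
        print("not a glibc system")
```

Comparing integer tuples rather than strings matters here. Compared as strings, "2.5" sorts after "2.17".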
The relevant PEP is dense but worth reading. "Portable" is a loaded word, and unpacking it is important. The specific meaning of "portable" is encoded in the auditwheel policy file. This file concedes the main point: portability is a spectrum.
When the manylinux project started, in 2016, the oldest security-supported open source distribution was CentOS: specifically, CentOS 5.11. It was released in 2014. However, because CentOS tracks RHEL, and RHEL is conservative, the GNU C library (glibc, from now on) it used was 2.5: a version released in 2006.
It was clear that the
compatibility level would be a moving target.
Because of that,
that compatibility level was named manylinux1.
After that,
the manylinux project moved to a more transparent naming scheme:
the year in which the relevant compatible CentOS release was first released.
The next compatibility target,
manylinux2010
(defined in 2018),
referenced CentOS 6.
In April 2019,
manylinux2014
was defined as a compatibility tag,
referencing CentOS 7.
In the beginning of 2021, Red Hat, in a controversial move, changed the way CentOS works, effectively nullifying the value of any future releases as a way of specifying minimum glibc version support.
The Python community decided to switch to a new scheme:
directly naming the version of glibc supported.
The first such tag,
manylinux_2_24,
was added in November 2020.
The next release
moves all releases to glibc-based tags,
while keeping the original names as aliases.
It also adds a new compatibility level.
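The legacy names map onto glibc versions, which is what lets the older tags keep working as aliases. The sketch below assumes the standard aliases (manylinux1 for glibc 2.5, manylinux2010 for 2.12, manylinux2014 for 2.17) and glibc-based tags of the form `manylinux_<major>_<minor>_<arch>`. The helper `glibc_requirement` is a hypothetical name.

```python
import re

# Legacy tag names and the glibc (major, minor) version each one stands for.
LEGACY_ALIASES = {
    "manylinux1": (2, 5),      # CentOS 5
    "manylinux2010": (2, 12),  # CentOS 6
    "manylinux2014": (2, 17),  # CentOS 7
}

# glibc-based tags look like manylinux_<glibc major>_<glibc minor>_<arch>.
GLIBC_TAG = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def glibc_requirement(platform_tag):
    """Return (glibc_major, glibc_minor, arch) for a manylinux platform tag."""
    match = GLIBC_TAG.match(platform_tag)
    if match:
        return int(match.group(1)), int(match.group(2)), match.group(3)
    # Legacy form: <alias>_<arch>, e.g. manylinux2014_x86_64.
    legacy, _, arch = platform_tag.partition("_")
    if legacy in LEGACY_ALIASES and arch:
        major, minor = LEGACY_ALIASES[legacy]
        return major, minor, arch
    raise ValueError(f"not a manylinux tag: {platform_tag!r}")
```

Normalizing both spellings to a glibc version means the rest of the tooling only has to compare version numbers.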
Libc compatibility and beyond
The compatibility level of a manylinux wheel is defined by the glibc symbols it links against. However, this is not the only kind of compatibility manylinux wheels care about: glibc alone just puts them on a single line from "most compatible" to "least compatible".
Each compatibility level also includes a list of allowed libraries to dynamically link against, as well as specific symbol versions and ABI flags that depend on both glibc and gcc.
Many Python extensions include native code precisely because they need to link
against a C library.
As a concrete example,
a MySQL client package
would not compile if the
libmysql headers are not installed,
and would not run if the
libmysql shared library
(of a version that matches the one the package was compiled against)
is not installed.
It would seem that portable binary wheels are only of limited utility if they do not support the main use case. However, the auditwheel tool includes one more twist: patching ELF.
Elves predate Tolkien's Middle-Earth. They appear in many Germanic and Nordic mythologies: sometimes as do-gooders, sometimes as evil-doers, but always associated with having powerful magic.
Our context is no less magical, but more modern. ELF ("Executable and Linkable Format") is the format of executables and shared libraries in Linux, since libc5 (before that, Linux used the so-called a.out format).
When auditwheel is asked to repair a wheel for a specific platform version,
it checks for any shared libraries it links against that are not part of the
policy for that platform.
If it finds any,
it copies them into the wheel
and patches the module's ELF metadata so that it loads those bundled copies.
This means that after the repair,
the new ("repaired") wheel will not depend on any libraries outside the
ones the policy allows.
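The check at the heart of the repair step can be sketched as a simple split. This assumes the list of needed libraries has already been read from the extension's ELF dynamic section, for example from the NEEDED entries printed by `readelf -d`. The allow-list here is hypothetical and heavily trimmed. The real lists live in auditwheel's policy file and vary by compatibility level.

```python
# A hypothetical, heavily trimmed stand-in for a policy's allow-list.
ALLOWED_LIBS = frozenset({
    "libc.so.6",
    "libm.so.6",
    "libpthread.so.0",
    "libdl.so.2",
    "libgcc_s.so.1",
    "libstdc++.so.6",
})


def split_dependencies(needed, allowed=ALLOWED_LIBS):
    """Split an extension's needed shared libraries into (allowed, external).

    `external` lists the libraries a repair step would have to bundle
    into the wheel for it to meet the policy.
    """
    needed = list(needed)
    ok = [lib for lib in needed if lib in allowed]
    external = [lib for lib in needed if lib not in allowed]
    return ok, external
```

For a module linked against `libc.so.6` and a MySQL client library such as `libmysqlclient.so.21`, only the MySQL library lands in `external`. That library is the one a repaired wheel would carry inside itself.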
These repaired binary wheels will include the requested manylinux tag and the patched modules. They can be uploaded to PyPI or other Python packaging repositories (such as DevPI).
For pip to install the correct wheels, it needs to be recent enough to inspect the OS and decide which manylinux tags are compatible with it.
Installing Binary Wheels
Because wheels tagged as
plain linux (for example, linux_x86_64)
cannot be assumed to work on any platform other than the one they have been compiled for,
PyPI rejects those.
In order to upload a binary wheel for Linux to PyPI,
it has to be tagged with a manylinux tag.
It is possible to upload multiple manylinux wheels for a single package,
each with a different compatibility target.
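A single wheel file can also carry several platform tags at once, joined with dots in its filename. This is how one file can advertise both a glibc-based tag and its legacy alias. The sketch below pulls those tags out of a filename. It assumes a well-formed name following the wheel filename convention (where dashes in the project name are already escaped). The project name `demo` in the usage note is made up.

```python
def platform_tags(wheel_filename):
    """Return the platform tags encoded in a wheel filename.

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    and each tag field may hold several tags joined by ".".
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename")
    parts = wheel_filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of dash-separated fields")
    # The platform field is always last.
    return parts[-1].split(".")
```

For `demo-1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl`, this returns both `manylinux_2_17_x86_64` and `manylinux2014_x86_64`.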
When installing packages, pip will prefer to use a wheel, if available, instead of a source distribution. When pip checks the availability of a wheel, it will introspect the platform it is running on, and map it to the list of compatible manylinux tags. Since the list keeps changing, a newer pip may recognize more compatible tags than an older pip.
Once pip finds the list of manylinux tags compatible with its platform,
it will install the least-compatible wheel that is still compatible with the
platform.
For example, it will prefer manylinux2014 to
manylinux2010 if both are compatible.
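The preference order can be approximated in a few lines. This is a simplified sketch, not pip's actual code. pip delegates this work to the `packaging.tags` module, which also handles per-architecture details. The sketch assumes a glibc 2.x system, treats glibc 2.5 as the oldest level, and uses hypothetical function names.

```python
LEGACY = {(2, 5): "manylinux1", (2, 12): "manylinux2010", (2, 17): "manylinux2014"}


def compatible_tags(glibc, arch, oldest_minor=5):
    """List manylinux tags a glibc 2.x system can use, most preferred first.

    Tags for newer glibc versions (less widely compatible) come first.
    `oldest_minor=5` is an assumption matching manylinux1's glibc 2.5.
    """
    major, minor = glibc
    tags = []
    for m in range(minor, oldest_minor - 1, -1):
        tags.append(f"manylinux_{major}_{m}_{arch}")
        legacy = LEGACY.get((major, m))
        if legacy:
            tags.append(f"{legacy}_{arch}")
    return tags


def pick_wheel_tag(available, glibc, arch):
    """Return the most preferred available tag, or None (fall back to source)."""
    for tag in compatible_tags(glibc, arch):
        if tag in available:
            return tag
    return None
```

On a glibc 2.31 system with both manylinux2010 and manylinux2014 wheels available, `pick_wheel_tag` picks the manylinux2014 one. A `None` result corresponds to pip falling back to the source distribution.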
If there are no binary wheels available,
pip will fall back to installing from source.
As mentioned before,
installing from source,
at the very least,
requires a functional compiler and Python header files.
It might also have specific build-time dependencies, depending on the package.
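A rough pre-flight check for the two universal requirements needs only the standard library. This sketch assumes that the compiler the interpreter was built with (the `CC` configuration variable) is the one a build would use. That is often, but not always, the case. The check cannot detect package-specific build dependencies, and `source_build_prerequisites` is a hypothetical name.

```python
import os
import shutil
import sysconfig


def source_build_prerequisites():
    """Rough check for what a from-source install of a C extension needs.

    Returns a dict of booleans; package-specific build dependencies
    are not checked.
    """
    include_dir = sysconfig.get_paths()["include"]
    # CC may include flags (e.g. "gcc -pthread") or be unset on some platforms.
    cc = (sysconfig.get_config_var("CC") or "cc").split()[0]
    return {
        "python_headers": os.path.exists(os.path.join(include_dir, "Python.h")),
        "compiler": shutil.which(cc) is not None,
    }
```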