Managing Dependencies

Sun 02 September 2018 by Moshe Zadka

(Thanks to Mark Rice for his helpful suggestions. Any mistakes or omissions that remain are my responsibility.)

Some Python projects are designed to be libraries, consumed by other projects. These are most of the things people consider "Python projects": for example, Twisted, Flask, and most other open source tools. However, some things, like mu, are installed as end-user artifacts. More commonly, many web services are written as deployable Python applications; a good example is the issue-tracking project trac.

Projects that are deployed must be deployed with their dependencies, and with the dependencies of those dependencies, and so forth. Moreover, at deployment time, a specific version must be deployed. If a project declares a dependency of flask>=1.0.1, for example, something needs to decide whether to deploy flask 1.0.1 or flask 1.0.2.

For clarity, in this text, we will refer to the declared compatibility statements in something like setup.py (e.g., flask>=1.0.1) as "intent" dependencies, since they document programmer intent. The specific dependencies that are eventually deployed will be referred to as the "expressed" dependencies, since they are expressed in the actual deployed artifact (for example, a Docker image).
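For example, an intent dependency of flask>=1.0.1 might end up expressed in requirements.txt as exact pins on Flask and on everything Flask pulls in (the specific versions below are illustrative):

    # requirements.txt -- expressed dependencies: everything pinned exactly
    flask==1.0.2
    click==6.7
    itsdangerous==0.24
    jinja2==2.10
    markupsafe==1.0
    werkzeug==0.14.1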

Usually, "intent" dependencies are defined in setup.py. This does not have to be the case, but it almost always is: since there is usually some "glue" code at the top, keeping everything together, it makes sense to treat it as a library -- albeit, one that sometimes is not uploaded to any package index.

When producing the deployed artifact, we need to decide how to generate the expressed dependencies. There are two competing forces. One is the desire to be current: using the latest version of Django means getting all the latest bug fixes, and means that getting fixes to future bugs will require a smaller version jump. The other is the desire to avoid changes: when deploying a small bug fix, changing all library versions to the newest ones might introduce a lot of unrelated change.

For this reason, most projects will check the "artifact" (often called requirements.txt) into source control, produce actual deployed versions from it, and follow some procedure for updating it.
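One way to implement such a procedure is with the pip-compile tool from the pip-tools project (one choice among several; running pip freeze inside a clean virtualenv is another):

    # read the intent dependencies from setup.py, write exact pins
    pip-compile --output-file requirements.txt setup.py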

A similar story can be told about the development dependencies, often defined as extra [dev] dependencies in setup.py, and resulting in a file dev-requirements.txt that is checked into source control. The pressures are a little different, and indeed, sometimes nobody bothers to check in dev-requirements.txt even when checking in requirements.txt, but the basic dynamic is similar.
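The same kind of tooling can cover the development dependencies. Assuming setup.py declares an extra named dev, recent releases of pip-tools can compile it directly (older releases need a separate dev-requirements.in file instead):

    # pin the "dev" extra on top of the regular intent dependencies
    pip-compile --extra dev --output-file dev-requirements.txt setup.py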

The worst procedure is probably "when someone remembers to". This is not usually anyone's top priority, and most developers are busy with their regular day-to-day tasks. When an upgrade becomes necessary for some reason -- for example, because a bug fix is available -- this can mean a lot of disruption. Often the disruption manifests as follows: upgrading just one library does not work, because it now depends on newer versions of other libraries, so the entire dependency graph has to be updated all at once. All the intermediate deprecation warnings that might have been accumulating for months have been skipped over, and developers are suddenly faced with several breaking upgrades at the same time. The size of the change only grows with time, becoming less and less surmountable and therefore less and less likely to be done, until it ends in complete bitrot.

Sadly, however, "when someone remembers to" is the default procedure in the absence of any explicit procedure.

Some organizations, having suffered through the disadvantages of "when someone remembers to", go to the other extreme: not checking in requirements.txt at all, and generating it on every artifact build. However, this causes a lot of unnecessary churn: it becomes impossible to fix a small bug without also making sure that the code is compatible with the latest versions of all libraries.

A better way to approach the problem is to have an explicit process of recalculating the expressed dependencies from the intent dependencies. One approach is to manufacture, with some cadence, code change requests that update the requirements.txt. This means they are resolved like all code changes: review, running automated tests, and whatever other local processes are implemented.
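As a sketch, the "manufactured" change request could be a weekly job along these lines (the branch name and commit message are made up; the pushed branch would then be turned into a code review request by whatever local workflow exists):

    #!/bin/sh
    # weekly job: regenerate the expressed dependencies and push them for review
    set -e
    pip-compile --upgrade --output-file requirements.txt setup.py
    git checkout -B update-requirements
    git commit -am "Regenerate requirements.txt"  # fails, stopping the job, if nothing changed
    git push --force origin update-requirements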

Another is to tie updates to a calendar-based event. This can be anything from a manual but strongly encouraged "update Monday" -- where, on Monday morning, one of a developer's tasks is to generate requirements.txt updates for all projects they are responsible for -- to making it part of a time-based release process: for example, generating the updates on a cadence that aligns with agile "sprints", as part of releasing the code changes of a particular sprint.

When updating does reveal an incompatibility, it needs to be resolved. One way is to update the local code: this is certainly the best thing to do when the problem is that the library changed an API, or changed an internal implementation detail that was being used accidentally (...or intentionally). However, sometimes the new version has a bug in it that needs to be fixed. In that case, the intent is now to avoid that version, and it is best to express the intent exactly as that: !=<bad version>. This means that when an even newer version is released, hopefully fixing the bug, it will be used. If a new version is released without the bug fix, we add another != clause. This is painful, and intentionally so: either we need to get the bug fixed in the library, stop using the library, or fork it. Since we are falling further and further behind the latest version, we are introducing risk into our code, and the accumulating != clauses will indicate this pain and encourage us to resolve it.
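In setup.py, this intent might look like the following fragment of the install_requires list shown earlier (the versions are illustrative):

    install_requires=[
        # any Flask at least 1.0.1, except the release with the bug we hit
        "flask>=1.0.1,!=1.0.2",
    ],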

The most important thing is to choose a specific process for updating the expressed dependencies, clearly document it and consistently follow it. As long as such a process is chosen, documented and followed, it is possible to avoid the bitrot issue.


Tests Should Fail

Thu 02 August 2018 by Moshe Zadka

(Thanks to Avy Faingezicht and Donald Stufft for giving me encouragement and feedback. All mistakes that remain are mine.)

"eyes have they, but they see not" -- Psalms, 135:16

Eyes are expensive to maintain. They require protection from the elements, constant lubrication, behavioral adaptations to protect them and more. However …

read more

Thank you, Guido

Thu 02 August 2018 by Moshe Zadka

When I was in my early 20s, I was OK at programming, but I definitely didn't like it. Then, one evening, I read the Python tutorial. That evening changed my mind. I woke up the next morning, like Neo in The Matrix, and knew Python.

I was doing statistics at …

read more

Composition-oriented programming

Sun 01 July 2018 by Moshe Zadka

A common way to expose an API in Python is via inheritance. Though many projects do that, there is a better way.

But first, let's see. How popular is inheritance-as-an-API, anyway?

Let's go to the Twisted website. Right at the center of the screen, at prime real-estate, we see:

What's …

read more

Avoiding Private Methods

Fri 01 June 2018 by Moshe Zadka

Assume MyClass._dangerous(self) is a private method. We could have implemented the same functionality without a private method as follows:

  • Define a class InnerClass with the same __init__ as MyClass
  • Define InnerClass.dangerous(self) with the same logic of MyClass._dangerous
  • Make MyClass into a wrapper class over InnerClass …
read more

PyCon US 2018 Twisted Birds of Feather Open Space Summary

Wed 16 May 2018 by Moshe Zadka

We would like Twisted to support contextvars -- this would allow cross-async libraries, like eliot to do fancy things.

Klein is almost ready to be used as-is. Glyph has the good branch which adds

  • CSRF protection
  • Forms
  • Sessions
  • Authentication

But it is too big, and we need to break it to …

read more

PyCon 2018 US Docker Birds of Feather Open Space Summary

Tue 15 May 2018 by Moshe Zadka

We started out the conversation with talking about writing good Dockerfiles. There is no list of "best practices" yet. Hynek reiterated for us "ship applications, not build environments". Moshe summarized it as "don't put gcc in the deployed image."

We discussed a little bit what we are trying to achieve …

read more

Wheels

Wed 02 May 2018 by Moshe Zadka

Announcement: My book, from python import better, has been published. This post is based on one of the chapters from it.

When Python started out, one of the oft-touted benefits was "batteries included!". Gone were the days of searching for which XML parsing library was the best -- just use the …

read more

Web Development for the 21st Century

Mon 02 April 2018 by Moshe Zadka

(Thanks to Glyph Lefkowitz for some of the inspiration for this post, and to Mahmoud Hashemi for helpful comments and suggestions. All mistakes and issues that remain are mine alone.)

The Python REPL has always been touted as one of Python's greatest strengths. With Jupyter, Jupyter Lab in its latest …

read more

Running Modules

Mon 19 March 2018 by Moshe Zadka

(Thanks to Paul Ganssle for his suggestions and improvements. All mistakes that remain are mine.)

When exposing a Python program as a command-line application, there are several ways to get the Python code to run. The oldest way, and the one people usually learn in tutorials, is to run python …

read more