Bash is Unmaintainable Python

Mon 17 July 2017 by Moshe Zadka

(Thanks to Aahz, Roy Williams, Yarko Tymciurak, and Naomi Ceder for feedback. Any mistakes that remain are mine alone.)

In the post about building Docker applications, I had the following Python script:

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call(["docker", "pull", orig])
    subprocess.check_call(["docker", "tag", orig, image])
    subprocess.check_call(["docker", "push", image])

I showed this script to two audiences, in two versions of the talk. One, a Python beginner audience, mostly new to Docker. Another, a Docker-centric audience, with varying levels of familiarity with Python. I gave excuses for why this script is in Python, rather than the obvious choice of shell scripting for automating command-line utilities.

None of the excuses were the true reason.

Note that in a talk, things are simplified. Typical scripts in the real world would not be 10 lines or so. They start out 10 lines, of course, but then have to account for edge cases, extra use cases, random bugs in the services that need to be worked around, and so on. I am more used to writing scripts for production than writing scripts for talks.

The true reason the script is in Python is that I have started doing all my "shell" scripting in Python recently, and I am never going back. Unix shell scripting is pretty much writing in unmaintainable Python. Before making the case for that, I am going to take a step in the other direction. The script above took care to only use the standard library. If it could take advantage of third party libraries, I would have written it this way:

import datetime, subprocess
import seashore
xctr = seashore.Executor(seashore.Shell())
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    xctr.docker.pull(orig)
    xctr.docker.tag(orig, image)
    xctr.docker.push(image)

But what if I went the other way?

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez/python36{}:{}".format(ext, tag)
    orig = "python:3.6{}".format(ext)
    subprocess.check_call("docker pull " + orig, shell=True)
    subprocess.check_call("docker tag " + orig + " " + image, shell=True)
    subprocess.check_call("docker push " + image, shell=True)

Note that using shell=True is discouraged, and is generally a bad idea. We will revisit why later. If I were using Python 3.6, I could even have the last three lines be:

subprocess.check_call(f"docker pull {orig}", shell=True)
subprocess.check_call(f"docker tag {orig} {image}", shell=True)
subprocess.check_call(f"docker push {image}", shell=True)

or I could even combine them:

subprocess.check_call(f"docker pull {orig} && "
                      f"docker tag {orig} {image} && "
                      f"docker push {image}", shell=True)

What about calculating the tag?

tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                              "sed -e 's/ /T/' -e 's/:/-/g' "
                                  "-e 's/\./-/g' -e 's/\+.*//'",
                              shell=True)

Putting it all together, we would have

import subprocess
tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                              "sed -e 's/ /T/' -e 's/:/-/g' "
                                  "-e 's/\./-/g' -e 's/\+.*//'",
                              shell=True)
for ext in ['', '-slim']:
    image = f"moshez/python36{ext}:{tag}"
    orig = f"python:3.6{ext}"
    subprocess.check_call(f"docker pull {orig} && "
                          f"docker tag {orig} {image} && "
                          f"docker push {image}", shell=True)

None of the changes we made were strictly improvements. They mostly made the code harder to read and more fragile. But now that we have done them, it is straightforward to convert it to a shell script:

#!/bin/sh
set -e
tag = $(date --utc --rfc-3339=ns |
        sed -e 's/ /T/' -e 's/:/-/g' \
        -e 's/\./-/g' -e 's/\+.*//')
for ext in '' '-slim'
do
    image = "moshez/python36$ext:$tag"
    orig = "python:3.6$ext
    docker pull $orig
    docker tag $orig $image
    docker push $image
done

Making our script worse and worse makes a Python script into a shell script. Not just a shell script -- this is arguably idiomatic shell. It uses -e, long options for legibility, and so on. Note that the shell does not even have a way to express a notion like shell=False. In a script without arguments, like this one, this merely means changes are dangerous. In a script with arguments, it means that input handling safely is difficult (and unlikely to happen). Indeed, this is why shell=False is the default, and recommended, approach in Python.

In this case, one that does little but automate unix commands, the primary use-case of shell scripts. It stands to reason that the reverse process -- making a shell script into Python, would have the reverse effect: making for more maintainable, less fragile code.

As an exercise of "going the other way", we will start with a simplified version of shell script

set -e

if [ $# != 3 ]; then
    echo "Invalid arguments: $*";
    exit 1;
fi;

PR_NUMBER="$1"; shift;
TICKET_NUMBER="$1"; shift;
BRANCH_NAME="$1"; shift;


repo="git@github.com:twisted/twisted.git";
wc="$(dirname "$(dirname "$0")")/.git";

if [ ! -d "${wc}" ]; then
  wc="$(mktemp -d -t twisted.XXXX)";

  git clone --depth 1 --progress "${repo}" "${wc}";

  cloned=true;
else
  cloned=false;
fi;

cd "${wc}";

git fetch origin "refs/pull/${PR_NUMBER}/head";
git push origin "FETCH_HEAD:refs/heads/${TICKET_NUMBER}-${BRANCH_NAME}";

if ${cloned}; then
  rm -fr "${wc}";
fi;

How would it look like, with Python and seashore?

import os
import shutil
import sys

import seashore

if len(sys.argv) != 4:
    sys.exit("Invalid arguments: " + ' '.join(sys.argv))

PR_NUMBER, TICKET_NUMBER, BRANCH_NAME = sys.argv[1:]

xctr = seashore.Executor(seashore.Shell())
repo="git@github.com:twisted/twisted.git";
wc=os.path.dirname(os.path.dirname(sys.argv[0])) + '/.git'
if not os.path.isdir(wc):
    wc = tempfile.mkdtemp(prefix='twisted')
    xctr.git.clone(repo, wc, depth=1, progress=None)
    cloned = True
else:
    cloned = False

xctr = xctr.chdir(wc)
xctr.git.fetch(origin, f"refs/pull/{PR_NUMBER}/head")
xctr.git.push(origin,
              f"FETCH_HEAD:refs/heads/{TICKET_NUMBER}-{BRANCH_NAME}")
if cloned:
    shutil.rmtree(wc)

The code is no longer, more explicit, and -- had we wanted to -- easier to now refactor into unit-testable functions.

If this is, indeed, the general case, we can skip that stage entirely: write the script in Python to begin with. When it inevitably increases in scope, it will already be in a language that supports modules and unit tests.