Bash is Unmaintainable Python
Mon 17 July 2017 by Moshe Zadka
(Thanks to Aahz, Roy Williams, Yarko Tymciurak, and Naomi Ceder for feedback. Any mistakes that remain are mine alone.)
In the post about building Docker applications, I had the following Python script:
    import datetime, subprocess

    tag = datetime.datetime.utcnow().isoformat()
    tag = tag.replace(':', '-').replace('.', '-')
    for ext in ['', '-slim']:
        image = "moshez/python36{}:{}".format(ext, tag)
        orig = "python:3.6{}".format(ext)
        subprocess.check_call(["docker", "pull", orig])
        subprocess.check_call(["docker", "tag", orig, image])
        subprocess.check_call(["docker", "push", image])
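The tag computation in that script can be tried on its own, without Docker. The sketch below wraps it in a helper, `make_tag`, which is a name introduced here for illustration and not part of the original script. It is called with a fixed, made-up timestamp so that the result is predictable. The colon has to go because Docker uses it to separate the image name from the tag.

```python
import datetime


def make_tag(now):
    """Turn a datetime into a string usable as a Docker tag.

    Hypothetical helper: same two lines as the script, wrapped in a function.
    """
    tag = now.isoformat()
    # ':' separates image name from tag, so it cannot appear inside the tag.
    return tag.replace(':', '-').replace('.', '-')


# A fixed, made-up timestamp instead of utcnow(), to make the output stable.
example = make_tag(datetime.datetime(2017, 7, 17, 12, 34, 56, 789012))
print(example)  # 2017-07-17T12-34-56-789012
```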
I showed this script to two audiences, in two versions of the talk. One, a Python beginner audience, mostly new to Docker. Another, a Docker-centric audience, with varying levels of familiarity with Python. I gave excuses for why this script is in Python, rather than the obvious choice of shell scripting for automating command-line utilities.
None of the excuses were the true reason.
Note that in a talk, things are simplified. Typical scripts in the real world would not be 10 lines or so. They start out 10 lines, of course, but then have to account for edge cases, extra use cases, random bugs in the services that need to be worked around, and so on. I am more used to writing scripts for production than writing scripts for talks.
The true reason the script is in Python is that I have started doing all my "shell" scripting in Python recently, and I am never going back. Unix shell scripting is pretty much writing in unmaintainable Python. Before making the case for that, I am going to take a step in the other direction. The script above took care to only use the standard library. If it could take advantage of third party libraries, I would have written it this way:
    import datetime, subprocess
    import seashore

    xctr = seashore.Executor(seashore.Shell())
    tag = datetime.datetime.utcnow().isoformat()
    tag = tag.replace(':', '-').replace('.', '-')
    for ext in ['', '-slim']:
        image = "moshez/python36{}:{}".format(ext, tag)
        orig = "python:3.6{}".format(ext)
        xctr.docker.pull(orig)
        xctr.docker.tag(orig, image)
        xctr.docker.push(image)
But what if I went the other way?
    import datetime, subprocess

    tag = datetime.datetime.utcnow().isoformat()
    tag = tag.replace(':', '-').replace('.', '-')
    for ext in ['', '-slim']:
        image = "moshez/python36{}:{}".format(ext, tag)
        orig = "python:3.6{}".format(ext)
        subprocess.check_call("docker pull " + orig, shell=True)
        subprocess.check_call("docker tag " + orig + " " + image, shell=True)
        subprocess.check_call("docker push " + image, shell=True)

Note that using shell=True is discouraged, and is generally a bad idea. We will revisit why later.
If I were using Python 3.6, I could even have the last three lines be:
        subprocess.check_call(f"docker pull {orig}", shell=True)
        subprocess.check_call(f"docker tag {orig} {image}", shell=True)
        subprocess.check_call(f"docker push {image}", shell=True)
or I could even combine them:
        subprocess.check_call(f"docker pull {orig} && "
                              f"docker tag {orig} {image} && "
                              f"docker push {image}", shell=True)
What about calculating the tag?
    tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                                  "sed -e 's/ /T/' -e 's/:/-/g' "
                                  r"-e 's/\./-/g' -e 's/\+.*//'",
                                  shell=True)
    # check_output returns bytes with a trailing newline
    tag = tag.decode().strip()
Putting it all together, we would have:

    import subprocess

    tag = subprocess.check_output("date --utc --rfc-3339=ns | "
                                  "sed -e 's/ /T/' -e 's/:/-/g' "
                                  r"-e 's/\./-/g' -e 's/\+.*//'",
                                  shell=True)
    # check_output returns bytes with a trailing newline
    tag = tag.decode().strip()
    for ext in ['', '-slim']:
        image = f"moshez/python36{ext}:{tag}"
        orig = f"python:3.6{ext}"
        subprocess.check_call(f"docker pull {orig} && "
                              f"docker tag {orig} {image} && "
                              f"docker push {image}", shell=True)
None of the changes we made were strictly improvements. They mostly made the code harder to read and more fragile. But now that we have done them, it is straightforward to convert it to a shell script:
    #!/bin/sh
    set -e
    tag=$(date --utc --rfc-3339=ns |
          sed -e 's/ /T/' -e 's/:/-/g' \
              -e 's/\./-/g' -e 's/\+.*//')
    for ext in '' '-slim'
    do
        image="moshez/python36$ext:$tag"
        orig="python:3.6$ext"
        docker pull $orig
        docker tag $orig $image
        docker push $image
    done
Making our script worse and worse makes a Python script into a shell script. Not just a shell script -- this is arguably idiomatic shell. It uses -e, long options for legibility, and so on.

Note that the shell does not even have a way to express a notion like shell=False. In a script without arguments, like this one, this merely means changes are dangerous. In a script with arguments, it means that handling input safely is difficult (and unlikely to happen). Indeed, this is why shell=False is the default, and recommended, approach in Python.
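To make the difference concrete, here is a minimal sketch. It assumes a Unix-like system with /bin/sh, and the "branch" value is a made-up example of hostile input. With shell=False, the value reaches the program as a single argument. With shell=True, the shell reinterprets it, and the part after the semicolon runs as a second command.

```python
import subprocess
import sys

# Hypothetical user-supplied value containing a shell metacharacter.
branch = "feature; echo INJECTED"

# A tiny helper program that prints its first argument back, unchanged.
echo_arg = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# shell=False (the default): the value is passed through as one argument.
safe = subprocess.check_output(echo_arg + [branch],
                               universal_newlines=True)

# shell=True: the value is spliced into a command line and parsed by the
# shell, so "echo INJECTED" is executed as a separate command.
unsafe = subprocess.check_output("echo " + branch, shell=True,
                                 universal_newlines=True)

print(repr(safe))
print(repr(unsafe))
```

In a shell script, every variable expansion is in the second situation unless it is quoted carefully, every time.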
This holds even in this case, a script that does little but automate Unix commands, which is the primary use case of shell scripts. It stands to reason that the reverse process -- making a shell script into Python -- would have the reverse effect: making for more maintainable, less fragile code.
As an exercise in "going the other way", we will start with a simplified version of a shell script:
    set -e

    if [ $# != 3 ]; then
        echo "Invalid arguments: $*";
        exit 1;
    fi;

    PR_NUMBER="$1"; shift;
    TICKET_NUMBER="$1"; shift;
    BRANCH_NAME="$1"; shift;

    repo="git@github.com:twisted/twisted.git";
    wc="$(dirname "$(dirname "$0")")/.git";

    if [ ! -d "${wc}" ]; then
        wc="$(mktemp -d -t twisted.XXXX)";
        git clone --depth 1 --progress "${repo}" "${wc}";
        cloned=true;
    else
        cloned=false;
    fi;

    cd "${wc}";

    git fetch origin "refs/pull/${PR_NUMBER}/head";
    git push origin "FETCH_HEAD:refs/heads/${TICKET_NUMBER}-${BRANCH_NAME}";

    if ${cloned}; then
        rm -fr "${wc}";
    fi;
What would it look like with Python and seashore?
    import os
    import shutil
    import sys
    import tempfile

    import seashore

    if len(sys.argv) != 4:
        sys.exit("Invalid arguments: " + ' '.join(sys.argv))

    PR_NUMBER, TICKET_NUMBER, BRANCH_NAME = sys.argv[1:]

    xctr = seashore.Executor(seashore.Shell())
    repo = "git@github.com:twisted/twisted.git"
    wc = os.path.dirname(os.path.dirname(sys.argv[0])) + '/.git'
    if not os.path.isdir(wc):
        wc = tempfile.mkdtemp(prefix='twisted')
        xctr.git.clone(repo, wc, depth=1, progress=None)
        cloned = True
    else:
        cloned = False

    xctr = xctr.chdir(wc)
    xctr.git.fetch("origin", f"refs/pull/{PR_NUMBER}/head")
    xctr.git.push("origin", f"FETCH_HEAD:refs/heads/{TICKET_NUMBER}-{BRANCH_NAME}")

    if cloned:
        shutil.rmtree(wc)
The code is not any longer, it is more explicit, and -- had we wanted to -- it is now easier to refactor into unit-testable functions.
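As a rough sketch of what such a refactoring might look like, the argument handling and the ref names can be pulled out into plain functions that run no commands at all. The function names `parse_args` and `refspecs` are made up for this illustration, and the values in the example call are invented.

```python
def parse_args(argv):
    """Return (pr_number, ticket_number, branch_name) or raise ValueError."""
    if len(argv) != 4:
        raise ValueError("Invalid arguments: " + " ".join(argv))
    return tuple(argv[1:])


def refspecs(pr_number, ticket_number, branch_name):
    """Build the ref to fetch and the refspec to push, as in the script."""
    fetch_ref = "refs/pull/{}/head".format(pr_number)
    push_ref = "FETCH_HEAD:refs/heads/{}-{}".format(ticket_number,
                                                    branch_name)
    return fetch_ref, push_ref


# Example with made-up values; no git commands are run here.
args = parse_args(["script", "123", "9999", "fix-typo"])
fetch_ref, push_ref = refspecs(*args)
print(fetch_ref)
print(push_ref)
```

Because neither function touches git or the file system, each can be covered by an ordinary unit test.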
If this is, indeed, the general case, we can skip that stage entirely: write the script in Python to begin with. When it inevitably increases in scope, it will already be in a language that supports modules and unit tests.