Why No Dry Run?

Sat 06 October 2018 by Moshe Zadka

(Thanks to Brian for his feedback. All mistakes and omissions that remain are mine.)

Some commands have a --dry-run option, which simulates running the command but without taking effect. Sometimes the option exists for speed reasons: just pretending to do something is faster than doing it. However, more often this is because doing it can cause big, possibly detrimental, effects, and it is nice to be able to see what would happen before running the script.

For example, ansible-playbook has the --check option, which will not actually have any effect: it will just report what ansible would have done. This is useful when editing a playbook or changing the configuration.

However, this is the worst possible default. If we have already decided that our command can cause much harm, and one way to mitigate the harm is to run it in a "dry run" mode and have a human check that this makes sense, why is "cause damage" the default?

As someone in SRE/DevOps jobs, many of the utilities I run can cause great harm without care. They are built to destroy whole environments in one go, or to upgrade several services, or to clean out unneeded data. Running it against the wrong database, or against the wrong environment, can wreak all kinds of havoc: from disabling a whole team for a day to actual financial harm to the company.

For this reason, the default of every tool I write is to run in dry run mode, and when wanting to actually have effect, explicitly specify --no-dry-run. This means that my finger accidentally slipping on the enter key just causes something to appear on my screen. After I am satisfied with the command, I up-arrow and add --no-dry-run to the end.

I now do it as a matter of course, even for cases where the stakes are lower. For example, the utility that publishes this blog has a --no-dry-run that publishes the blog. When run without arguments, it renders the blog locally so I can check it for errors.

So I really have no excuses... When I write a tool for serious production system, I always implement a --no-dry-run option, and have dry runs by default. What about you?