DRY, or Don't Repeat Yourself is frequently touted as a principle of software development. "Copy-pasta" is the derisive term applied to a violation of it, tying together the concept of copying code and pasta as description of software development bad practices (see also spaghetti code).
It is so uniformly reviled that some people call DRY a "principle" that you should never violate. Indeed, some linters even detect copy-paste so that it can never sneak into the code. But copy-paste is not a comic-book villain, and DRY does not come bedecked in primary colors to defeat it.
It is worthwhile to know why DRY started out as a principle. In particular, some for some modern software development practices, violating DRY is the right thing to do.
The main problem with repeating a code chunk is that if a bug is found, there is more than one place where it needs to be fixed. On the surface of it, this seems like a reasonable criticism. All code has bugs, those bugs will be fixed, why not minimize the cost of fixing them?
As with all engineering decisions, following DRY is a trade-off. DRY leads to the following issues:
- Loss of locality
- Overgeneralized code
- Coordination issues
- Ownership issues
Loss of locality
The alternative to copy-pasting the code is usually to put it in a function (or procedure, or a subroutine, depending on the language), and call it. This means that when reading through the original caller, it is less clear what the code does.
When you are debugging, this means we need to "Step into" the function. While stepping into, it is non-trivial to check the original variables. If you are doing "print debugging", this means finding the original source for the function and adding relevant print statements there.
Especially when DRY is pointed out and reactions are instinctive, the function might have some surprising semantics. For example, mutating contents of local variables is sensible in code. When you move this code to a function as a part of a straightforward DRY refactoring, this means that now a function is mutating its parameters.
Even if the code initially was the same in both places, there is no a-priori guarantee that it will stay this way. For example, one of those places might be called frequently, and so would like to avoid logging too many details. The other place is called seldom, and those details are essential to trouble-shooting frequent problems.
The function that was refactored now has to support an extra parameter: whether to log those details or not. (This parameter might be a boolean, a logging level, or even a logging "object" that has correct levels set up.)
Since usually there is no institutional memory to undo the DRY refactoring, the function might add more and more cases, eventually almost being two functions in one. If the "copy-pasta" was more extensive, it might lead to extensive over-generalization: each place needs a slightly different variation of the functionality.
Each modification of the "common" function now requires testing all of its callers. In some situations, this can be subtly non-trivial.
For example, if the repetition was across different repositories, now updates means updating library versions. The person making the change might not even be aware of all the callers. The callers only find out when a new library version is used in their code.
When each of those code segments were repeated, ownership and responsibility were trivial. Whoever owned the surrounding code also owned the repeated segment.
Now that the code has been moved elsewhere, to a "shared" location, ownership can often be muddled. When a bug is found, who is supposed to fix it? What happens if that "bug" is already relied on by another use?
Especially in case with reactive DRY refactoring, there is little effort given to specifying the expected semantics of the common code. There might be some tests, but the behavior that is not captured by tests might still vary.
Having a common library which different code bases can be relied on is good. However, adding functions to such a library or libraries should be done mindfully. A reviewer comment about "this code duplicates the functionality already implemented here" or, even worse, something like pylint code duplication detector, does not have that context or mindfulness.
It is better to acknowledge the duplication, perhaps track it via a ticket, and let the actual "DRY" application take place later. This allows gathering more examples, thinking carefully about API design, and make sure that ownership and backwards compatibility issues have been thought of.
Deduplicating code by putting common lines into functions, without careful thought about abstractions, is never a good idea. Understanding how to abstract correctly is essentially API design. API design is subtle, and difficult to do well. There are no easy short-cuts, and developing expertise in it takes a long time.
Because API design is such a complex skill, it is not easy to give general guidelines except one: wait. Rushing into an API design does not make a good API, even if the person rushing is an expert.
Fifty Shades of Ver
Computers work on binary code. If statements take one path: true, or false. For computers, bright lines and clear borders make sense.
Humans are more complicated. What's an adult? When are you happy? How mature are you? Humans have fuzzy feelings with no clear delination.
I was more responsible as …read more
I have written before about my Inbox Zero methodology. This is still what I practice, but there is a lot more that helps me.
The concept behind "Universal Binary" is that the only numbers that make sense asymptotically are zero, one, and infinity. Therefore, in order to prevent things from …read more
The Hardest Logic Puzzle Ever (In Python)
Hey, Back Off!
The choice in parameters for back-off configuration is important. It can be the difference between a barely noticable blip in service quality and an hours-long site outage. In order to explore the consequences of the choice, I wrote a little fictional ditty about a fictional website.
I hope you enjoy …read more
A Labyrinth of Lies
In the 1986 movie Labyrinth, a young girl (played by Jennifer Connelly) is faced with a dilemma. The adorable Jim Henson puppets explain to her that one guard always lies, and one guard always tells the truth. She needs to figure out which door leads to the castle at the …read more
Conditionally Logging Expensive Tasks
(I have shown this technique in my mailing list. If this kind of thing seems interesting, why not subscribe?)
Imagine you want to log something that is,
expensive to calculate.
in DEBUG mode,
you would like to count the classes of the objects in
My Little Pony -- DevOps is Magic
(This article is based on the one I originally published on OpenSource.com.)
In 2010, the My Little Pony franchise was rebooted with the animated show My Little Pony: Friendship is Magic. The combination of accessibility to children with the sophisticated themes the show tackled garnered a following that cut …read more
Numbers in Python
Numbers in Python come in all shapes and forms. The reason different kind of representations of numbers exist is because they all have different trade-offs. These trade-offs are often surprising!
The most surprising things about integers is how easily they stop being
Dividing two integers, for example,
Goodbye, John H. Conway
John H. Conway passed away ten days ago, and I think it's only now I can write a proper eulogy.
I was first introduced to his work, if not his name, when I was at the end of elementary school. I am sure everyone has heard about the Game of …read more