Abstraction Cascade

Tue 14 November 2017 by Moshe Zadka

(This is an adaptation of part of the talk Kurt Rose and I gave at PyBay 2017)

An abstraction cascade is a common anti-pattern in legacy systems. It is useful to understand how to recognize it, how it tends to come about, how to fix it -- and, most importantly, what kinds of things will not fix it. That last point matters for anti-patterns in legacy systems in general: if the obvious fix worked, it would already have been applied, and the pattern would not be common.

Recognition

The usual sign of an abstraction cascade is a complicated, ad-hoc if/else sequence that decides which path to take. Here is an example of an abstraction cascade for finding the network address corresponding to a name:

def get_address(name):
    if name in services:
        if services[name].ip:
            return services[name].ip, services[name].port
        elif services[name].address:
            # Added for issue #2321
            if ':' in services[name].address:
                return services[name].address.split(':')
            else:
                # Fixes issues #6985
                # TODO: Hotfix, clean-up later
                return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

History

At each step, it seems reasonable to make a specific change. Here is a typical way this kind of code comes about.

The initial version is reasonable: since DNS is a standard way to publish name-to-address mappings, why not use it?

def get_address(name):
    return dns_lookup(name), DEFAULT_PORT

Under load, an outage happened. There was no time to investigate how to configure DNS caching or TTL better -- so the "popular" services were added to a static list, with a "fast path" check. This decision also makes sense: when an outage is ongoing, the top priority is to relieve the symptoms.

def get_address(name):
    if name in services:
        # Fixes issues #6985
        # TODO: Hotfix, clean-up later
        return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

However, the door was now open to adding another path to the function. When the need arose to support multiple services on one host, it was easier to just add another path: after all, this was only for new services.

def get_address(name):
    if name in services:
        # Added for issue #2321
        if ':' in services[name].address:
            return services[name].address.split(':')
        else:
            # Fixes issues #6985
            # TODO: Hotfix, clean-up later
            return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT

When the change to IPv6 occurred, splitting on : was no longer a safe operation -- so a separate field was added. Again, the existing "new" services (by now, many -- and not so new!) did not need to be touched:

def get_address(name):
    if name in services:
        if services[name].ip:
            return services[name].ip, services[name].port
        elif services[name].address:
            # Added for issue #2321
            if ':' in services[name].address:
                return services[name].address.split(':')
            else:
                # Fixes issues #6985
                # TODO: Hotfix, clean-up later
                return services[name].address, DEFAULT_PORT
    return dns_lookup(name), DEFAULT_PORT
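
To see why splitting on a colon stopped being safe, here is a small sketch. The two sample addresses are made up for illustration, and split_on_colon is a hypothetical helper that only mirrors the colon-splitting branch above.

```python
# Hypothetical sample values, not taken from any real service list.
ipv4_with_port = "10.0.0.5:8080"
ipv6_no_port = "2001:db8::1"


def split_on_colon(address):
    # Mirrors the colon-splitting branch: assumes exactly "host:port".
    return address.split(':')


print(split_on_colon(ipv4_with_port))  # ['10.0.0.5', '8080']
print(split_on_colon(ipv6_no_port))    # ['2001', 'db8', '', '1']
```

The IPv4 case yields a host and a port, but the IPv6 address is cut into four pieces, none of which is a usable host or port. Note also that the port comes back as a string, unlike in the other branches.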

Of course, this is typically just chapter one in the real story: having to adapt to multiple data centers, or multiple providers of services, will lead to more and more of these paths -- with nothing thrown away, because "some legacy service depends on it -- maybe".

Non-fixes

Fancier dispatch

Sometimes the ad-hoc if/else pattern is obscured by more abstract dispatch logic: for example, something that loops through classes and finds out which one is the right one:

class AbstractNameFinder(object):

    def matches(self, name):
        raise NotImplementedError()

    def get_address(self, name):
        raise NotImplementedError()


class DNS(AbstractNameFinder):

    # Catch-all: matches every name, so it must come last.
    def matches(self, name):
        return True

    def get_address(self, name):
        return dns_lookup(name), DEFAULT_PORT


class Local(AbstractNameFinder):

    def matches(self, name):
        return hasattr(services.get(name), 'ip')

    def get_address(self, name):
        return services[name].ip, services[name].port


# Order matters: the first finder that matches wins.
finders = [Local(), DNS()]


def get_address(name):
    for finder in finders:
        if finder.matches(name):
            return finder.get_address(name)

This is actually worse -- now the problem can be spread over multiple files, with no single place to fix it. The code can be converted to this form semi-mechanically, but doing so does not fix the underlying issue -- it tends to entrench the problem further.

Pareto fix

The Pareto rule is that 80% of the problem is solved with 20% of the effort. It is often the case that a big percentage (in the stereotypical Pareto case, 80%) of the problem is not hard to fix.

For example, most services are actually listed in some file, and all we need to do is read this file in and look up based on that. The incentive to fix "80% of the problem" and leave the "20%" for later is strong.

However, each of these "Pareto fixes" usually makes the problem worse: since it is not a complete replacement, another dispatch layer needs to be built to support the "legacy solution". The new dispatch layer, the new solution, and the legacy solution all become part of the newest iteration of the legacy system, and make the problem even worse.
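
A minimal sketch of what such a Pareto fix tends to look like follows. Everything in it is assumed for illustration: the JSON registry format, the load_registry helper, the value of DEFAULT_PORT, and legacy_get_address, which stands in for the existing cascade.

```python
import json

DEFAULT_PORT = 80  # assumed value, for illustration only


def legacy_get_address(name):
    # Stand-in for the existing cascade; the behaviour is hypothetical.
    return "legacy-" + name, DEFAULT_PORT


def load_registry(text):
    # Hypothetical file format: {"name": {"ip": "...", "port": 1234}}
    return json.loads(text)


def get_address(name, registry):
    # The "80%" path: services listed in the new file.
    if name in registry:
        entry = registry[name]
        return entry["ip"], entry["port"]
    # The "20%" path: everything else still goes through the old code,
    # so the old cascade stays alive behind one more layer of dispatch.
    return legacy_get_address(name)


registry = load_registry('{"web": {"ip": "10.0.0.5", "port": 8080}}')
print(get_address("web", registry))
print(get_address("billing", registry))
```

Nothing was removed here: the new function is one more case stacked on top of the old ones.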

Fixing 80% of the problem is useful for prototyping, since we are not sure we are solving the right problem and nothing better exists. However, in this case, the complete solution is necessary, so neither of these conditions holds.

Escape strategy

This happens because no single case can be removed. The way forward is not to add more cases, but to try to remove a single case. The first question to ask is: why was no case removed? Often, the reason is that there is no way to test whether removal is safe.

It might take some work to build infrastructure that makes removal safe. Unit tests are often not enough, and integration tests are sometimes not enough either. Sometimes the answer is a canary system, sometimes a feature flag system, or, if worst comes to worst, a way to test and roll back quickly if a problem is found.
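
As one possible shape for such infrastructure, here is a sketch that counts how often a case is hit and guards it with a flag. The names split_address, case_hits, and FLAGS are hypothetical, and a real system would more likely use its own metrics and feature flag services than in-process dictionaries.

```python
import collections
import logging

logger = logging.getLogger(__name__)

# Hypothetical in-process counters; a real system would likely export
# these to its metrics pipeline instead.
case_hits = collections.Counter()

# Hypothetical feature flag: set to False to try removing the case,
# set back to True to roll back quickly.
FLAGS = {"allow_colon_split": True}


def split_address(name, address, default_port):
    # Simplified stand-in for the two "address" branches of the cascade.
    if ':' in address:
        case_hits["colon_split"] += 1
        if not FLAGS["allow_colon_split"]:
            logger.warning("colon-split case hit for %s while disabled", name)
            raise LookupError("colon-split addresses are disabled: " + name)
        host, _, port = address.partition(':')
        return host, int(port)
    case_hits["plain_address"] += 1
    return address, default_port


print(split_address("web", "10.0.0.5:8080", 80))
print(split_address("db", "10.0.0.6", 80))
print(dict(case_hits))
```

If the counter for a case stays at zero for long enough, or turning the flag off on a canary causes no errors, that is evidence the case can be deleted.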

Once it is possible to remove just one case (in our example above, maybe check what it would take to remove the case where we split on a colon, since this is clearly worse than just having separate attributes), thought needs to be given to which case is best.

Sometimes, there is more than one case that is really needed: some inherent, deep, trade-off. However, it is rare to need more than two, and almost unheard of to need more than three. Start removing unneeded cases one by one.
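
As an illustration of removing the colon-splitting case, here is a sketch that converts every listed service to the separate-attribute form once, so the lookup needs only a single case. The Service record, the normalize helper, the sample entries, and the value of DEFAULT_PORT are all assumptions made for this example. The DNS fallback is left out for brevity.

```python
import collections

DEFAULT_PORT = 80  # assumed value, for illustration only

# Hypothetical record shape, based on the attributes used in the
# examples above (address, ip, port).
Service = collections.namedtuple("Service", ["address", "ip", "port"])


def normalize(service):
    """Return an equivalent record that only uses ip and port."""
    if service.ip:
        return Service(None, service.ip, service.port)
    host, sep, port = service.address.partition(':')
    if sep:
        # Old "host:port" entries: the split happens once, at migration.
        return Service(None, host, int(port))
    return Service(None, service.address, DEFAULT_PORT)


# Made-up sample data covering the three historical cases.
services = {
    "web": Service("10.0.0.5:8080", None, None),
    "db": Service("10.0.0.6", None, None),
    "cache": Service(None, "2001:db8::1", 6379),
}
services = {name: normalize(svc) for name, svc in services.items()}


def get_address(name):
    # After normalization, listed services need a single case.
    svc = services[name]
    return svc.ip, svc.port


print(get_address("web"))
print(get_address("cache"))
```

The design choice is to pay the conversion cost once, in the data, instead of on every lookup in the code.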

Conclusion

When seeing an abstraction cascade, there is a temptation to "clean it up": but most obvious clean-ups end up making it worse. However, by understanding how it came to be, and finding a way to remove cases, it is possible to do away with it.


Gather

Mon 13 November 2017 by Moshe Zadka

Gather is a plugin framework -- and it now has its own blog.

Use it! If you like it, tell us about it, and if there is a problem, tell us about that.

Brute Forcing AES

Wed 27 September 2017 by Moshe Zadka

Thanks to Paul Kehrer for reviewing! Any mistakes or oversights that are left are my responsibility.

AES's maximum key size is 256 bits (there are also 128 and 192 bit versions available). Is that enough? Well, if there is a cryptographic flaw in AES (i.e., a way to recover …

Announcing NColony 17.9.0

Tue 19 September 2017 by Moshe Zadka

I have released NColony 17.9.0, available in a PyPI near you.

New this version:

  • CalVer
  • Python 3 support!
  • You can explicitly ask to inherit environment variables from the monitoring process.
  • Website

Thanks to Mark Williams for reviewing many pull requests.

SSH to EC2

Wed 30 August 2017 by Moshe Zadka

(Thanks to Donald Stufft for reviewing this post, and to Glyph Lefkowitz for inspiring much of it.)

(JP Calderone wrote a Twisted version of this approach.)

It is often the case that after creating an EC2 instance in AWS, the next step is SSHing. This might be because the machine …

Image Editing with Jupyter

Tue 25 July 2017 by Moshe Zadka

With the news about MS Paint going away from the default MS install, it might be timely to look at other ways to edit images. The most common edit I need to do is to crop images -- and this is what we will use as an example.

My favorite image …

Anatomy of a Multi-Stage Docker Build

Wed 19 July 2017 by Moshe Zadka

Docker, in recent versions, has introduced multi-stage builds. This allows separating the build environment from the runtime environment much more easily than before.

In order to demonstrate this, we will write a minimal Flask app and run it with Twisted using its WSGI support.

The Flask application itself is the …

Bash is Unmaintainable Python

Mon 17 July 2017 by Moshe Zadka

(Thanks to Aahz, Roy Williams, Yarko Tymciurak, and Naomi Ceder for feedback. Any mistakes that remain are mine alone.)

In the post about building Docker applications, I had the following Python script:

import datetime, subprocess
tag = datetime.datetime.utcnow().isoformat()
tag = tag.replace(':', '-').replace('.', '-')
for ext in ['', '-slim']:
    image = "moshez …

Imports at a Distance

Sun 25 June 2017 by Moshe Zadka

(Thanks to Mark Williams for feedback and research)

Imagine the following code:

## mymodule.py
import toplevel.nextlevel.lowmodule

def _func():
    toplevel.nextlevel.lowmodule.dosomething(1)

def main():
    _func()

Assuming toplevel.nextlevel.lowmodule does define a function dosomething, this code seems to work just fine.

However, imagine that later we …

X Why Zip

Sat 24 June 2017 by Moshe Zadka

PEP 441 resulted in the creation of the zipapp module. The PEP says "Python has had the ability to execute directories or ZIP-format archives as scripts since version 2.6 [...] This feature is not as popular as it should be mainly because it was not promoted as part of Python …
