Composition-oriented programming

Sun 01 July 2018 by Moshe Zadka

A common way to expose an API in Python is as inheritance. Though many projects do that, there is a better way.

But first, let's see. How popular is inheritance-as-an-API, anyway?

Let's go to the Twisted website. Right at the center of the screen, in prime real estate, sits a code example. The following is abridged:

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)

class EchoFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return Echo()

(This is part of an example on building an echo-server protocol.)

If you are wondering who came up with this amazing API, it is the same person who is writing the words you are reading. I certainly thought it was an amazing API!

Look at how many smart people agreed with me.

Django takes a page of tutorial to get there, but sure enough:

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')

class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

Jupyter's echo kernel starts:

class EchoKernel(Kernel):
    implementation = 'Echo'
    implementation_version = '1.0'
    language = 'no-op'

Everyone is doing it. A project I have been a developer on for ~16 years. The most popular Python web library, responsible for who-knows-how-many requests per second in Instagram. A project that won the ACM award (and well deserved, at that).

However, popularity is not everything. This is not a good idea.

Exposing class inheritance as a public interface means committing to a level of backwards compatibility that is unheard of. Even adding private methods or attributes becomes dangerous, because any subclass may already be using the same names.

Let's give a toy example:

class Writer:

    # staticmethod keeps the default from being bound as a method
    _write = staticmethod(lambda x: None)

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        formatted = self.format(message)
        self._write(formatted)

    def format(self, message):
        raise NotImplementedError("format")

This is a simple writer that initially sends everything down a black hole, but can be set to write its output to a file-like object. It needs to format the messages, so the proper usage is to subclass it and override format (while taking care not to define methods called set_output or _write).

class BufferWriter(Writer):

    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'Buffer: ' + message
        else:
            return 'Message: ' + message

    def switch_buffer(self):
        self._buffer = not self._buffer

The simplest formatting would return the message as is. However, this formatter is slightly less trivial -- it prefixes the message with the word Buffer or Message, depending on an internal variable that can be switched.

Now we can do things like:

>>> bp = BufferWriter()
>>> bp.set_output(sys.stdout)
>>> bp.write("hello")
Message: hello
>>> bp.switch_buffer()
>>> bp.write("hello")
Buffer: hello

This looks good so far. Of course, things are never so simple in real life. The writer library, naturally, gets thousands of stars on GitHub. It becomes popular. There's a development community, complete with a Discord channel and a mailing list. So naturally, important features get added.

class Writer:

    _buffer = ""

    _write = staticmethod(lambda x: None)

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        self._buffer += self.format(message)
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""

    def format(self, message):
        raise NotImplementedError("format")

Turns out people needed to buffer some of the shorter messages. This was a crucial performance improvement that all users were clamoring for, so version 2018.6.1 is highly anticipated.

It breaks BufferWriter, though. The symptoms are weird: TypeErrors and other such fun. All because both the superclass and the subclass are competing for self._buffer: the superclass expects a string there, while the subclass keeps a boolean flag under the same name.
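The breakage can be reproduced with a condensed version of the two classes. This is a minimal sketch: the try_write helper is a hypothetical name added only for the demonstration, and the default _write is wrapped in staticmethod so that it is not bound as a method.

```python
import io


class Writer:
    """The 2018.6.1 writer: it now keeps a private string buffer."""

    _buffer = ""
    _write = staticmethod(lambda x: None)

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        # Expects self._buffer to be a string...
        self._buffer += self.format(message)
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""

    def format(self, message):
        raise NotImplementedError("format")


class BufferWriter(Writer):
    """The unchanged subclass: here _buffer is a boolean flag."""

    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'Buffer: ' + message
        else:
            return 'Message: ' + message


def try_write():
    """Hypothetical helper: report which exception a write raises."""
    writer = BufferWriter()
    writer.set_output(io.StringIO())
    try:
        writer.write("hello")
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


print(try_write())
```

The subclass's False shadows the superclass's empty string, so the += in write ends up adding a string to a boolean.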

With enough care, these problems can be avoided. A library which exposes classes for inheritance must give all new private methods and attributes a double-underscore (__) prefix, so that Python's name mangling keeps them out of the subclass's way, and, naturally, never ever add any public methods or attributes. Sadly, nobody does that.
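For illustration, here is a sketch of what that discipline would look like for the toy writer. It is not something the writer library above actually does; it only shows that name mangling stores the superclass's attributes under _Writer__buffer and _Writer__write, leaving the subclass's _buffer alone.

```python
import io


class Writer:
    # Inside the class body, __buffer is mangled to _Writer__buffer.
    __buffer = ""
    __write = staticmethod(lambda x: None)

    def set_output(self, output):
        self.__write = output.write

    def write(self, message):
        self.__buffer += self.format(message)
        if len(self.__buffer) > 10:
            self.__write(self.__buffer)
            self.__buffer = ""

    def format(self, message):
        raise NotImplementedError("format")


class BufferWriter(Writer):
    # A plain _buffer no longer collides with the superclass.
    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'Buffer: ' + message
        else:
            return 'Message: ' + message

    def switch_buffer(self):
        self._buffer = not self._buffer


output = io.StringIO()
writer = BufferWriter()
writer.set_output(output)
writer.write("hello")
writer.switch_buffer()
writer.write("hello")
print(output.getvalue())
```

Mangling is keyed on the class name only, so a subclass that is itself named Writer could still collide. The approach reduces the risk without removing it.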

So what's the alternative?

from zope import interface

class IFormatter(interface.Interface):

    def format(message):
        """show stuff"""

We define an abstract interface. This interface [1] has only one method -- format.

import attr

@attr.s
class Writer:

    _buffer = ""

    # staticmethod keeps the default from being bound as a method
    _write = staticmethod(lambda x: None)

    _formatter = attr.ib()

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        self._buffer += self._formatter.format(message)
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""

We use the attrs library [2] to define our main functionality: a class that wraps other objects, which we expect to provide IFormatter.

We can verify this automatically by having the _formatter line say instead (with verify imported from zope.interface):

_formatter = attr.ib(validator=lambda instance, attribute, value:
                               verify.verifyObject(IFormatter, value))

Note that this separates the concerns: the "fake method" format has moved to a "fake class" (an interface).

@interface.implementer(IFormatter)
class BufferFormatter:

    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'All Channels: ' + message
        else:
            return 'Limited Channels: ' + message

    def switch_buffer(self):
        self._buffer = not self._buffer

Note that now, if we only have the Writer object, there is no way to switch prefixes. Correctly switching prefixes means keeping access to the original object.
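A short usage sketch makes the point concrete. To stay free of third-party dependencies it replaces attrs with a plain __init__ and leaves out the interface declaration; that is a simplification for the example, not the approach described above.

```python
import io


class BufferFormatter:

    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'All Channels: ' + message
        else:
            return 'Limited Channels: ' + message

    def switch_buffer(self):
        self._buffer = not self._buffer


class Writer:
    """Stand-in for the attrs-based Writer, written with a plain __init__."""

    def __init__(self, formatter):
        self._formatter = formatter
        self._buffer = ""
        self._write = lambda x: None

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        self._buffer += self._formatter.format(message)
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""


# Keep our own reference to the formatter: the writer does not expose it.
formatter = BufferFormatter()
writer = Writer(formatter)
output = io.StringIO()
writer.set_output(output)

writer.write("hello")
formatter.switch_buffer()  # switched on the formatter, not on the writer
writer.write("hello")
print(output.getvalue())
```

The writer's _buffer and the formatter's _buffer now live on different objects, so they cannot step on each other.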

If there is a need to "call back" to the original methods, the original object can be passed in to the wrapped object. One advantage is that, since it is a distinct object, it is obvious that one should only call its public methods and only access its public variables.

Passing ourselves to a method is, in general, not an ideal practice. What we really should do is pass specific methods or variables directly into the method. But this is funny: when using inheritance, we always effectively pass ourselves to every method. So even this refactoring is a net improvement. When the biggest criticism of a refactoring is "this could now be improved even more", it usually means it is a good idea.
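As a sketch of what passing specific values could look like, suppose a formatter wants to know how much text is already buffered. The PositionFormatter class and its buffered parameter are hypothetical, invented for this example; the writer hands over just that one number instead of itself.

```python
import io


class PositionFormatter:
    """Hypothetical formatter that prefixes the current buffer length."""

    def format(self, message, buffered):
        return '[{}] {}'.format(buffered, message)


class Writer:

    def __init__(self, formatter):
        self._formatter = formatter
        self._buffer = ""
        self._write = lambda x: None

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        # Pass only the value the formatter needs, not the writer itself.
        self._buffer += self._formatter.format(
            message, buffered=len(self._buffer))
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""


output = io.StringIO()
writer = Writer(PositionFormatter())
writer.set_output(output)
writer.write("hi")      # stays in the buffer: only 6 characters so far
writer.write("there")   # pushes the buffer past 10 characters and flushes
print(output.getvalue())
```

The formatter never sees the writer, so it has no way to reach into the writer's private state, by accident or otherwise.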

Credits:

  • Thanks to Tom Goren for his feedback -- the original version was more aggressive.
  • Thanks to Glyph Lefkowitz for pushing me to make the example better.
  • Thanks to Augie Fackler and Nathaniel Manista for much of the inspiration.

[1] The zope.interface library is a little like the abc library: both give tools to clarify what methods we expect. However, abc.ABC likes inheritance a little too much. Glyph has a good explanation of the advantages.
[2] attrs makes defining Python classes much less boiler-platey. There's another Glyph post explaining why it is so good.