A common way to expose an API in Python is as inheritance. Though many projects do that, there is a better way.

But first, let's see. How popular is inheritance-as-an-API, anyway?

Let's go to the Twisted website. Right at the center of the screen, at prime real-estate, we see:

What's there? The following is abridged:

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)
class EchoFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return Echo()

(This is part of an example on building an echo-server protocol.)

If you are wondering who came up with this amazing API, it is the same person who is writing the words you are reading. I certainly thought it was an amazing API!

Look at how many smart people agreed with me.

Django takes a page of tutorial to get there, but sure enough:

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

Jupyter's echo kernel starts:

class EchoKernel(Kernel):
    implementation = 'Echo'
    implementation_version = '1.0'
    language = 'no-op'

Everyone is doing it. A project I have been a developer on for ~16 years. The most popular Python web library, responsible for who-knows-how-many requests per second in Instagram. A project that won the ACM award (and well deserved, at that).

However, popularity is not everything. This is not a good idea.

When exposing class inheritance as a public interface, that means committing to a level of backwards compatibility that is unheard of. Even adding private methods or attributes becomes dangerous.

Let's give a toy example:

class Writer:

    _write = lambda x: None

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        formatted = self.format(message)
        self._write(message)

    def format(self, message):
        raise NotImplementedError("format")

This is a simple writer, that, while initially sending everything down a black hole, can be set to write the output to a file-like object. It needs to format the messages, so the proper usage is to subclass and override format (while taking care not to define methods called set_output or _write.)

class BufferWriter(MultiWriter):

    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'Buffer: ' + message
        else:
            return 'Message: ' + message

    def switch_buffer(self):
        self._buffer = not self._buffer

The simplest formatting would return the message as is. However, this formatter is slightly less trivial -- it prefixes the message with the word Buffer or Message, depending on an internal variable that can be switched.

Now we can do things like:

>>> bp = BufferWriter()
>>> bp.set_output(sys.stdout)
>>> bp.write("hello")
Message: hello
>>> bp.switch_buffer()
>>> bp.write("hello")
Buffer: hello

This looks good, so far. Of course, things are never so simple in real life. The writer library, naturally, gets thousands of stars on GitHub. It becomes popular. There's a development community, complete with a discord channel and a mailing list. So naturally, important features get added.

class Writer:

    _buffer = ""

    _write = lambda x: None

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        self._buffer += self.format(message)
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""

    def format(self, message):
        raise NotImplementedError("format")

Turns out people needed to buffer some of the shorter messages. This was a crucial performance improvement, that all users were clamoring for, so version 2018.6.1 is highly anticipated.

It breaks, though, the BufferWriter. The symptoms are weird: TypeError s and other such fun. All because both the superclass and the subclass are competing to access self._buffer.

With enough care, these problems can be avoided. A library which exposes classes for inheritance must add all new private methods or attributes as __ and, naturally, never ever add any public methods or attributes. Sadly, nobody does that.

So what's the alternative?

from zope import interface

class IFormatter(interface.Interface):

    def format(message):
        """show stuff"""

We define an abstract interface. This interface [1] has only one method -- format.

@attr.s
class Writer:

    _buffer = ""

    _write = lambda x: None

    _formatter = attr.ib()

    def set_output(self, output):
        self._write = output.write

    def write(self, message):
        self._buffer += self._formatter.format(message)
        if len(self._buffer) > 10:
            self._write(self._buffer)
            self._buffer = ""

We use the attrs library [#] to define our main functionality: a class that wraps other objects, which we expect to be IFormatter.

We can automatically verify, by instead having the _formatter line say:

_formatter = attr.ib(validator=lambda instance, attribute, value:
                               verify.verifyObject(IFormatter, value))

Note that this separates the concerns: the "fake method" format has moved to a "fake class" (an interface).

@interface.implementer(IFormatter)
class BufferFormatter:

    _buffer = False

    def format(self, message):
        if self._buffer:
            return 'All Channels: ' + message
        else:
            return 'Limited Channels: ' + message

    def switch_buffer(self):
        self._buffer = not self._buffer

Note that now, if we only have the Writer object, there is no way to switch prefixes. Correctly switching prefixes means keeping access to the original object.

If there is a need to "call back" to the original methods, the original object can be passed in to the wrapped object. One advantage is that, being a distinct object, it is obvious one should only call into public methods and only access public variables.

Passing ourselves to a method is, in general, not an ideal practice. What we really should do, is to pass specific methods or variables directly into the method. But this is funny: when using inheritance, we always effectively pass ourselves to every method. So even this refactoring is a net improvement. When the biggest criticism of a refactoring is "this could now be improved even more", it usually means it is a good idea.

Credits:

  • Thanks to Tom Goren for his feedback -- the original version was more aggressive.
  • Thanks to Glyph Lefkowitz for pushing me to make the example better.
  • Thanks to Augie Fackler and Nathaniel Manista for much of the inspiration.
[1]The zope.interface library is a little like the abc libary: both give tools to clarify what methods we expect. However, the abc.ABC like inheritance a little too much. Glyph has a good explanation about the advantages.
[2]attrs makes defining Python classes much less boiler-platey. There's another Glyph post explaining why it is so good.

Avoiding Private Methods

Fri 01 June 2018 by Moshe Zadka

Assume MyClass._dangerous(self) is a private method. We could have implemented the same functionality without a private method as follows:

  • Define a class InnerClass with the same __init__ as MyClass
  • Define InnerClass.dangerous(self) with the same logic of MyClass._dangerous
  • Make MyClass into a wrapper class over InnerClass …
read more

PyCon US 2018 Twisted Birds of Feather Open Space Summary

Wed 16 May 2018 by Moshe Zadka

We would like Twisted to support contextvars -- this would allow cross-async libraries, like eliot to do fancy things.

Klein is almost ready to be used as-is. Glyph has the good branch which adds

  • CSRF protection
  • Forms
  • Sessions
  • Authentication

But it is too big, and we need to break it to …

read more

PyCon 2018 US Docker Birds of Feather Open Space Summary

Tue 15 May 2018 by Moshe Zadka

We started out the conversation with talking about writing good Dockerfiles. There is no list of "best practices" yet. Hynek reiterated for us "ship applications, not build environments". Moshe summarized it as "don't put gcc in the deployed image."

We discussed a little bit what we are trying to achieve …

read more

Wheels

Wed 02 May 2018 by Moshe Zadka

Announcment: My book, from python import better, has been published. This post is based on one of the chapters from it.

When Python started out, one of the oft-touted benefits was "batteries included!". Gone were the days of searching for which XML parsing library was the best -- just use the …

read more

Web Development for the 21st Century

Mon 02 April 2018 by Moshe Zadka

(Thanks to Glyph Lefkowitz for some of the inspiration for this port, and to Mahmoud Hashemi for helpful comments and suggestions. All mistakes and issues that remain are mine alone.)

The Python REPL has always been touted as one of Python's greatest strengths. With Jupyter, Jupyter Lab in its latest …

read more

Running Modules

Mon 19 March 2018 by Moshe Zadka

(Thanks to Paul Ganssle for his suggestions and improvements. All mistakes that remain are mine.)

When exposing a Python program as a command-line application, there are several ways to get the Python code to run. The oldest way, and the one people usually learn in tutorials, is to run python …

read more

Random Bites of Pi(e)

Wed 14 March 2018 by Moshe Zadka

In today's edition of Pi day post, we will imagine we have a pie. (If you lack imagination, go out and get a pie.) (Even if you do not lack imagination, go out and get a pie.)

As is traditional, we got a round pie. Since pies are important, we …

read more

The Python Toolbox

Thu 01 February 2018 by Moshe Zadka

I have written before about Python tooling. However, as all software, things have changed -- and I wanted to write a new post, with my current understanding of best practices.

Testing

As of now, pytest has achieved official victory. Unless there are overwhelming reasons to use something else, strongly consider using …

read more

Jupyter for SRE

Sat 30 December 2017 by Moshe Zadka

Jupyter is a tool that came out of the data science community. In science, being able to replicate experiments is of the utmost importance -- so a tool where you can "show your work" is helpful. However, being able to show your work -- have colleagues validate what you have done, repeat …

read more