PyCon 2018 US Docker Birds of Feather Open Space Summary
Tue 15 May 2018 by Moshe Zadka
We started the conversation by talking about writing good Dockerfiles. There is no list of "best practices" yet. Hynek reiterated for us "ship applications, not build environments". Moshe summarized it as "don't put gcc in the deployed image."
We briefly discussed what we are trying to achieve with better Dockerfiles. A shared base? Reproducible builds?
We talked about some of the challenges for building Docker on CI systems, especially from inside containers.
Docker on air-gapped machines is hard. So many parts assume free access to the internet.
We went on to discuss how to use multistage Dockerfiles. One important bit is which "installable artifact" to move from the build stage to the final stage. Some suggested wheels. Moshe suggested Pex. Hynek suggested copying a virtual environment, and Moshe showed an example.
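To make the "copy a virtual environment" idea concrete, here is a minimal sketch, not necessarily the example shown in the session. It is a small Python helper that renders a two-stage Dockerfile. The helper name `render_dockerfile`, the image tags `python:3.6` and `python:3.6-slim`, the path `/opt/venv`, and the module name `myapp` are all assumptions made for illustration. It also assumes that `requirements.txt` installs everything the application needs, including the application itself.

```python
# Hypothetical template: the build stage has the full toolchain, the final
# stage only receives the finished virtual environment.
DOCKERFILE_TEMPLATE = """\
FROM {base} AS builder
RUN python -m venv {venv}
COPY requirements.txt /tmp/requirements.txt
RUN {venv}/bin/pip install --no-cache-dir -r /tmp/requirements.txt

FROM {runtime}
COPY --from=builder {venv} {venv}
ENV PATH="{venv}/bin:$PATH"
CMD ["python", "-m", "{module}"]
"""


def render_dockerfile(base="python:3.6", runtime="python:3.6-slim",
                      venv="/opt/venv", module="myapp"):
    """Return the text of a two-stage Dockerfile (all defaults are assumed)."""
    return DOCKERFILE_TEMPLATE.format(
        base=base, runtime=runtime, venv=venv, module=module
    )


if __name__ == "__main__":
    print(render_dockerfile())
```

The virtual environment is copied to the same path it was built at, and both stages are assumed to have the same Python interpreter in the same location. A virtual environment moved to a different path or a different interpreter will generally not work.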
There was some discussion on making small images. The consensus was that Alpine is usually part of the answer.
There was a lot of discussion on the trade-offs between updating too soon, and too late. Some of the techniques to control update times were mentioned:
- Building everything from source
- Hashing various inputs into the image tag
- Using Red Hat Satellite
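As an illustration of hashing inputs into the image tag, the sketch below derives a short tag from the contents of the files that feed a build. The function name `image_tag`, the choice of SHA-256, and the 12-character truncation are assumptions for this example, not something prescribed in the discussion.

```python
import hashlib
from pathlib import Path


def image_tag(paths, extra=()):
    """Return a short, deterministic tag derived from build inputs.

    paths: files whose contents should influence the tag
           (for example a Dockerfile and a requirements file).
    extra: additional strings to mix in, such as a base image digest.
           Unlike paths, the order of these strings matters.
    """
    digest = hashlib.sha256()
    # Sort the paths so the tag does not depend on argument order.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    for item in extra:
        digest.update(item.encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()[:12]
```

With a tag like this, an image is rebuilt only when one of its declared inputs changes, which turns "when do we update?" into an explicit decision about what counts as an input.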
We talked about GPU containers, for machine learning. Apparently nvidia-docker is still nascent but works.
We talked about how to keep your registry clean. Unfortunately, the consensus is that you will need to build your own tooling.
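The core of such home-grown tooling is usually a retention policy. The sketch below shows one possible policy as a pure function. It assumes you have already obtained a mapping from tag name to creation time from your registry by whatever means it offers. The function name `tags_to_delete` and the policy itself (keep the newest few, protect some names, delete the rest once they are old enough) are hypothetical.

```python
from datetime import datetime, timedelta


def tags_to_delete(tags, now, keep_latest=5,
                   max_age=timedelta(days=30), protected=("latest",)):
    """Return the sorted tag names that the policy would remove.

    tags: mapping of tag name -> creation datetime.
    now:  the reference time, passed in so the function stays testable.
    """
    newest_first = sorted(tags, key=lambda name: tags[name], reverse=True)
    # Always keep the newest few tags and any explicitly protected names.
    keep = set(newest_first[:keep_latest]) | set(protected)
    return sorted(
        name for name in tags
        if name not in keep and now - tags[name] > max_age
    )


if __name__ == "__main__":
    example = {
        "latest": datetime(2018, 1, 1),
        "a": datetime(2018, 1, 1),
        "b": datetime(2018, 2, 1),
        "c": datetime(2018, 5, 1),
        "d": datetime(2018, 5, 10),
    }
    print(tags_to_delete(example, now=datetime(2018, 5, 15), keep_latest=2))
```

Keeping the policy separate from the code that talks to the registry makes it easy to dry-run before anything is deleted.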
We discussed what registries people use.
- GitLab Docker Registry got mixed reviews.
- ECR was a popular option.
- Nexus was mentioned.
We touched lightly on performance. Docker can use either overlayfs or devicemapper as its storage driver. It's complicated.
Would you run your DB in Docker? Docker is just a packaging format. You can run Postgres in Docker just fine, and mount in the data directory. However, usually people are asking about using Orchestration Frameworks for that.
StatefulSets in K8s are sometimes useful for databases.
If you are running your dev DB in Docker, data is not important. In that case, consider using eatmydata to improve performance.
We all agreed you should never use the system Python for your applications. Then how do you get Python in your Docker image?
- Use the python:<something> images on Docker Hub
- Compile it yourself
- Use PyEnv
- Use the deadsnakes PPA on Ubuntu
Finally, we discussed the ultimate heresy: running more than one process inside your container. Or is it? Moshe mentioned that anyone running uwsgi or gunicorn is already running a process manager: just one that happens to be part of the WSGI "binary". We mentioned supervisor and NColony for explicit process management.
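To show how little is needed for the most basic form of explicit process management, here is a toy sketch. It is not how supervisor or NColony work. The function name `run_together` is hypothetical, and the sketch deliberately leaves out restarts, signal forwarding, and log handling, which real process managers provide.

```python
import subprocess
import sys
import time


def run_together(commands, poll_interval=0.1):
    """Start every command; when any one exits, stop the rest.

    Returns the exit code of the first process seen to have exited.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            for proc in procs:
                code = proc.poll()
                if code is not None:
                    return code
            time.sleep(poll_interval)
    finally:
        # Ask the remaining processes to stop, then wait for all of them
        # so that none is left behind.
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()


if __name__ == "__main__":
    exit_code = run_together([
        [sys.executable, "-c", "import time; time.sleep(30)"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ])
    print(exit_code)
```

Treating the exit of any child as the exit of the whole group lets the container runtime see the failure and apply its own restart policy.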