Docker in Production

By now most of us know Docker and have worked with it a bit. It's great for creating a consistent environment, and running several containers at the same time is easy thanks to the isolated networking and file systems. Once you have a proper set of environments (Integration, Staging and Production), however, it gets a bit more complicated: you have to integrate Docker with existing services for logging, monitoring and backup, and deal with deployment and rolling updates.

Logging

Logging is probably the easiest to deal with, as Docker supports a variety of logging drivers out of the box, notably Syslog, JSON file, Fluentd, GELF and various cloud provider services (AWS, Google Cloud, etc.). If you have an existing logging infrastructure, like the typical setup of Logstash forwarding to Elasticsearch for Kibana visualisation, you can connect Docker containers directly to Logstash using GELF: select the GELF logging driver and its configuration when starting the container and activate the Logstash GELF input plugin. Sounds easy, and for simple programs it works quite well. But what if you have separate error and access logs? Or log particular sections of the app into different log files? Docker logging by default only captures console output, so to capture multiple logs or file logging you either have to do post-processing in Logstash or mount a logging volume into your container and pick up the log files with a typical file forwarder like Filebeat. Which way to pick depends on the complexity of the application and your setup.
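
As a rough sketch (the Logstash host, port, tag and image name below are placeholders, not from the article), wiring a container to Logstash via GELF looks roughly like this:

    # Send the container's stdout/stderr to Logstash via the GELF logging driver
    # (logstash.example.com:12201 and the tag are assumed values)
    docker run -d \
      --log-driver gelf \
      --log-opt gelf-address=udp://logstash.example.com:12201 \
      --log-opt tag=myapp \
      myapp:1.0

On the Logstash side, the gelf input plugin has to listen on the same UDP port.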

Monitoring

Starting containers is easy, keeping them running is harder. Monitoring Docker containers has become much easier: most monitoring solutions now have built-in Docker support or support it via plugins, so at least you get decent alerts when your containers are failing. Standard process monitoring also works on processes inside Docker containers. After all, they are not really separate as in a VM, just jailed on the host, which you can see by running a ps on your Docker host server. Exposed ports are also just ports on the host system. Internal ports or network connections between Docker containers are not easily monitored, as the network stack is isolated. In this case you would either have to a) expose a REST status call that reports the status of those ports and connections to the monitoring system or b) start a Docker container whose only purpose is monitoring, both non-trivial solutions.
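
To illustrate the point about process and port visibility (the image and container names here are just examples):

    # Container processes show up in a plain ps on the Docker host
    docker run -d --name web nginx
    ps aux | grep nginx          # the nginx processes from the container appear here

    # Exposed ports are ordinary host ports and can be monitored as such
    docker run -d -p 8080:80 --name web2 nginx
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/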

Keeping Docker containers running can be done in two ways: if you have applications that occasionally but regularly crash (e.g. Node), Docker supports the --restart parameter for docker run. The options are always, unless-stopped and on-failure:max-retry. I consider always and unless-stopped both dangerous: a constantly restarting container (due to a persistent error) can't fulfil its function and might not be noticed by any monitoring solution. on-failure:max-retry is much nicer as long as the max-retry value is picked reasonably well; I currently run most containers with on-failure:5. Note that the container has to exit with a non-zero exit code for this to be triggered.
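
For example (container and image names are placeholders):

    # Restart at most 5 times, and only when the process exits with a non-zero code
    docker run -d --restart on-failure:5 --name myapp myapp:1.0

    # Check how many times Docker has already restarted the container
    docker inspect --format '{{.RestartCount}}' myapp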

If your container tends to fail for long periods of time, e.g. because of an external database outage, it might be more prudent to leave the restarting to your configuration management system or clustering solution.

Backups

Containers themselves should not be backed up; that's against the spirit of containers as disposable environments. But some Docker applications produce data or are databases themselves. In that case you mount the respective data directory from the host and use your trusted backup system of choice to back it up on the host.
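
A minimal sketch, assuming a Postgres container and a host directory that the existing backup system already covers (both paths are placeholders):

    # Keep the database files on the host so the normal backup tooling can pick them up
    docker run -d \
      -v /srv/postgres/data:/var/lib/postgresql/data \
      --name db postgres:9.6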

Alternatively, you can mount NFS/SMB/EFS volumes in Docker directly using the --volume-driver option and the respective plugins (http://netshare.containx.io/). I haven't yet used that in production, but it looks promising.
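
Based on the netshare plugin's documented usage, this might look roughly as follows (the NFS host and export path are assumptions, and the plugin daemon has to be running on the host in NFS mode):

    # Mount an NFS export directly into the container via the netshare volume driver
    docker run -d \
      --volume-driver=nfs \
      -v nfs.example.com/exports/data:/data \
      --name myapp myapp:1.0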

The third option is to run a Docker container that mounts the volumes of another container and runs a backup program. I honestly found that more hassle than just mounting the data on the host, but it is an option.
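
A sketch of that pattern, reusing the hypothetical db container from above (the archive path is arbitrary):

    # One-off backup container that reuses the db container's volumes
    docker run --rm \
      --volumes-from db \
      -v /srv/backup:/backup \
      busybox tar czf /backup/db-$(date +%F).tar.gz /var/lib/postgresql/data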

Deployment

The biggest issues with Docker and supporting multiple environments (Production/Staging/Integration) are configuration and updates. Let's look at configuration first and updates second.

Configuration naturally differs between Integration and Production: different databases, different micro-services to connect to, and so on. I tried out four different approaches here, and all of them have their benefits and downsides.

First and simplest is to configure via environment variables when the container starts. That can easily be handled during deployment and works well for a limited set of config options, but it can get out of hand quickly if you have a service requiring lots of configuration. Also, some frameworks need a config file and won't be happy with just environment variables.
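
For illustration (the variable names and values are hypothetical):

    # Pass environment-specific settings at start-up
    docker run -d \
      -e ENVIRONMENT=staging \
      -e DB_HOST=db.staging.example.com \
      -e DB_NAME=myapp \
      --name myapp myapp:1.0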

Second, you can have different Docker containers for different environments. That's easily done by using the layered nature of Docker images. We start with a base container (say a Scala runtime), let the developer add the compiled code and libraries to run the app, and lastly DevOps adds the config file for the respective environment. That can be easily organised with naming standards: scala, scala-myapp-1.0, scala-myapp-1.0-prod-16ba001. The first is the base runtime container; the second is the app binary container built from the base container and properly versioned as 1.0. The last one is the final container built with the proper configuration for prod and the git hash of the prod configuration attached, so you know which config you have deployed. This works very well in practice, and the only downside is that you need a rather complicated build process to do all those steps.
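
A sketch of the build steps under that naming scheme (the Dockerfile names are assumptions; each Dockerfile only adds one layer on top of the previous image):

    # 1. Developer: app image built FROM the scala base image, adding the compiled jar
    docker build -t scala-myapp-1.0 -f Dockerfile.app .

    # 2. DevOps: config image built FROM scala-myapp-1.0, adding the prod config,
    #    tagged with the short git hash of the configuration repository
    docker build -t scala-myapp-1.0-prod-$(git rev-parse --short HEAD) -f Dockerfile.prod .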

The third option is to use a distributed service discovery and configuration system like Consul, etcd or Zookeeper. That allows your containerised service to connect directly to the discovery service and download its respective config programmatically, which means the container just needs an environment variable to know whether it is in e.g. Staging. I have used this setup only with Consul so far, but it works quite well and makes the release process much simpler. The downside is that you have to have a discovery service running without interruption and also make it available to developers, otherwise the container won't start during development. Also, same as in the first option, some frameworks need config files and won't be happy with distributed variables.
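
A minimal sketch with Consul's KV store, e.g. in the container's entrypoint script (the key path, CONSUL_HOST and ENVIRONMENT variables are assumptions):

    #!/bin/sh
    # Fetch environment-specific settings from Consul before starting the app
    DB_HOST=$(curl -s "http://${CONSUL_HOST}:8500/v1/kv/${ENVIRONMENT}/myapp/db_host?raw")
    export DB_HOST
    exec java -jar /opt/myapp/myapp.jar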

The fourth and so far last option is to mount the config directory from the host and let your configuration management system of choice (I use SaltStack) handle the config files. That works well for all frameworks but creates overhead for the config management system: it now needs to know which specific containers are running to deploy the config, and it also has to restart the container to pick up the new config.
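
For example (the paths are placeholders; the config files themselves would be rendered by SaltStack or a similar tool on the host):

    # Mount host-managed config read-only into the container
    docker run -d \
      -v /etc/myapp/staging:/opt/myapp/config:ro \
      --name myapp scala-myapp-1.0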

Updates

You want to update your containers one by one to a new version of the Docker image to avoid downtime. Again, a configuration management system with built-in Docker support (SaltStack in my case) can handle that for you by putting different Docker host servers on different update schedules, or you can roll out to half the servers manually. That can be quite cumbersome, and that's where Docker clustering solutions like Mesos, Kubernetes or Swarm come in. In Mesos, using the Marathon deployer, you can schedule a rolling update of your containers with a minimal safe capacity of containers to keep running during an upgrade. That has worked very well in practice: not a single hiccup during upgrades in over a year of running on Mesos. How one runs Docker on those clustering solutions in practice would be a bit much for this article, so I will explain it in a follow-up.
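
As a rough sketch of the Marathon part (the Marathon URL, app id, instance count and image tag are placeholders), a rolling update that keeps at least half of the instances running might look like this:

    # Update the app definition; minimumHealthCapacity controls how many
    # instances Marathon keeps healthy while it replaces the rest
    curl -X PUT http://marathon.example.com:8080/v2/apps/myapp \
      -H 'Content-Type: application/json' \
      -d '{
            "container": { "type": "DOCKER", "docker": { "image": "myapp:1.1" } },
            "instances": 4,
            "upgradeStrategy": { "minimumHealthCapacity": 0.5 }
          }'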

In summary: thanks to the increased Docker support in logging, monitoring and configuration management systems, running Docker in production is not particularly hard as long as the limitations of containers are taken into account.
