Building With Jenkins Inside an Ephemeral Docker Container
Thinking inside the container means building inside one as well. Today I’d like to open up the box on how my team is currently combining Jenkins and Docker to serve Riot Engineering teams. In the most recent post, I promised I would soon discuss the actual build slave and Jenkins configuration directly. In some ways this is the main event—if you don’t have a Jenkins server ready to receive slaves, it’s a good idea to go back through and follow along with the previous posts.
Before I begin the tutorial proper, however, let’s talk about the approach and alternatives. There are many ways to use Docker containers as build slaves. Even narrowing the field to just using them on Jenkins still presents plenty of options. From my research and discovery, I feel that there are two major approaches you can take. Conceptually, I’ll refer to these as the “Docker execution” and “Docker ephemeral slave” models.
These two approaches have roots in how Jenkins connects and communicates with your build slave. In the execution model, Jenkins connects in the traditional fashion: it runs an agent on an existing VM or physical machine, and that machine is expected to also be a running Docker host. In the ephemeral model, Jenkins connects to a Docker container directly and treats the container itself as the build slave. The distinction between the two options is important, so let’s take a moment to break each one down.
Docker Execution Model
In the execution model, we architecturally assume the slave is a Docker host but we treat it as a physical machine. When a Jenkins job starts, it syncs/creates a working directory on the slave directly, leverages “docker run” and “docker exec” commands to spin up a container, and then mounts its local workspace inside. The container is a virtual scratchpad and isolated environment. It can have all the custom versions of tools and binaries that we need to compile the source code we’ve mounted into the container.
When the build job is complete, all the binaries and build artifacts it produced will be on the slave in the traditional Jenkins workspace. Jenkins can shut down the container safely and do any post-build cleanup as normal.
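If it helps to see that flow as commands, here’s a rough sketch of the kind of “docker run” a job performs in the execution model. The image name and build command are placeholders, and in practice the plugin wires this up for you:

```shell
# Sketch of one build step in the execution model (image/command are placeholders).
build_in_container() {
  local workspace=$1 image=$2
  shift 2
  # Mount the Jenkins workspace into a throwaway toolchain container;
  # --rm discards the container afterward, but artifacts written to
  # /workspace land back in the slave's workspace on the host.
  docker run --rm \
    -v "$workspace:/workspace" \
    -w /workspace \
    "$image" "$@"
}

# Example: compile the checked-out source with the team's toolchain image.
# build_in_container "$WORKSPACE" myteam/build-env:latest make all
```

The container gets all the custom tools baked into its image, while the workspace (and its build outputs) stays on the slave.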
This model is best represented by the CloudBees Custom Build Environment Plugin, which is open source and maintained by the owners of the Jenkins source repository.
Docker Ephemeral Slave Model
The ephemeral model aims to leverage the autonomous and isolated nature of Docker containers to scale a Jenkins build farm to meet any demand placed upon it. Instead of the traditional array of pre-allocated slaves or at-the-ready virtual machines, this model treats the entire container itself as a slave.
We spin up a container whenever there is a demand for a Jenkins executor, automatically configure Jenkins to accept this container as a new slave, execute the job within it, and finally shut down the container and de-allocate the slave. It’s certainly more complicated than the execution model.
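To make that lifecycle concrete, here’s a rough manual sketch of what the plugin automates. The image name, Jenkins URL, and node.xml node definition are placeholders, and a real plugin drives Jenkins programmatically rather than shelling out to the Jenkins CLI:

```shell
# Manual sketch of the ephemeral slave lifecycle the plugin automates.
# Assumes jenkins-cli.jar and a node.xml node definition are on hand.
provision_ephemeral_slave() {
  local image=$1 jenkins_url=$2
  # 1. Spin up a container that runs a Jenkins agent.
  local cid
  cid=$(docker run -d "$image")
  # 2. Register the container with Jenkins as a new node
  #    (create-node reads a node definition from stdin).
  java -jar jenkins-cli.jar -s "$jenkins_url" create-node "slave-$cid" < node.xml
  echo "$cid"
}

teardown_ephemeral_slave() {
  local cid=$1 jenkins_url=$2
  # 3. After the job finishes, de-allocate the node and destroy the container.
  java -jar jenkins-cli.jar -s "$jenkins_url" delete-node "slave-$cid"
  docker rm -f "$cid"
}
```

Every executor demand walks through that provision/teardown cycle, which is exactly why the plumbing matters so much in this model.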
This approach has several competing plugins, generally centered around how to maintain and build the Docker cloud you want to use. So far there are plugins for Kubernetes, Mesos, and “pure” Docker approaches.
Which Model to Choose?
Both models represent valid approaches to solving the problem. At Riot, we were interested in getting away from allocating build boxes and “executors” to specific purposes, so the ephemeral model was attractive to us. We liked the idea of having a container represent the entire slave. So we went with the Docker Plugin to achieve our goals.
Over time, we’ve been happy with that choice. The main developer of the Docker Plugin (KostyaSha) has maintained active development, frequent updates, and strong communication. He’s responsive to problems and also has a clear roadmap over at his GitHub repo. The tutorial I posted with this article would not have been possible without his work and is based on release 0.16 of the plugin.
As an additional note, the “Docker Plugin” has just been forked by KostyaSha into the “Yet Another Docker Plugin” repo where development continues. I opted to focus this blog on the current Docker Plugin as that is what we use in production right now.
We may change our minds in the future about our approach. We try to stay flexible so we can pivot if a better solution comes along. Our approach did not come without serious lessons learned—I’ll describe some of them at the end of the tutorial.
This is the longest tutorial I’ve written, so I’ve decided to link it here to save space (and to separate the conversation from the implementation). The tutorial takes about 30-45 minutes to complete, assuming you’ve followed the previous posts. You can check out the full tutorial here:
Alternatively if you want to get up and running as quickly as possible and skip the construction, you can pull down the complete tutorial and follow the README instructions here:
Operating this platform comes with its own set of distinct challenges. Here’s a short list of things to keep in mind that hopefully will save you time and effort:
Operating Docker Hosts at Scale Is Not “Simple”
Disk space is a big deal. Docker images eat space. Every running container eats space. Sometimes containers die and eat space. Teams will create build slaves that use volumes; these eat even more disk space.
Monitoring disk space on your Docker hosts is essential. Do not let it run out. Be proactive.
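A cron-driven check is one cheap way to be proactive. Here’s a minimal sketch; the /var/lib/docker path and the 85% threshold are assumptions you should adjust for your own hosts:

```shell
# Warn when a docker host's disk usage crosses a threshold percentage.
check_disk_threshold() {
  local usage=$1 limit=$2
  if [ "$usage" -ge "$limit" ]; then
    echo "WARN: docker host disk at ${usage}% (limit ${limit}%)"
    return 1
  fi
  echo "OK: docker host disk at ${usage}%"
}

# Example cron usage: read the Use% column for the docker data directory.
# usage=$(df /var/lib/docker | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
# check_disk_threshold "$usage" 85
```

Wire the non-zero exit status into whatever alerting you already have so a human hears about it before the host fills up.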
Cleaning Up Images and Containers is like Garbage Collection
Every time a new image is pushed to a Docker host, it leaves “dangling” unused layers behind. Cleaning these up has to become routine or disk space will be a problem.
Container slaves can sometimes halt or not clean up—paying attention to “exited” and “dead” containers is a necessary part of keeping a Docker host clean.
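On the Docker versions this series targets there’s no one-shot prune command, so a small scheduled script has to do the garbage collection. This is a sketch, and worth running with care, since removing a dangling layer is irreversible:

```shell
# Garbage-collect a docker host: dead/exited containers first, then
# the "dangling" untagged image layers that new pushes leave behind.
docker_gc() {
  local dead dangling
  dead=$(docker ps -aq -f status=exited -f status=dead)
  # $dead is intentionally unquoted so multiple IDs split into arguments.
  [ -n "$dead" ] && docker rm $dead
  dangling=$(docker images -qf dangling=true)
  [ -n "$dangling" ] && docker rmi $dangling
  return 0
}
```

Removing containers before images matters: a dead container keeps its image layers referenced, so the “docker rmi” pass would otherwise skip them.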
Adding New Images/Slaves to a Fleet of Docker Hosts Is Time-Consuming
Initially, engineers had to request that their images be added to the Jenkins configuration. We soon noticed this process was an unnecessary impediment. In our experience, the average engineering team was changing their Dockerfile several times a week during early development.
We created a tool we call “Harbormaster” that verifies each image engineers supply against a set of core criteria, confirms the slave will work, generates test reports, and auto-configures Jenkins via a Groovy API.
Discussing how Harbormaster works would've made this article excessively long. I'll be talking about it in a future blog!
Monitoring the Beast Is Essential
We’re still evolving how we monitor Jenkins and our Docker Swarm. The art of monitoring a Docker Cloud is in a constant state of evolution and takes on a unique twist when that cloud is a Build Farm and not an Application Farm.
As with Harbormaster, discussing monitoring here would've made this article far longer. I'll be talking more about monitoring all of this in production in a future blog.
Jenkins Auditing Can Bite You
Once, we were woken in the middle of the night by a Jenkins server crash due to disk space running out. Turns out, it had 100,000 tiny log files of “Create/Destroy” build slave entries. Jenkins keeps an audit log of every build slave it creates and destroys; pay attention or you’ll end up suffering death by a thousand cuts.
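A scheduled prune of those audit logs keeps the thousand cuts at bay. This is a sketch only; the logs/slaves location is an assumption that varies by Jenkins version, so verify the path inside your own JENKINS_HOME before pointing anything destructive at it:

```shell
# Delete per-node create/destroy audit logs older than a given number of days.
# The logs/slaves path is an assumption; check your JENKINS_HOME layout first.
prune_node_logs() {
  local jenkins_home=$1 days=$2
  find "$jenkins_home/logs/slaves" -type f -mtime +"$days" -delete 2>/dev/null
  # Remove the per-node directories the deletions leave empty.
  find "$jenkins_home/logs/slaves" -type d -empty -delete 2>/dev/null
  return 0
}

# Example: keep a week of audit history.
# prune_node_logs /var/jenkins_home 7
```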
There are more lessons I’m sure. I’ll be talking about how to handle some of these in future posts and/or presentations so please don’t hesitate to ask questions!
Productionizing your System and our Results
With the tutorial completed, you should have a fully functional Dockerized Jenkins sandbox. There is a mountain of potential in what you just created. There are a few things you need to do to “productionize” this.
- Configure your production master Jenkins server with the Docker Plugin as you did in the tutorial above (install the plugin).
- Stand up a “production” Docker host. That’s a bit outside the scope of this tutorial. Our Pipeline team uses CentOS VMs running on vSphere, but you can use AWS instances, physical machines, or just about any valid Docker host you want. That’s the power of Docker.
- If you aren’t using TLS security for your build farm Docker hosts (not uncommon on secured internal networks), be sure to remove the “https” and “Docker cert path” settings.
- If you are using TLS, make sure you configure Jenkins with your production certs.
- Make sure your build slave image is somewhere your production Docker host can reach. At Riot we use a central Docker image repository.
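Getting the image within reach usually means a tag-and-push to that central registry. A minimal sketch, with the registry hostname as a placeholder:

```shell
# Tag a locally built slave image for a central registry and push it
# so every production docker host can pull it. Registry host is a placeholder.
publish_slave_image() {
  local image=$1 registry=$2
  docker tag "$image" "$registry/$image" && docker push "$registry/$image"
}

# Example:
# publish_slave_image myteam/build-env:1.0 registry.internal.example.com
```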
Many of these things can be involved to set up, so please don’t hesitate to ask questions; I’m happy to provide pointers!
For us, this is not a “sandbox” or “playtime” setup. Our live environment changes only one component from what’s listed in this tutorial: our production Docker host is actually a Docker Swarm cluster backed by 10 Docker host machines. Here are some diagrams of how we have things set up right now (at the time of publication):
Here’s some stats from our Production System:
Jenkins Stat Details
- Average Queue Size: 20-30
- Provisioned Executors: ~650
- Average Executors in Use: 30-40
- Total Build Nodes: ~80
- Avg Jobs Per Hour: ~600
Docker Jenkins Stat Details
- Average Queue Size: 3
- Provisioned Executors: 20 (for Build Flow controls)
- Average Executors in Use: 3
- Average Build Nodes at any time: 5
- Avg Jobs Per Hour: 50
We deployed this system on a much earlier version of Docker (Docker 1.2) and the Docker Plugin (0.8) early last year. Stability at that time was definitely a concern. I feel confident in saying that the current setup (Docker 1.10 + Docker Plugin 0.16) is very stable for our needs. In the year or so we’ve been using this, we grew from a handful of build jobs and some early-adopting teams to a huge range of both. Obviously we'll be looking to move to the new "Yet Another Docker Plugin" once we've had a chance to fully test everything.
In fact, our Dockerized Jenkins platform now represents nearly 25% of all the build jobs we support on our entire Jenkins platform (out of almost 4,000). Net new build job creation has dropped to almost zero on our more traditional Jenkins environment, and most new jobs are now created on our Docker platform. This is because of the following things:
- Engineering teams are in full control of their build environments by defining them as Dockerfiles.
- These “build environments” can be tested locally with ease using Docker and replicate build farm behavior wherever they are deployed.
- Teams don’t have to be “systems administrators” of build VMs or boxes; it’s “hands-off” ownership.
When I started this journey of blog writing in August of 2015, we already had a functional prototype. Eight months later, the blog has almost caught up to where we are currently, sans monitoring and scaling.
By this point you should have a solid introduction to Docker basics, and know a bit about scaling Docker and securing it—all demonstrated through real-world use and application of Jenkins. On the flip side, you should also have a functional configuration of a full Jenkins test environment running against your local Docker Toolbox Docker host, including the ability to create custom build slaves. It’s literally “Jenkins in a box”... or rather, a container. I’ll dig deeper into this ecosystem in future posts, and describe the tools, APIs, and monitoring we’ve built.
I truly hope you’ve found this useful. The feedback we’ve received has been fantastic, and I love the enthusiasm of the community. As always you can comment below and I highly appreciate it!
All of the files are available on my public GitHub; everything here is built with nothing but open source magic! Don’t hesitate to create issues there, submit pull requests, and so on.
For more information, check out the rest of this series:
Part I: Thinking Inside the Container
Part II: Putting Jenkins in a Docker Container
Part III: Docker & Jenkins: Data That Persists
Part IV: Jenkins, Docker, Proxies, and Compose
Part V: Taking Control of Your Docker Image
Part VI: Building with Jenkins Inside an Ephemeral Docker Container (this article)
Part VII: Tutorial: Building with Jenkins Inside an Ephemeral Docker Container
Part VIII: DockerCon Talk and the Story So Far