Docker & Jenkins: Data that Persists

In the last blog post we discussed taking more control of our Jenkins Docker image by wrapping the Cloudbees image with our own Dockerfile. This empowered us to set some basic defaults that we previously passed in every time we ran `docker run`. We also took the opportunity to define where to place Jenkins logs and how to use Docker exec to poke around our running container.

We left off with my thoughts that we still needed some kind of data persistence to really make this useful. Containers—and their data—are ephemeral, so we’re still losing all of our Jenkins plugin and job data every time we restart the container. The Cloudbees documentation even tells us that we’re going to need to use volumes to preserve data. It recommends a quick way to store data on your Docker host, outside of your running containers, by mounting a local host folder. This is a traditional method of persisting information, and it requires your Docker host to provide that mount point.
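
For reference, that host-mounted approach looks roughly like the following, where /your/home/jenkins_home stands in for whatever folder you’ve prepared on your Docker host:

docker run -p 8080:8080 -v /your/home/jenkins_home:/var/jenkins_home jenkins

Everything Jenkins writes to /var/jenkins_home inside the container then lands in that host folder and survives container restarts.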

There is another way, however, and that’s to use a Docker data volume to containerize your storage. You can read up about data volumes in Docker’s documentation here.

In this blog post we’ll cover the following subjects:

  • Persisting Docker data with volumes
  • Making a data volume container
  • Sharing data in volumes with other containers
  • Preserving Jenkins job and plugin data

Host Mounted Volumes vs Data Volume Containers

When I refer to a “Host Mounted Volume” I am referring to the idea that your Docker host machine stores the data in its file system, and when you start a container with `docker run`, Docker mounts that physical storage into the container.

This approach has many advantages, the most obvious one being its ease of use. In more complex environments, your data could actually live on network-attached or serially attached storage, giving you a lot of space and performance.

It also has a drawback—it requires that you pre-configure the mount point on your Docker host. This eliminates two of Docker’s bigger advantages, namely container portability and applications that can “run anywhere.” If you want a truly portable Docker container that can run on any host, you can’t have any expectations of how that host is configured when you make your `docker run` call.

This is where data volume containers can help. A data volume container is built from an image that essentially just defines storage space: a place inside Docker’s virtual file system where data is stored. The container doesn’t run a process and in fact “stops” immediately after `docker run` is called, but as long as that stopped container exists, so does its data.

This method allows Docker containers to share data without the requirement that the host be configured with a proper mount point. Users can interact with the containers via Docker commands and never need to touch the host.

There are drawbacks. Performance is a tad slower as you’re maintaining data through Docker’s virtualized file system, so it may not be ideal for applications that need the very best in I/O performance. For most apps, however, this won’t be noticeable. Complexity is also increased as your application now has a minimum of two images (meaning two Dockerfiles)—one for the app and one for storage.

To be clear, either approach is 100% valid and really depends on how exactly you want to work. My own opinion is that applications should be as independent as possible and for the purposes of this article I’ll be showing how to use data volume containers.

Getting Started

We’ll start with the Dockerfile we ended up with from the last blog post. For reference here it is:

FROM jenkins:1.609.1
MAINTAINER Maxfield Stewart

USER root
RUN mkdir /var/log/jenkins
RUN chown -R jenkins:jenkins /var/log/jenkins
USER jenkins

ENV JAVA_OPTS="-Xmx8192m"
ENV JENKINS_OPTS="--handlerCountStartup=100 --handlerCountMax=300 --logfile=/var/log/jenkins/jenkins.log"

For starters, let’s create an image that will just host our log files. This will be an app with two Dockerfiles. We could do this in one root directory, but I like to have each Dockerfile inside its own directory:

mkdir jenkins-master
mkdir jenkins-data

Place our original Jenkins Dockerfile inside jenkins-master:

mv Dockerfile jenkins-master

To be sure it still works, let’s build the jenkins-master Dockerfile:

docker build -t myjenkins jenkins-master/.

Note how you can use the Dockerfile from a folder by providing the relative path. This will be useful later when managing multiple Dockerfiles inside a single root directory. Now let’s create a new Dockerfile in jenkins-data:

  1. Use your favorite editor to create the file “Dockerfile” inside the “jenkins-data” directory
  2. Add the following lines at the top:
    • FROM debian:jessie
      MAINTAINER yourname
    • NOTE: I use the base Debian image because it matches the base image the Cloudbees Jenkins image uses. Because we’ll be sharing file systems and UIDs across containers, we need to match the operating systems.
  3. Now create the Jenkins user in this container by adding:
    • RUN useradd -d "/var/jenkins_home" -u 1000 -m -s /bin/bash jenkins
    • NOTE: we set the UID here to the same one the Cloudbees Jenkins image uses so we can match UIDs across containers, which is essential if you want to preserve file permissions between the containers. We also use the same home directory and bash settings.
  4. We need to recreate the Jenkins log directory in this image because it'll be the new foundation, so add the following lines:
    • RUN mkdir -p /var/log/jenkins
      RUN chown -R jenkins:jenkins /var/log/jenkins
  5. Now we can add the Docker volume magic. Let’s make the log folder a mount:
    • VOLUME ["/var/log/jenkins"]
  6. Let’s set the user of this container to “jenkins” for consistency, so add:
    • USER jenkins
  7. Lastly, while these images don’t actually “run” anything, I like to have them output a message when they start as a reminder of their purpose, so toss in the final line:
    • CMD ["echo", "Data container for Jenkins"]

That should do it! For reference here’s the entire new Dockerfile in one shot:

FROM debian:jessie
MAINTAINER yourname

RUN useradd -d "/var/jenkins_home" -u 1000 -m -s /bin/bash jenkins
RUN mkdir -p /var/log/jenkins
RUN chown -R jenkins:jenkins /var/log/jenkins

VOLUME ["/var/log/jenkins"]

USER jenkins

CMD ["echo", "Data container for Jenkins"]

Go ahead and save the file then build it:

docker build -t myjenkinsdata jenkins-data/.
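
You can double-check that both images are in place by listing them:

docker images

You should see myjenkins and myjenkinsdata in the output.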

Your base image now exists for your Jenkins data volume. However, we need to adapt our existing image to make use of it!

Preparing the Data Volume

First let’s go ahead and start the new data volume.

docker run --name=jenkins-data myjenkinsdata

You’ll note that you get the output message we added with the CMD instruction. If you run:

docker ps

You’ll see that there are no running containers. And if you run:

docker ps -a

You should see our new data container stopped. That’s okay: this is how data volume containers work. So long as that container is there, your data in /var/log/jenkins will be preserved because we defined that as a volume point. With it in place, we can now instruct our Jenkins (master) container to use it and be able to preserve our logs even if we remove our Jenkins (master) container.
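
If you’re curious where that data actually lives, you can inspect the stopped container. The volume section of the output (the exact field name varies by Docker version) shows the path on the Docker host that backs /var/log/jenkins:

docker inspect jenkins-data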

Using the Data Volume

This part is easy. All the hard work went into setting up the data volume. To make use of it we just need to add the “volumes-from” directive to our `docker run` call like so:

docker run -p 8080:8080 -p 50000:50000 --name=jenkins-master --volumes-from=jenkins-data -d myjenkins

You can see above that I've added a new port mapping for port 50000. This handles connections from JNLP-based build slaves. I'll talk about this more in a future blog post, but wanted to include it here in case you start using this as the basis for your own Jenkins server.

Note that we just used the handy “jenkins-data” name we gave the container. Docker is smart enough to reference those names. You can verify everything still works by tailing the log file again:

docker exec jenkins-master tail -f /var/log/jenkins/jenkins.log

But how do we know the volume mount works? Easy, because by default Jenkins appends to its log file—a simple stop, remove, and restart can prove it:

docker stop jenkins-master
docker rm jenkins-master
docker run -p 8080:8080 -p 50000:50000 --name=jenkins-master --volumes-from=jenkins-data -d myjenkins
docker exec jenkins-master cat /var/log/jenkins/jenkins.log

You should see the first and then the second Jenkins startup messages in the log. Jenkins can now crash, or be upgraded, and we’ll always have the old log. Of course this means you have to clean up that log and log directory as you see fit, just like you would on a regular Jenkins host.
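
If you ever want to empty the log without touching the host, something along these lines should do it (truncate is part of coreutils on the Debian base image; you may prefer to stop Jenkins first so nothing is mid-write):

docker exec jenkins-master truncate -s 0 /var/log/jenkins/jenkins.log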

Don’t forget about `docker cp`. You can just copy the log file out of the data volume container even if you lose the master container:

docker cp jenkins-data:/var/log/jenkins/jenkins.log jenkins.log

Preserving log data is just a minor advantage—we really did this to be able to save key Jenkins data, like plugins and jobs, between container restarts. Using the log file was a good way to demonstrate how things were working simply.

Saving Jenkins Home Dir

First, let’s add the Jenkins Home directory to the data volume. Edit the Dockerfile in jenkins-data/Dockerfile and update the VOLUME command:

VOLUME ["/var/log/jenkins", "/var/jenkins_home"]
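
For reference, the whole jenkins-data Dockerfile now reads:

FROM debian:jessie
MAINTAINER yourname

RUN useradd -d "/var/jenkins_home" -u 1000 -m -s /bin/bash jenkins
RUN mkdir -p /var/log/jenkins
RUN chown -R jenkins:jenkins /var/log/jenkins

VOLUME ["/var/log/jenkins", "/var/jenkins_home"]

USER jenkins

CMD ["echo", "Data container for Jenkins"]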

Because the folder is already owned and created by the Jenkins user, we don’t need to do anything except add it as a container mount point. Don’t forget to rebuild your new data image and clean up the old one before restarting it:

docker rm jenkins-data
docker build -t myjenkinsdata jenkins-data/.
docker run --name=jenkins-data myjenkinsdata

Before we use this though, there’s one annoyance with the default Cloudbees Docker image. By default, it stores the uncompressed Jenkins WAR file in jenkins_home, which means we’d preserve this data between Jenkins runs. This is not ideal: we don’t need to save this data, and it could cause confusion when moving between Jenkins versions. So let’s use another Jenkins startup option to move it to /var/cache/jenkins.

Edit the Jenkins-Master Dockerfile and update the JENKINS_OPTS line to:

ENV JENKINS_OPTS="--handlerCountStartup=100 --handlerCountMax=300 --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war"

This sets the Jenkins webroot. However, we now need to make sure this directory exists and is owned by the Jenkins user, so update the section where we create the log directory to look like this:

USER root
RUN mkdir /var/log/jenkins
RUN mkdir /var/cache/jenkins
RUN chown -R jenkins:jenkins /var/log/jenkins
RUN chown -R jenkins:jenkins /var/cache/jenkins
USER jenkins
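
For reference, the complete jenkins-master Dockerfile now looks like this:

FROM jenkins:1.609.1
MAINTAINER Maxfield Stewart

USER root
RUN mkdir /var/log/jenkins
RUN mkdir /var/cache/jenkins
RUN chown -R jenkins:jenkins /var/log/jenkins
RUN chown -R jenkins:jenkins /var/cache/jenkins
USER jenkins

ENV JAVA_OPTS="-Xmx8192m"
ENV JENKINS_OPTS="--handlerCountStartup=100 --handlerCountMax=300 --logfile=/var/log/jenkins/jenkins.log --webroot=/var/cache/jenkins/war"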

Save your Dockerfile, rebuild your jenkins-master image, and restart it. Please note the use of “rm -v” below. Now that we’re playing with volumes, we need to remove the data volumes when we’re done using them. Docker doesn’t clean them up by default because you might want to keep them in case of emergency.

docker stop jenkins-master
docker rm -v jenkins-master
docker build -t myjenkins jenkins-master/.
docker run -p 8080:8080 -p 50000:50000 --name=jenkins-master --volumes-from=jenkins-data -d myjenkins

Your container should start. You can confirm the WAR file was moved correctly by running the following command:

docker exec jenkins-master ls /var/cache/jenkins/war

You should see the uncompressed contents there. But how do we know this fancy new layout saves Jenkins data?

Testing You Can Keep Jobs Between Runs

We can perform this test easily. With your jenkins-master container running, let’s go make a new Jenkins build job!

  1. Point your browser to: http://yourdockermachineip:8080 (grab the IP with `docker-machine ip default`)
  2. Create a new job by clicking: “New Item”
  3. Enter: “testjob” for the item name
  4. Choose: Freestyle software project
  5. Click “ok”
  6. Click “save”

Our “useless for anything but testing” new job should show up on the master job list. Now stop and remove your Jenkins container.

docker stop jenkins-master
docker rm jenkins-master

Notice that we correctly didn't use "-v" here. We only want to use "-v" when we want to totally remove the data volume. Remember, data volumes work like pointers: what Docker is actually doing is creating a virtual file system on disk, and as long as one currently defined container references it, it will exist. When jenkins-master is removed here, the jenkins-data container still references that virtual file system. If we were to use "-v" here, Docker would take that as an override and delete the virtual file system. This would break the reference jenkins-data has to it, and things would get very ugly.
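
You can reassure yourself that the data container, and the volumes it anchors, survived the removal:

docker ps -a | grep jenkins-data

The jenkins-data container should still be listed in its usual stopped state.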

With the old image, this would also have deleted our job. When we recreate the container, however:

  1. docker run -p 8080:8080 -p 50000:50000 --name=jenkins-master --volumes-from=jenkins-data -d myjenkins
  2. Refresh your browser at http://yourdockermachineip:8080 and wait for Jenkins to start

We’ll find our test job is still there. Mission accomplished!

Conclusion

As with the previous blog posts, you can find updates and example files from this post on my GitHub repository. You’ll note the makefile has once again been updated and includes a new “clean-data” command if you want to wipe out your data container intentionally.
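
I won’t walk through the makefile here, but a “clean-data” style target really only needs to do the volume-aware removal we covered earlier, something along the lines of:

docker rm -v jenkins-data

Keep in mind that this deletes your preserved Jenkins home and logs for good, which is exactly why it’s a separate, deliberate command.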

At this point we have a fully functioning Jenkins image. We can save our logs, jobs, and plugins because we placed jenkins_home in a data volume container so it persists between container runs. As a side bonus, it will even persist if the Docker daemon crashes, or the host restarts, because Docker preserves stopped containers.

While we could just start using this setup, in practice there are still some things that could stand to be improved. Here’s the short list:

  • We’d like to proxy a web server like NGINX in front of our Jenkins container.
  • Managing multiple images and containers is starting to get annoying, even with a makefile. Is there an easier way?
  • We need a way to backup our Jenkins environment, especially jobs.
  • What if we don’t want to use Debian as our base OS? What if we don’t like relying on external images?
  • We haven’t done anything about build slaves. While this setup will allow any standard slave to connect, wouldn’t it be awesome if we could set up build slaves as Docker containers?

Each one of these is basically its own blog post. Up next we’re going to get a web proxy set up and after that discuss dealing with having three containers: that means introducing Docker Compose. The other subjects such as build environments in containers, changing our base OS, and backing up Jenkins will be further out. Stay tuned!

    Posted by Maxfield F Stewart