Improving Your Docker Image Development Experience

Many CI systems allow you to build inside a Docker container, but creating the Docker image used for that container can be a slow process. Third-party resources, like Docker Hub, may have rate limits, and fetching all the updates and new packages over the internet can take a while. There are, however, a few things you can do to make building and testing your images faster.

There are three main things I do to improve my Docker image development experience: optimising my Dockerfile instruction order, making a caching HTTP proxy available, and having a local Docker registry mirror available. I’ll explain each one below.

Optimise your Dockerfile instruction order

Why?

Docker creates a new image layer for each instruction in your Dockerfile, and it will try to re-use these cached layers when you re-run your build. This means that if there are things in your Dockerfile which are unlikely to change, and they can be moved further up the file, you should move them further up so their layers can be reused.
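
As an illustration (this is a made-up sketch, not the Dockerfile from this article), the stable package installation sits near the top and the frequently-changing source copy sits near the bottom, so most rebuilds only re-run the last couple of steps:

FROM debian:10
# Rarely changes, so this layer is almost always served from the build cache
RUN apt-get update && apt-get install -y build-essential
# Changes on nearly every rebuild, so it sits as late as possible
COPY . /src
RUN make -C /src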

How?

It’s not always possible to move things further up in your Dockerfile; sometimes you need to install a package before you can run a program, so there is a set order. You may still be able to split some parts into multiple steps while you’re developing the image, then combine them back into a single step when you make your Dockerfile available.

If, for example, you’re installing a group of packages and one set is fixed but the other is changing as you work on your image, you might want to split that into multiple Docker RUN commands, so that Docker can create a cacheable layer containing the fixed set of packages and use it as the starting point for the step which installs the package list you’re changing. This means you’ll get faster image build times, because only the package list you’re changing needs to be re-installed in the image.

You can always combine the package installations before making the Dockerfile available, so part of your development Dockerfile might look like this:

# Install the packages needed for the build
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y git-core gnupg flex bison build-essential \
	zip curl zlib1g-dev gcc-multilib g++-multilib
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y libc6-dev-i386 lib32ncurses5-dev
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y x11proto-core-dev libx11-dev lib32z1-dev
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y libgl1-mesa-dev libxml2-utils xsltproc unzip
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y fontconfig libncurses5 procps rsync

as you build up the list of packages to install. Then, before publishing the Dockerfile, you’d combine those lines to become:

RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y git-core gnupg flex bison build-essential \
	zip curl zlib1g-dev gcc-multilib g++-multilib libc6-dev-i386 lib32ncurses5-dev \
	x11proto-core-dev libx11-dev lib32z1-dev libgl1-mesa-dev libxml2-utils xsltproc unzip \
	fontconfig libncurses5 procps rsync

The reason you’d combine them is that each layer takes up some space on the machine where the image is built, and while you benefit from each line creating an extra cached layer during development, your end users are unlikely to need all those intermediate layers, so you can combine the steps and save them some space.
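
If you want to see what each instruction is costing you, Docker can list the layers in an image along with the instruction that created each one and the space it takes up; for example, assuming you’ve tagged your development image as my-build-image:

$ docker history my-build-image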

Set up a local caching HTTP proxy

Why?

Both Debian and Ubuntu use apt, and their default configuration fetches packages via unsecured http://. There’s a lot of debate about whether this is good or bad but, from the perspective of making your Docker image development quick, it’s a good thing. If you’re using a different distribution you should check whether it uses http (not https) to fetch packages; if it does, a local caching HTTP proxy should speed up your builds too.

When a file is fetched via https:// the traffic is encrypted, so a caching proxy can’t see which file is being requested, and the bytes on the wire differ between two requests for the same file. That means it can’t be cached without some nasty tricks which make your environment less secure. When a file is fetched via http:// the data is unencrypted, the same bytes are sent for each request, and so caching is possible in a low-risk way.

Having downloads cached locally means your download speed is limited by your local network and hardware. You’re not pulling data from the original server; all you’re doing is pulling data from your local cache to your machine, so any internet speed fluctuations won’t come into play. An added benefit is that you’re less likely to be rate limited, because you’re not pulling as much data from the original source server.

HTTP caching pays off from the second build onwards. The first build will ask the local cache for the file, then the local cache will ask the original source server (because it doesn’t have the file locally). The second build will ask the local cache and, because it now has the file, it will serve the file over your local network. Eventually the file may need to be updated, so the cache will go back to the original server, but during development of a Docker image, where you’re rebuilding the image a lot, you’ll get a lot of cache hits whenever a change triggers a new set of package installations.

Added bonus

If you have a number of Linux machines which are using the same apt-based distribution and version, you can use the proxy with those machines as well. This will give you faster updates on your machines as well as speeding up your Docker builds.
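
A minimal sketch of how to point an apt-based machine at the proxy (the file name 01proxy is just a name I’ve picked for this example, and you should replace {squid_host_ip_address} with the address of your squid machine) is to create /etc/apt/apt.conf.d/01proxy containing:

Acquire::http::Proxy "http://{squid_host_ip_address}:3128/";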

Installing the proxy

The machine you run your cache on doesn’t have to be a mega-machine; if it’s just you building an image on your own then you could use the machine you’re working on, or a Raspberry Pi. I have an old Core i5-2320 machine with 16 GB of RAM and a 2 TB hard disk (yes, HDD, not SSD) which serves me and a few test machines on my LAN, so don’t feel you need to spend hundreds of pounds on a high-end machine with super-fast SSDs (although if you want to then don’t let me hold you back :)).

I use Debian 10 for all the Linux machines I run. Ubuntu and Raspbian are based on Debian, so these instructions should work for them as well. I also use Squid as my caching proxy because it’s well tested and easy to configure for our needs. Neither are hard requirements, so if you have your own preferences for Linux distribution and caching HTTP proxy feel free to use them.

To install squid you run the following command as root:

$ apt install squid 

Then you’ll need to change a few lines in the configuration file /etc/squid/squid.conf so that squid can cope with the size of Linux packages.

The first change enables an on-disk cache, so we’re not completely reliant on RAM-based caching. The size here is in megabytes, so 102400 gives a 100 GB cache; adjust it to suit the disk space you have spare:

# cache_dir ufs /var/spool/squid 100 16 256

should become:

cache_dir ufs /var/spool/squid 102400 16 256

The next change increases the amount of RAM the cache uses. As I mentioned above, my cache machine has 16 GB of memory, so I set aside 4 GB of RAM for cached objects. If you’re using a machine with a smaller amount of RAM you may want to make this value lower than 4 GB:

# cache_mem 256 MB

becomes

cache_mem 4 GB

The next pair of changes increases the maximum size of an object the cache will store. Linux packages can be several megabytes each, so for both the disk and in-memory caches I increase this to 50 MB:

# maximum_object_size 4 MB

should become

maximum_object_size 50 MB

and

# maximum_object_size_in_memory 512 KB

should become:

maximum_object_size_in_memory 50 MB

The final change allows all the machines on your local network to use the caching proxy; without this, only the machine you installed squid on will be able to use the cache:

#http_access allow localnet

should become

http_access allow localnet
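
With the edits made you can ask squid to check the configuration for mistakes and then restart it so the new settings take effect. As with the install these need to run as root, and if squid complains about missing cache directories then running squid -z once should create them:

$ squid -k parse
$ systemctl restart squid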

Using the proxy

Ubuntu and Debian can be configured to use an HTTP proxy for package updates and installations by setting the http_proxy environment variable. This makes it very easy to use the proxy in your build; all you need to do is add the following line after the FROM line in your Dockerfile:

ARG http_proxy

Now you can add a build-arg to your docker build command which points the build at your proxy, like this:

$ docker build --build-arg http_proxy=http://{squid_host_ip_address}:3128/ .

(The :3128 refers to the default port that squid runs on.)
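
Putting the two pieces together, the top of a development Dockerfile might look something like this sketch (assuming a Debian 10 base image):

FROM debian:10
# Populated by --build-arg http_proxy=...; apt picks the proxy up from the environment
ARG http_proxy
RUN DEBIAN_FRONTEND="noninteractive" apt-get update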

Once you’ve made these changes you should see the squid cache being used when you perform builds; you can check by looking at /var/log/squid/access.log on the machine running squid.
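
A quick way to do that is to follow the log while a build runs; cache misses show up as TCP_MISS entries and cache hits as TCP_HIT (or similar) entries:

$ tail -f /var/log/squid/access.log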

What should you expect?

The squid statistics for two consecutive builds of a Dockerfile I have, which installs ~240 packages, are:

Cache information for squid:
        Hits as % of all requests:      5min: 50.0%, 60min: 50.0%
        Hits as % of bytes sent:        5min: 50.0%, 60min: 50.0%

A hit rate of 50% means that the package requests for the second build were served entirely, or almost entirely, from the cache. After a third build, outside the 5-minute window, you can see the trend continuing:

Cache information for squid:
        Hits as % of all requests:      5min: 98.5%, 60min: 66.2%
        Hits as % of bytes sent:        5min: 99.9%, 60min: 66.6%
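
If you want to pull the same statistics from your own proxy they come from squid’s cache manager; one way to query it, assuming the squidclient package is installed on the squid machine, is:

$ squidclient mgr:info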

As for the build speed, the original package install build steps were reported as:

 => [ 4/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get update                                                    4.6s
 => [ 5/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get upgrade -y                                               12.1s
 => [ 6/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y apt-utils                                      2.1s
 => [ 7/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y git-core gnupg flex bison build-essential  z  70.4s

by the third build we’re getting:

 => [ 4/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get update                                                    3.7s
 => [ 5/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get upgrade -y                                               11.5s
 => [ 6/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y apt-utils                                      2.1s
 => [ 7/17] RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y git-core gnupg flex bison build-essential  z  56.7s

which is a 20% improvement on my 80 Mb/s down / 20 Mb/s up internet connection.

Setting up a local registry mirror

As mentioned earlier, Docker Hub, which is where many folk pull base Docker images from, has some rate limits. Docker does a great job of caching images locally, so if you’re only building on one machine, and you’re sticking with a single base version of Linux in your FROM command, then installing a local registry isn’t really necessary.

If, however, you have a few machines, and/or you’re playing with lots of base Linux versions, then you could hit the rate limit. This is where a local caching docker registry can come in useful.

The docker folk have made it very easy to get a registry up and running. They have some great instructions which will get you started, but there is one tweak you’ll need to make to allow base images to be cached, and a second I like to make to control where the cached images are stored.

Firstly, to configure the proxy functionality, you need to change the default configuration. Without this you’ll only get local images served from your registry, which is not what we want.

To do this you need to create a config.yml file which looks like this:

version: 0.1
log:
  fields:
    service: registry
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
  headers:
    X-Content-Type-Options: [nosniff]
health:
  storagedriver:
    enabled: true
    interval: 10s
    threshold: 3
proxy:
  remoteurl: https://registry-1.docker.io

Secondly, I like to store the images on a hard disk rather than an SSD. Docker images can become large, and there aren’t many concurrent requests for them in my environment, so I prefer to put them on the hard disk of my local registry because HDDs are still cheaper than SSDs for bulk storage.

My HDD is mounted on /hdd on my machine, and I keep my Docker-related files in /hdd/docker. config.yml is located at /hdd/docker/etc/config.yml, and I use /hdd/docker/registry for image storage. To start the registry in a way which recognises this I use the command:

docker run -d \
  -p 5000:5000 \
  --restart=always \
  --name registry \
  -v /hdd/docker/etc/config.yml:/etc/docker/registry/config.yml \
  -v /hdd/docker/registry:/var/lib/registry \
  registry:2

which maps, using Docker’s -v option, my config file and storage area into the registry container.

Once you’ve run that command your registry is now up and running.
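
You can check it’s responding by asking the registry API for its catalogue; this returns a small JSON document listing the repositories it holds, which will be empty until something has been pulled through the mirror (replace localhost with the registry machine’s address if you’re checking from another machine):

$ curl http://localhost:5000/v2/_catalog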

Using the local registry

To use the local registry mirror you’ll need to configure the Docker installations where you’re performing the builds. On Linux you’ll need to create the file /etc/docker/daemon.json containing the following:

{
  "registry-mirrors": [
    "http://{your_registry_ip_address}:5000/"
  ],
  "insecure-registries": [
    "{your_registry_ip_address}:5000"
  ]
}

where {your_registry_ip_address} is the IP address of the machine you’re running the docker registry on. You’ll need to include the insecure-registries option because we’re not using an SSL certificate to serve the images. If you want to expose this cache outside a trusted network you should read the Docker documentation titled “Run an externally-accessible registry” so you can ensure you’ve secured your image cache.
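
After creating or changing daemon.json you’ll need to restart the Docker daemon so it picks up the new settings, and you can confirm the mirror is registered by looking for the Registry Mirrors section in the output of docker info:

$ sudo systemctl restart docker
$ docker info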

That’s all folks

Doing these three things should improve your Docker image development experience. If you have limited internet bandwidth the cache and local registry should really help; if you have great bandwidth you’ll notice a smaller improvement, but you’ll still be helping the folk who maintain package and image repositories by lightening the load on them.

If you have questions you can find me on Mastodon, GitHub, and Twitter.