Keeping an eye on your Docker image size

Sep 23, 2019

docker

size

tutorial

Mirco Zeiss

Docker is a great tool and a major player in the current cloud universe. It allows you to run multiple applications, processes and services side by side on the same machine, isolated from each other. You define each application in a Dockerfile, where you describe prerequisites such as the operating system and programming language and specify the steps needed to build and run the application. This tutorial shows how to continuously keep an eye on your Docker image size during development and in production.

You can find the demo project on GitHub at github.com/seriesci/dockersize. Here is what our folder structure looks like.

$ tree -h
.
├── [ 160]  Dockerfile
├── [2.8M]  image.jpg
├── [  68]  main.go
└── [  38]  README.md

0 directories, 4 files

Let us start with a basic Dockerfile. We are going to write a simple Go application that writes hello world to stdout. Here is what the Go code looks like.

// main.go
package main

import "fmt"

func main() {
	fmt.Println("hello world")
}

The Dockerfile uses golang:1.13 as its base, updates the system packages and copies the source code from our host into the image. Afterwards we compile our Go source code, which results in a binary called app. At the end we simply start the binary.

# Dockerfile
FROM golang:1.13
RUN apt-get update -y
RUN apt-get upgrade -y
RUN rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY . .
RUN go build -o app .
CMD ["./app"]

Having this Dockerfile we can now build the image and give it a name. You must run the following command in the same directory as your Dockerfile.

$ docker build -t seriesci/dockersize .
...
...
...
Successfully tagged seriesci/dockersize:latest

Start the container using our previously built image and check the outcome. We should see hello world on the command line.

$ docker run --rm seriesci/dockersize
hello world

We got what we expected and everything seems to work just fine. Let us check our Docker image size.

$ docker images --format "{{.Repository}} {{.Size}}" | grep seriesci/dockersize
seriesci/dockersize 832MB

The resulting image size is a whopping 832MB. We definitely have to make this smaller. Luckily we have multiple options:

Use fewer layers
Use a small base image
Use a .dockerignore file
Use multi stage builds

1. Use fewer layers

Layers are generated from the instructions in a Dockerfile. Each layer is only a set of differences from the layer before it, and all layers are stacked on top of each other. In newer versions of Docker only the instructions RUN, COPY and ADD create layers; all other instructions do not increase the size of your build. Layers are stored in the Docker cache and can be shared across several images. Because a layer only records differences, deleting files in a later layer does not shrink the layers below it: the rm in our third RUN instruction hides the apt lists but does not free their space. So instead of three instructions we use a single RUN command.

# Dockerfile
FROM golang:1.13
RUN apt-get update -y && \
    apt-get upgrade -y && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /src
COPY . .
RUN go build -o app .
CMD ["./app"]

Combining the three instructions into a single layer also clears the apt cache properly and reduces the image size to 809MB. That is about 97% of the original size.

2. Use a small base image

Our base image golang:1.13 is itself based on Debian. Instead of using this one we are going to use golang:1.13-alpine. It is based on the popular Alpine Linux project. Alpine Linux is much smaller than most distribution base images (~5MB), and thus leads to much slimmer images in general. Since apt is not part of the image we have to remove those commands as well.

# Dockerfile
FROM golang:1.13-alpine
WORKDIR /src
COPY . .
RUN go build -o app .
CMD ["./app"]

Now build the image and compare the final size. We are at 361MB. That is already less than half (44%) the size of what we had before. For such a small change we got a huge reduction in size. Let us continue shrinking it even more.

3. Use a .dockerignore file

# .dockerignore
image.jpg
README.md
.circleci/

A .dockerignore file lists the files and directories that Docker excludes from the build context, so COPY . . never copies them into the image. In our case we exclude an image, our README.md and the CI configuration. That is just 3MB, but imagine you are building a frontend or backend JavaScript app with a huge node_modules folder. You do not want those dependencies to end up in the image. Instead, you would build your bundle on your host machine and copy only the result into your image. This is where the .dockerignore file comes in handy.

4. Use multi stage builds

Up until now everything takes place within a single image. We build our application inside our image and therefore need the whole Go toolchain. However, Go creates a stand-alone binary, so we do not need the full toolchain in our final image. It would be great if we could compile our application in one image and then use a different image to run it. That is possible with multi stage builds.

FROM golang:1.13-alpine AS builder
WORKDIR /src
COPY . .
RUN go build -o app .

FROM alpine:latest
WORKDIR /src
COPY --from=builder /src/app .
CMD ["./app"]

First of all we name our existing build stage builder. That makes it easier to reference later on. Then we build our app as in the previous steps. Once we have the final binary we start a second stage from a new base image. It is also based on Alpine but does not include the Go toolchain. We then copy the binary from our first stage builder and start the app.

Result after optimizations

Let us check our final image size with the same command we used at the beginning.

$ docker images --format "{{.Repository}} {{.Size}}" | grep dockersize
seriesci/dockersize 10.5MB

As you can see it took a little bit of effort but our image is significantly smaller. Having spent so much time and energy making sure our final Docker image is as small as possible, we do not want it to grow again unnoticed in the future. That is why we will set up Continuous Integration (CI) to keep an eye on it. In this case we are using CircleCI but the workflow is similar for Travis CI, GitHub Actions, Jenkins or any other CI tool. By using awk '{print $2+0}' we get the final image size without the unit MB. $2+0 forces awk to interpret the second field as a number, which drops any trailing non-numeric characters. Note that this only drops the unit and does not convert it, so the values are comparable only as long as Docker reports the size in MB.

$ docker images --format "{{.Repository}} {{.Size}}" | grep dockersize | awk '{print $2+0}'
10.5
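For readers who want the same extraction outside a shell pipeline, here is a rough Go equivalent of the $2+0 trick. The function leadingNumber is a hypothetical helper that handles only plain decimal prefixes; awk's own number parsing also accepts signs and exponents.

```go
package main

import (
	"fmt"
	"strconv"
)

// leadingNumber parses the longest decimal prefix of s, similar to what
// awk does for $2+0. It returns 0 if s does not start with a number.
func leadingNumber(s string) float64 {
	end := 0
	seenDot := false
	for end < len(s) {
		c := s[end]
		if c == '.' && !seenDot {
			seenDot = true
		} else if c < '0' || c > '9' {
			break
		}
		end++
	}
	v, err := strconv.ParseFloat(s[:end], 64)
	if err != nil {
		return 0
	}
	return v
}

func main() {
	fmt.Println(leadingNumber("10.5MB")) // prints 10.5
	fmt.Println(leadingNumber("832MB"))  // prints 832
}
```

Like the awk version, this drops the unit without converting it, so a size reported in GB would yield a misleadingly small number.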

We take this value and pipe it into the curl command.

- run:
    name: POST image size to seriesci
    command: |
      docker images --format "{{.Repository}} {{.Size}}" | grep dockersize | awk '{print $2+0}' | xargs -I {} curl \
          --header "Authorization: Token ${TOKEN}" \
          --data value="{}" \
          --data sha="${CIRCLE_SHA1}" \
          https://seriesci.com/api/repos/seriesci/dockersize/size/values
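If your CI job already runs Go, the same request could be sent without curl. The sketch below assumes the endpoint accepts a form-encoded POST, which is what curl sends with --data, and reuses the URL, the Token header and the TOKEN and CIRCLE_SHA1 environment variables from the step above. The helper buildRequest is a hypothetical name, and the value 10.5 is hard-coded for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"strings"
)

// buildRequest creates a form-encoded POST carrying value and sha,
// with the token in the Authorization header.
func buildRequest(endpoint, token, value, sha string) (*http.Request, error) {
	form := url.Values{}
	form.Set("value", value)
	form.Set("sha", sha)

	req, err := http.NewRequest(http.MethodPost, endpoint, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Token "+token)
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

func main() {
	token := os.Getenv("TOKEN")
	if token == "" {
		fmt.Println("TOKEN not set, skipping upload")
		return
	}
	req, err := buildRequest(
		"https://seriesci.com/api/repos/seriesci/dockersize/size/values",
		token, "10.5", os.Getenv("CIRCLE_SHA1"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "building request failed:", err)
		os.Exit(1)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```

Separating request construction from sending keeps the part that matters (headers and form fields) easy to check without touching the network.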

Conclusion

We have brought our Docker image down from 832MB to 10.5MB. That is only 1.3% of its original size, a reduction of 98.7%. It is easy to get started with Docker, but make sure you apply some basic principles like using fewer layers, a small base image, a .dockerignore file and multi stage builds to keep your images small. If you want to automatically and continuously monitor your Docker images, think about adding seriesci to your workflow. It is free for open source projects and easy to integrate.

Check out the demo repository on GitHub at github.com/seriesci/dockersize and the live values on seriesci at seriesci.com/seriesci/dockersize.

Mirco Zeiss is the CEO and founder of seriesci.
