Using Multi-Stage Build to Reduce Docker Image Size

Docker on the Production Line

Multi-stage build support in Docker image building was introduced with Docker v17.05 in 2017. This post summarizes the practical points that can improve the development experience, keep data secure and reduce the Docker image size.

Multi-Stage in Docker Image Building

The multi-stage Docker image build, in my practice, offers a way to resolve three issues.

  • Data security: If earlier steps download source code or set up the toolchain, there is a risk of leaking information through incomplete deletion, or of introducing extra vulnerabilities by leaving the build toolchain on the product image. Some cloud vendors offer dedicated solutions that build artifacts with homogeneous images and deliver only the final artifacts in the last image.

  • Reducing Docker image size: Each command in a Docker image build generates a new layer, and the layered filesystem (AUFS/OverlayFS) applies lazy deletion. If caches and temporary files are removed in a later command, the size is not reclaimed from the volume; those files are merely marked as deleted in the new layer. As many Dockerfile best-practice guides point out, the recommended trick is to keep dotnet build, yum install or apt-get install and the corresponding purge commands in the same step. A multi-stage build resolves this by copying only the artifacts from another stage.

  • Easier-to-maintain Dockerfile: The two issues above could be mitigated with a well-configured chain of multiple images that delivers the final artifacts only in the last image. However, those Dockerfiles would depend on each other and become hard to maintain.

Multi-stage build was introduced to divide the Docker image build into multiple stages, which can pass artifacts from one stage to another and eventually ship only the final artifacts in the last stage.
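As a minimal sketch of the idea (the application name, base images and paths here are hypothetical, not from the examples below), a two-stage Dockerfile looks like this:

```dockerfile
# Stage 1: build the artifact with the full toolchain
FROM golang:1.21 AS builder
WORKDIR /src
COPY . .
RUN go build -o /out/app .

# Stage 2: ship only the artifact; the toolchain never reaches the final image
FROM debian:bookworm-slim
COPY --from=builder /out/app /usr/local/bin/app
CMD ["app"]
```

Only the layers of the last stage end up in the final image; everything installed in the builder stage is discarded.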

Examples to Reduce Docker Image Size

Take the example of upgrading the google-chrome browser version. The base image is cypress/browsers.

Upgrade Google-Chrome without Purging the Cache

The Dockerfile is straightforward:

FROM cypress/browsers:node11.13.0-chrome73

ENV TZ=Pacific/Auckland

RUN apt-get update && apt-get install google-chrome-stable -y && \
google-chrome --version

The logs show that Chrome v78 replaced the original v73. To check the image size, either run docker images with labels/tags to show a summary of matched images, or use docker inspect to show the image details.

Then docker inspect cypress3-chrome-updated-without-purge | jq '.[0].Size' shows the image size, 1520046216 bytes. Alternatively, Docker's native JSON filter gives the same result for a given image: docker inspect cypress3-chrome-updated-without-purge --format='{{.Size}}'.

Upgrade Google-Chrome and Purge the Cache Immediately

Apply the recommended hacks to clean the cache on every command:

FROM cypress/browsers:node11.13.0-chrome73

ENV TZ=Pacific/Auckland

RUN apt-get update && apt-get install google-chrome-stable \
-y --no-install-recommends && \
rm -rf /var/lib/apt/lists/* && \
google-chrome --version

This way the image size is reduced to 1503569359 bytes: about 16 MB of cache is removed from the layer that upgrades the Chrome browser.
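The saving can be sanity-checked with quick shell arithmetic on the two reported sizes (variable names here are just for illustration):

```shell
# Sizes reported by `docker inspect --format='{{.Size}}'`, in bytes
without_purge=1520046216
with_purge=1503569359

# Integer division: bytes -> MiB
echo "saved: $(( (without_purge - with_purge) / 1024 / 1024 )) MiB"
# prints: saved: 15 MiB
```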

> docker inspect cypress3-chrome-updated-cache-purged  --format='{{.Size}}'
> 1503569359

Obviously the Dockerfile becomes a bit harder to maintain, because each step is appended with all kinds of purge commands. If there is no convenient way to purge right away, or such code is difficult to maintain in one command, a script can be drafted and copied into the intermediate layers so the cleanup still happens in a single step.
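A sketch of that script approach (the script name and its contents are illustrative, not part of the examples above):

```dockerfile
FROM cypress/browsers:node11.13.0-chrome73

# cleanup.sh is a hypothetical helper that holds all the purge commands,
# e.g. `rm -rf /var/lib/apt/lists/*` and similar cache removals
COPY cleanup.sh /tmp/cleanup.sh

# Install and clean up in the same layer, then drop the script itself
RUN apt-get update && \
    apt-get install -y --no-install-recommends google-chrome-stable && \
    sh /tmp/cleanup.sh && \
    rm /tmp/cleanup.sh
```

Because the install, the cleanup and the script removal share one RUN step, none of the intermediate files survive into the layer.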

Generate the Same Image in Multi-Stage Build

A quick check shows that Google Chrome is maintained in the /opt/google/chrome folder, and for an experimental image it is acceptable to ignore the apt-get checksums. The new Dockerfile is drafted as below:

FROM cypress/browsers:node11.13.0-chrome73 as stage1

ENV TZ=Pacific/Auckland

RUN apt-get update && apt-get install google-chrome-stable \
-y --no-install-recommends && \
rm -rf /var/lib/apt/lists/* && \
google-chrome --version

FROM cypress/browsers:node11.13.0-chrome73
COPY --from=stage1 /opt/google/chrome /opt/google/chrome
RUN google-chrome --version

The first image is still homogeneous and contributes only the google-chrome binary files. The final image then copies those binaries directly into the corresponding folder.

Test the google-chrome version in the CLI:

> docker run -it cypress3-chrome-updated-multi-stages google-chrome --version
> Google Chrome 78.0.3904.108

Checking the image size shows an even smaller size than the Dockerfile that purges the apt-get caches, because this solution copies only the required folder.

docker inspect cypress3-chrome-updated-multi-stages --format='{{.Size}}' reports the size as 1501127204 bytes.

Summary

  • Less information left on the image: no need to keep additional YUM repos on an RHEL image, no extra keys left behind and, more importantly, no development-phase configuration or source code left on the image.

  • Smaller size: since copying the artifacts is a clean way to add only the requested files to the final image, the size grows only by what is necessary.

Building method                          Size (bytes)
Install pkg from apt-get                 1520046216
Install pkg and purge the cache          1503569359
Copy binaries from a previous stage      1501127204

Further Discussion

A better fit for multi-stage Docker image builds is multi-stage compilation. One typical example is upgrading the git version on an RHEL Jenkins slave image. The official RHEL YUM repo supplies only an old version of the git client, which does not support advanced functions such as the .NET Core NuGet operations. In this case, the solution is to download the git source code and install the gcc toolchain to build it locally. Without a multi-stage image build, the procedure would require either cross-compiling the source in a separate script or building it directly on the Docker image for a homogeneous architecture. A multi-stage Docker image build keeps all the steps in one single Dockerfile.
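A rough sketch of what that Dockerfile could look like; the base image, git version, download URL and build dependencies are all assumptions, and a real RHEL build would likely need extra repos or subscriptions:

```dockerfile
# Stage 1: build git from source with the full gcc toolchain
FROM registry.access.redhat.com/ubi8/ubi AS gitbuilder
RUN yum install -y gcc make curl-devel expat-devel gettext \
        openssl-devel zlib-devel tar gzip && \
    curl -LO https://mirrors.edge.kernel.org/pub/software/scm/git/git-2.39.0.tar.gz && \
    tar xzf git-2.39.0.tar.gz && \
    cd git-2.39.0 && \
    make prefix=/opt/git all && \
    make prefix=/opt/git install

# Stage 2: copy only the installed git; gcc and the *-devel packages stay behind
FROM registry.access.redhat.com/ubi8/ubi
COPY --from=gitbuilder /opt/git /opt/git
ENV PATH=/opt/git/bin:$PATH
```

The final image carries the new git binaries but none of the compiler toolchain or header packages used to build them.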

On the other side, the sample in this post is not a perfect example. If only the Chrome binary files under /opt/google/chrome are replaced directly, /etc/alternatives would still point to the chrome-stable binary, but the package management database would still regard it as the original version v73, not the current one, and the dependency checks would not cover v78 either. As with the Sun Solaris package system, it is possible to overwrite the package DB, but that would require one more command and consequently a new Docker image layer. On Debian-based images, the installed-package database lives under /var/lib/dpkg, while /var/lib/apt/lists only holds the downloaded repository indexes.

So, apply multi-stage image builds to source-code compilation (especially multi-stage compilation) and to decompressed binary packages such as Node.js.
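For the Node.js case, the idea would be roughly as follows; the Node.js version, download URL and base images are illustrative:

```dockerfile
# Stage 1: download and unpack the prebuilt Node.js tarball
FROM debian:bookworm-slim AS fetcher
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl xz-utils ca-certificates && \
    curl -fsSL https://nodejs.org/dist/v18.19.0/node-v18.19.0-linux-x64.tar.xz \
        | tar -xJ -C /opt && \
    mv /opt/node-v18.19.0-linux-x64 /opt/node

# Stage 2: ship only the unpacked runtime; curl, xz and the apt caches
# from the fetch stage never reach the final image
FROM debian:bookworm-slim
COPY --from=fetcher /opt/node /opt/node
ENV PATH=/opt/node/bin:$PATH
```

The download tools and the compressed tarball stay in the first stage, so the final image pays only for the unpacked runtime itself.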

(TBC)