Well-made Docker images are the foundation for deploying secure and scalable Docker-based applications. Building quality images also improves image re-usability, readability, and maintainability. Here are some best practices you should follow while building Docker images.

1. Cache image layers

Docker storage drivers use a copy-on-write (COW) filesystem to save disk space for images and the containers created from them. Each instruction in a Dockerfile produces a new layer, and each layer records the filesystem changes between the states before and after that instruction runs. Docker caches these layers, which speeds up subsequent builds of the image.

Caching matters most for the RUN, COPY, and ADD instructions. If an instruction and the files it depends on are unchanged, Docker reuses the existing layer from the local build cache instead of rebuilding it. Once one layer has to be rebuilt, every layer after it is rebuilt as well. Let's quickly walk through how layer caching works, using the following Dockerfile.

FROM python:3.8-alpine
COPY ./requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
ENTRYPOINT ["python", "app.py"]

Building the image from this Dockerfile for the first time, you get the following output:

Sending build context to Docker daemon  4.096kB
Step 1/5 : FROM python:3.8-alpine
 ---> 474c96543250
Step 2/5 : COPY ./requirements.txt requirements.txt
 ---> 28c77af68bd7
Step 3/5 : RUN pip install -r requirements.txt
 ---> Running in 2251632f122c
Collecting Flask==2.0.2
  Downloading Flask-2.0.2-py3-none-any.whl (95 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 95.2/95.2 KB 1.5 MB/s eta 0:00:00
Installing collected packages: Flask
Successfully installed Flask-2.0.2
Removing intermediate container 2251632f122c
 ---> b4b25857a347
Step 4/5 : COPY . .
 ---> b3895300e604
Step 5/5 : ENTRYPOINT ["python", "app.py"]
 ---> Running in 2d4f19d0302d
Removing intermediate container 2d4f19d0302d
 ---> 0bb229d4e965
Successfully built 0bb229d4e965
Successfully tagged sample:latest

real    0m4.849s
user    0m0.040s
sys     0m0.036s

Did you notice that it took ~5 seconds to build the image for the first time? Now let’s make a change in the app.py file and build the image again.

Here is the result:

Sending build context to Docker daemon  4.096kB
Step 1/5 : FROM python:3.8-alpine
 ---> 474c96543250
Step 2/5 : COPY ./requirements.txt requirements.txt
 ---> Using cache
 ---> 28c77af68bd7
Step 3/5 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> b4b25857a347
Step 4/5 : COPY . .
 ---> 4d35850f7d71
Step 5/5 : ENTRYPOINT ["python", "app.py"]
 ---> Running in 859652485e62
Removing intermediate container 859652485e62
 ---> 3f33c2ba2aa9
Successfully built 3f33c2ba2aa9
Successfully tagged sample:latest

real    0m0.398s
user    0m0.024s
sys     0m0.040s

Notice how fast the build is? Steps 2 and 3 report Using cache, while steps 4 and 5 are built again because app.py changed. Copying requirements.txt and installing the dependencies before copying the rest of the source is what keeps the expensive pip install step cached. You save even more build time when multiple images share common layers, since the other images simply reuse these cached layers.
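
For contrast, here is a sketch of the same image with the instructions in a cache-unfriendly order. It is not part of the original example and is shown only to illustrate why instruction order matters: because the whole source tree is copied first, any edit to app.py invalidates that layer and every layer after it.

```dockerfile
# Cache-unfriendly ordering, shown only for contrast
FROM python:3.8-alpine
# Copying the whole source tree first means any edit to app.py
# changes this layer and invalidates every layer after it
COPY . .
# As a result, the dependencies are reinstalled on every build
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "app.py"]
```

With this ordering, the pip install step would run again on each code change instead of being served from the cache.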

2. Use multistage builds

The multistage build feature (introduced in Docker 17.05) is useful to anyone who struggles to optimize Dockerfiles or wants to build an efficient Docker image. It gives you more control over which files and artifacts end up in the final image, so build tools and intermediate files, along with any vulnerabilities they carry, can be left out.

A multistage build is basically a Dockerfile with multiple FROM instructions. Each FROM starts a new build stage that can COPY artifacts from the previous stages. Because only the copied artifacts reach the final image, the layers created by intermediate steps such as installing dependencies, downloading additional packages, and testing are left behind.

With multistage builds, our entire build system can be contained in a single file.

# Stage 1
FROM maven AS build
WORKDIR /app
COPY . .
RUN mvn package
# Stage 2
FROM tomcat
COPY --from=build /app/target/file.war /usr/local/tomcat/webapps/

The above Dockerfile has two FROM instructions. Internally, these stages are numbered 0 and 1 (the comments label them Stage 1 and Stage 2). Stage 0 is given the friendly alias build. This stage builds the project with Maven inside the /app directory set by the WORKDIR instruction. The resulting image size is 635 MB.

The second stage pulls the official tomcat image from Docker Hub. The COPY --from instruction then copies only the application's WAR file from the build stage. As a result, the final image size is ~260 MB.
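
To illustrate the internal stage numbering, the same build can be sketched without an alias, referring to the first stage by its index instead. This is only a variant of the Dockerfile above, not a recommendation; the paths are the ones used in that example.

```dockerfile
# Stage 0 (no alias this time)
FROM maven
WORKDIR /app
COPY . .
RUN mvn package

# Stage 1
FROM tomcat
# --from=0 refers to the first stage by its index
COPY --from=0 /app/target/file.war /usr/local/tomcat/webapps/
```

Named stages are usually easier to maintain, because an index changes meaning if a stage is later added or reordered.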

3. Pack a single app per container

A container can run several processes at once, but doing so gives up much of the benefit of the container model. For example, take the classic Apache/MySQL/PHP stack. You may want to run all the components in a single container. But the best practice is to use a separate container for each service. This makes it easier to scale each service independently. Also, the containers can be reused in multiple environments.

4. Use the smallest base image

When building a Docker image, you should always strive for smaller images, which offer advantages such as faster upload and download times. By using smaller images as the base, you can avoid downloading unnecessary packages. However, keeping an image small is a challenge because you might unexpectedly include build dependencies or unoptimized layers. Also, make sure the base image comes from a trustworthy source.

Different operating system variants can be used as the base of an image. Compared to other OS images, Alpine is much smaller in size. For example, the OS image of Ubuntu is 188 MB while Alpine is only 5 MB.

We can even use scratch as the base image to build our own runtime environment. The scratch image is an empty image, so it doesn't fit every case: it works when your application is a statically linked binary that needs no system libraries or shell. Go for it only when building a truly minimal image.

FROM scratch
COPY mybinary /mybinary
CMD [ "/mybinary" ]
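
One way to get such a static binary is to combine scratch with a multistage build. The sketch below assumes a Go project with its main package at the root of the build context and uses golang:1.22 as the build image; both are illustrative choices, not something the example above prescribes.

```dockerfile
# Build stage: compile a statically linked binary
# (assumes a Go module with its main package at the repository root)
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 disables cgo, so the binary does not depend on shared C libraries
RUN CGO_ENABLED=0 go build -o /mybinary .

# Final stage: an empty image containing only the binary
FROM scratch
COPY --from=build /mybinary /mybinary
CMD ["/mybinary"]
```

Because scratch contains no shell, the exec (JSON array) form of CMD is required here.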

5. Pin software versions

Pinning versions takes only a few extra seconds, but it will save you a lot of time in the future. This includes the base images, the code you pull from repositories, the libraries your code relies on, and so on. With pinned versions, you get a consistent build of the application. Without them, components can change underneath you until a previously working Dockerfile no longer builds. For example:

FROM jenkins/jenkins:2.235.4-lts-slim
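
The same idea applies to libraries. As a minimal sketch, reusing the Python base image and the Flask version that appear in the build log earlier in this article, both the base image and the dependency are pinned to exact versions:

```dockerfile
# Base image pinned to a specific tag rather than relying on "latest"
FROM python:3.8-alpine
# Library pinned to an exact version, matching the earlier build log
RUN pip install Flask==2.0.2
```

In a real project the pinned library versions would more likely live in requirements.txt, as in the caching example.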

6. Tag Docker image

Docker images are generally identified by two components: name and tag. For example, for the image google/cloud-sdk:193.0.0, google/cloud-sdk is the name and 193.0.0 is the tag. The tag latest is used by default if we don't provide a specific tag in the Docker commands. Note that latest is only a default tag name: it points at whatever image was most recently pushed under that tag, which is not necessarily the newest release. Keep in mind the risk of relying on a specific tag; it might eventually be deleted or moved to a different image. To guard against that, we can keep a local copy of the image or pull the image by its SHA256 digest.

docker build -t google/cloud-sdk:193.0.0 .

7. Don't run out of memory

When application usage increases, memory usage also goes up. Take the case where a large amount of data processing is happening through API requests. As API hits go up rapidly, there will be a tremendous increase in RAM usage. When the host runs out of memory, its kernel invokes the out-of-memory (OOM) killer and kills processes to free up memory. When this happens, requests can fail. To avoid this problem, we should apply memory limits.

Hard Limit

When you set a hard limit, under no circumstances will the container be allowed to use more than the specified amount of RAM.

sudo docker run -d -p 8080:80 --memory="256m" <>

Soft Limit

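A soft limit is a memory reservation. The container may use more than the reserved amount while the host has memory to spare, but when Docker detects memory contention or low memory on the host, it tries to bring the container's usage back down to the reservation. Because it is a soft limit, it does not guarantee that the container stays below it. If a hard limit is also set, the reservation must be lower than the hard limit to take effect.

sudo docker run -d -p 8080:80 --memory-reservation="256m" <>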

8. Secure the containers

There are many security vulnerabilities to guard against while building containers. Here are some quick tips for securing containers. 

Image scanning

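Scanning local Docker images lets development teams review the security state of their container images and act on the findings. For this, use the docker scan <image-name> command. In recent Docker releases, docker scan has been superseded by Docker Scout, where the comparable command is docker scout cves <image-name>.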

Limit with non-root user

By default, processes in a Docker container run as root. If the application is compromised, the attacker has root privileges inside the container, which makes it much easier to escape to the host system. To avoid this, create a dedicated user in the image and switch to it with the USER directive in the Dockerfile, so that the container's application runs with least privilege.

FROM python:latest
# USER only switches users, so create the unprivileged user first
RUN useradd --create-home alice
WORKDIR /data
COPY hello_world.py .
USER alice
CMD ["python", "hello_world.py"]
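
On Alpine-based images, such as the python:3.8-alpine image used earlier, the user is typically created with BusyBox's addgroup and adduser. The following is a sketch under that assumption; the user name app and the /app directory are hypothetical names chosen for illustration.

```dockerfile
FROM python:3.8-alpine
# Create a system group and user; "app" is a hypothetical name
RUN addgroup -S app && adduser -S -G app app
WORKDIR /app
# Give the unprivileged user ownership of the copied script
COPY --chown=app:app hello_world.py .
# Every later instruction, and the running container, uses this user
USER app
CMD ["python", "hello_world.py"]
```

Placing USER after the instructions that need root, such as creating the user, keeps the build working while the application itself still runs unprivileged.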

Sensitive data

When a Docker image is shared publicly, sensitive data such as credentials and SSH keys may be leaked, so this data must be kept out of the image. A common practice is to move such values to a .env file. Let us look at an example of referencing the credentials inside the compose file.

# .env
username=Admin
password=Admin@123

Once the values are set in the environment file, the env_file attribute passes the file's variables to the container, and ${env_key} references inside the compose file are filled in from the .env file in the project directory.

# docker-compose.yml
version: "3.7"
services:
    api:
        container_name: sample
        build:
            context: .
            dockerfile: Dockerfile
        env_file:
          - .env
        environment:
            DB_USERNAME: ${username}
            DB_PASSWORD: ${password}

If you have sensitive files in your folder, you can also use .dockerignore to exclude them. The files listed in .dockerignore are left out of the build context, so they can't be copied into the image during the Docker build.

# In .dockerignore
.env
.git
google_service_account.json

Docker content trust

Docker images can be tampered with in transit between a Docker registry and a Docker user. It’s important to verify an image's authenticity before you pull it from its source. You can do this by enabling Docker Content Trust, which verifies image signatures. To enable it, set the environment variable DOCKER_CONTENT_TRUST=1.
