Well-made Docker images are the foundation for deploying secure, scalable Docker-based applications. Building quality images also improves image reusability, readability, and maintainability. Here are some best practices to follow when building Docker images.
1. Cache image layers
Docker storage drivers use a copy-on-write (COW) filesystem to save disk space for images and future containers. Each instruction in a Dockerfile creates a new layer, and each layer captures the filesystem changes between the states before and after that instruction runs. Docker uses layer caching to optimize and speed up image builds.
This caching feature mainly applies to the RUN, COPY, and ADD instructions. If the Dockerfile and its related files are unchanged, the existing layer in the local image cache can be reused to rebuild the image. Let's quickly walk through how layer caching works.
FROM python:3.8-alpine
COPY ./requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
ENTRYPOINT ["python", "app.py"]
Building the image from this Dockerfile for the first time, you get the following output:
Sending build context to Docker daemon  4.096kB
Step 1/5 : FROM python:3.8-alpine
 ---> 474c96543250
Step 2/5 : COPY ./requirements.txt requirements.txt
 ---> 28c77af68bd7
Step 3/5 : RUN pip install -r requirements.txt
 ---> Running in 2251632f122c
Collecting Flask==2.0.2
  Downloading Flask-2.0.2-py3-none-any.whl (95 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 95.2/95.2 KB 1.5 MB/s eta 0:00:00
Installing collected packages: Flask
Successfully installed Flask-2.0.2
Removing intermediate container 2251632f122c
 ---> b4b25857a347
Step 4/5 : COPY . .
 ---> b3895300e604
Step 5/5 : ENTRYPOINT ["python", "app.py"]
 ---> Running in 2d4f19d0302d
Removing intermediate container 2d4f19d0302d
 ---> 0bb229d4e965
Successfully built 0bb229d4e965
Successfully tagged sample:latest

real	0m4.849s
user	0m0.040s
sys	0m0.036s
Did you notice that it took ~5 seconds to build the image for the first time? Now let's make some changes to the app.py file and build it again.
Here is the result:
Sending build context to Docker daemon  4.096kB
Step 1/5 : FROM python:3.8-alpine
 ---> 474c96543250
Step 2/5 : COPY ./requirements.txt requirements.txt
 ---> Using cache
 ---> 28c77af68bd7
Step 3/5 : RUN pip install -r requirements.txt
 ---> Using cache
 ---> b4b25857a347
Step 4/5 : COPY . .
 ---> 4d35850f7d71
Step 5/5 : ENTRYPOINT ["python", "app.py"]
 ---> Running in 859652485e62
Removing intermediate container 859652485e62
 ---> 3f33c2ba2aa9
Successfully built 3f33c2ba2aa9
Successfully tagged sample:latest

real	0m0.398s
user	0m0.024s
sys	0m0.040s
Notice how much faster the build is? You can see Using cache in steps 2 and 3, while steps 4 and 5 are rebuilt because of the changes in the app.py file. Layer caching can save a lot of build time, and even more when multiple images share common layers, since those images simply reuse the cached layers.
2. Use multistage builds
The multistage build feature (introduced in Docker 17.05) is useful to anyone who has struggled to optimize a Dockerfile or wants to build an efficient Docker image. It gives you more control over which files and artifacts end up in the final image, which also helps keep build-time tools and their vulnerabilities out of production images.
A multistage build is basically multiple FROM instructions in the same Dockerfile. Each FROM starts a new build stage, which can COPY artifacts from previous stages. Copying only the build artifacts from an earlier stage eliminates the intermediate layers created by steps such as installing dependencies, downloading additional packages, and testing.
With multistage builds, our entire build system can be contained in a single file.
# Stage 1
FROM maven AS build
WORKDIR /app
COPY . .
RUN mvn package

# Stage 2
FROM tomcat
COPY --from=build /app/target/file.war /usr/local/tomcat/webapps/
The above Dockerfile has two FROM instructions, which are numbered internally as stage 0 and stage 1. Stage 0 is given the friendly alias build. This stage runs the Maven build and stores the output in the /app directory set by the WORKDIR instruction. The resulting image for this stage is 635 MB.
The second stage pulls the official tomcat image from Docker Hub. The COPY --from instruction is then used to copy only the app-related files from the previous build stage. As a result, the final image size is ~260 MB.
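When debugging, you can also build just one stage of a multistage Dockerfile with the --target flag. The tag name below is arbitrary:

# Stop after the stage aliased "build" instead of producing the final image
docker build --target build -t myapp-build .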
3. Pack a single app per container
A container can run several processes simultaneously, but then you won't be taking full advantage of the container model. Take the classic Apache/MySQL/PHP stack: you may be tempted to run all the components in a single container, but the best practice is to use a separate container for each service. This makes each service easier to scale independently, and the containers can be reused across multiple environments.
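As a rough sketch of how that stack could be split up, here is a hypothetical Compose file; the service names, image tags, port, and password are all illustrative, not part of the original example:

# docker-compose.yml (illustrative)
version: "3.7"
services:
  web:
    image: php:8.0-apache   # serves PHP through Apache in its own container
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: mysql:8.0        # database runs as a separate service
    environment:
      MYSQL_ROOT_PASSWORD: example   # placeholder; use a secret in practice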
4. Use the smallest base image
When building a Docker image, you should always strive for smaller images, which offer advantages such as faster upload and download times. By using a smaller image as the base, you avoid pulling in unnecessary packages. However, keeping a base image small is a challenge, because you might unexpectedly include build dependencies or unoptimized layers. Also, keep in mind that base images should come from a trusted source.
Different variants of an operating system can be used as the base of an image. Compared to other OS images, Alpine is much smaller: the Ubuntu OS image is 188 MB, while Alpine is only 5 MB.
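You can verify the difference yourself by pulling both images and listing their sizes (exact numbers vary by tag and release):

docker pull ubuntu:latest
docker pull alpine:latest
docker images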
We can even use scratch as the base image to build our own runtime environment. The scratch image is actually an empty image, so you can't use it in all cases; it works best when your application is statically compiled into a single binary. Reach for it only when building a truly minimal image.
FROM scratch
COPY mybinary /mybinary
CMD ["/mybinary"]
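Combining this with the multistage builds from tip 2, here is a hedged sketch of how such a static binary could be produced; it assumes a Go project sits in the build context, and the Go version and binary name are illustrative:

# Stage 1: compile a statically linked binary (no C library dependencies)
FROM golang:1.17-alpine AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /mybinary .

# Stage 2: copy only the binary into an otherwise empty image
FROM scratch
COPY --from=build /mybinary /mybinary
CMD ["/mybinary"]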
5. Pin software versions
Pinning versions takes only a few extra seconds, but it will save you a lot of time in the future. This includes the base images, the code you pull from repositories, the libraries your code relies on, and so on. With versioning, you get a consistent, reproducible build of the application. Without it, components can change underneath you until a previously working Dockerfile no longer builds. For example:
FROM jenkins/jenkins:2.235.4-lts-slim
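The same idea applies to application dependencies. Going back to the Flask app from tip 1, the requirements.txt would pin an exact version rather than an open-ended range:

# requirements.txt: pin the exact version installed during the build
Flask==2.0.2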
6. Tag Docker images
Docker images are generally identified by two components: a name and a tag. For example, in the image google/cloud-sdk:193.0.0, google/cloud-sdk is the name and 193.0.0 is the tag. The latest tag is used by default if you don't provide a specific tag in Docker commands; it fetches the most recent version of the image. Keep in mind the risk of relying on a specific tag: it might eventually be deleted from the registry. To guard against that, keep a local copy of the image or pull the image by its SHA256 digest.
docker build -t google/cloud-sdk:193.0.0 .
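To pin an image even more strictly than a tag allows, you can pull by digest. The digest below is a placeholder; substitute the value reported by docker images --digests or by your registry:

# Pull an exact, immutable image version by its SHA256 digest
docker pull google/cloud-sdk@sha256:<digest>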
7. Don't run out of memory
When application usage increases, memory usage also goes up. Take the case where a large amount of data processing happens through API requests: as API hits spike, RAM usage increases dramatically. The host kernel will then start killing processes automatically (the out-of-memory killer) to free up memory, and in-flight requests can fail when that happens. To avoid this problem, we should apply memory limits to containers.
Hard Limit
When you set a hard limit, under no circumstances will the container be allowed to use more than the specified amount of RAM.
sudo docker run -d -p 8080:80 --memory="256m" <image-name>
Soft Limit
A soft limit sets a memory reservation instead. The container may use more than the reserved amount while the host has memory to spare, but when the host runs low on memory, Docker attempts to push the container's consumption back down to the reservation, helping prevent a service outage.
sudo docker run -d -p 8080:80 --memory-reservation="256m" <image-name>
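You can confirm that a limit took effect with docker stats; the container name is whatever you assigned or Docker generated:

# One-shot snapshot of memory usage against the configured limit
docker stats --no-stream <container-name>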
8. Secure the containers
There are many security vulnerabilities to guard against while building containers. Here are some quick tips for securing containers.
Image scanning
Scanning local Docker images lets development teams review the security state of their container images and take action on any findings. For this, use the docker scan <image-name> command.
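For example, to scan the sample image built earlier (docker scan is backed by Snyk, so the first run may ask you to accept its license or log in):

docker scan sample:latest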
Limit privileges with a non-root user
By default, Docker containers run with root privileges. This gives processes full control inside the container and, if an escape occurs, can expose the host system, making the application more vulnerable to exploitation. To avoid this, create a dedicated user with the USER directive in the Dockerfile, which ensures the container's application runs with the least privilege it needs.
FROM python:latest
# Create the unprivileged user before switching to it
RUN useradd --create-home alice
COPY hello_world.py /data/
USER alice
CMD ["python", "/data/hello_world.py"]
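A quick way to verify the switch worked is to build the image (the tag hello-world-app here is just an illustration) and override the command to print the current user:

docker build -t hello-world-app .
docker run --rm hello-world-app whoami   # prints "alice"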
Sensitive data
When sharing a Docker image publicly, sensitive data such as credentials and SSH keys may be leaked. Sensitive data needs to be kept out of the image, so it's a best practice to move it into a .env file. Let's look at an example of referencing the credentials inside a compose file.
# .env
username=Admin
password=Admin@123
Once the values are configured in the environment file, we can load the file with the env_file attribute and populate values inside the compose file using the ${env_key} syntax.
# docker-compose.yml
version: "3.7"
services:
  api:
    container_name: sample
    build:
      context: .
      dockerfile: Dockerfile
    env_file:
      - .env
    environment:
      DB_USERNAME: ${username}
      DB_PASSWORD: ${password}
If you have sensitive files in your build context, you can also use a .dockerignore file to exclude them. Files listed in .dockerignore are left out of the build context when the Docker build takes place.
# In .dockerignore
.env
.git
google_service_account.json
Docker content trust
Docker images can be tampered with between a Docker registry and a Docker user, so it's important to verify their authenticity before you pull them from their source. You can verify image signatures by enabling Docker Content Trust. To do this, set the environment variable DOCKER_CONTENT_TRUST=1.
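For example, enabling it for the current shell session makes docker pull refuse unsigned images; the image name below is just an illustration:

# Enable content trust for this shell session
export DOCKER_CONTENT_TRUST=1
docker pull python:3.8-alpine   # succeeds only if the image is signed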