When it comes to your images, bigger is never better. Larger images take longer to build and longer to push and pull over the network. In a data center with fast connections, pull time may go unnoticed, but in your CI/CD pipeline the slowness will be downright painful. From an ops perspective, there is no more annoying complaint than "deploys for my service are slow." While there are many reasons Docker images grow in size, we are going to focus on multi-stage Docker builds.
Multi-stage Docker builds can slim down your image in a hurry, especially if you are using a programming language that builds a single binary, like Go. Once the binary is built, most of the accompanying files and OS packages are no longer needed. Keep in mind that there are some things you may still have to copy into your final image. In my experience, things like templates that are rendered on the fly and packages that pertain to things like certificate authorities will need to find their way into the final image. Things like all of your .go files and your README, on the other hand, have no value in your deployment artifact.
For now we are going to focus on a small Go API that runs on the gin-gonic framework. All this service does is respond with "pong" to requests on "/ping". Super useful, right?! For the purpose of this example we will refer to the following code snippet as our application.
$ tree dkr-test
dkr-test
├── Dockerfile
├── cmd
│ └── main.go
├── go.mod
└── go.sum
1 directory, 4 files
$ cat cmd/main.go
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	router := gin.Default()
	router.GET("/ping", func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"message": "pong"})
	})
	router.Run(":8090")
}
There shouldn't be any surprises in the directory listed above. While it is a very stripped-down app, it allows us to focus on the fundamentals of a multi-stage Docker build. First let's take a look at the familiar bit. We'll be using the Alpine distro for this exercise because it is already very much geared toward slim deployments. While it may not be right for your use case, the pattern should map well to any other distro. The top section of our Dockerfile will look very familiar. There are only two surprises here: the first stage doesn't reference CMD or ENTRYPOINT, and the FROM line ends with 'as factory'.
FROM golang:1.20-alpine as factory
COPY . /app
WORKDIR /app
RUN apk update && apk upgrade
RUN apk add git g++
RUN go mod vendor
RUN cd /app/cmd && go build -tags musl -a -race -o /app/dkr-test
Nice, only a few short paragraphs in and we are into what we're all here for. Let's take a look at the FROM line and the missing CMD and/or ENTRYPOINT. The only new addition to the FROM line comes in the form of 'as factory'. You can swap 'factory' for almost any word that makes sense to you; this is telling Docker, "Hey, the things I build in this stage, I want to refer back to as the 'factory' stage." Later on we will see how we call back to the factory stage to pull out the necessary parts. We also omit the CMD or ENTRYPOINT config here because we aren't done building yet; we have more config to write. Once Docker finishes this build stage, we essentially have a single-stage Docker build with everything in the current directory copied into the container's "/app" directory. That, together with the OS packages and Go vendor files added by the RUN lines, is almost entirely unused by our running application, so why keep it around and let it slow down our CD pipeline?
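As an aside, naming the stage is a convenience, not a requirement: Docker also numbers stages from zero, top to bottom, so you can reach back into an unnamed stage by index. A sketch for illustration only (not the Dockerfile used in this post):

```dockerfile
# First stage is stage 0, even without 'as factory'.
FROM golang:1.20-alpine
# ... build steps ...

FROM alpine:latest
# --from=0 reaches back into the first (unnamed) stage.
COPY --from=0 /app/dkr-test /app/
```

Named stages are usually kinder to future readers, though, especially once a Dockerfile grows past two stages.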
The second stage of the build lets us jettison the parts that were valuable while building our binary but are no longer needed for the container to run properly in production. We will use the following config to drop the main thruster so we can make it to deployment.
### Final image
FROM alpine:latest
WORKDIR /app
COPY --from=factory /app/dkr-test /app/
EXPOSE 8090
CMD ["/app/dkr-test"]
This section should also look very familiar from a standard Dockerfile, with the exception of the COPY line, so let's drill in there. "--from=factory" reaches back to the stage we named "factory". From there, in this case, we only care about the binary we built and output to '/app/dkr-test' in the factory stage. This config grabs only the binary, so if you need extra files you have to copy them in a similar fashion. At this point your final image should include only the binary; all other files are left behind in the factory stage.
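For example, if your service needed the on-the-fly templates and certificate-authority packages mentioned earlier, the final stage might look something like this (the templates path here is hypothetical, just to show the shape):

```dockerfile
FROM alpine:latest
WORKDIR /app
# CA certificates so the binary can make outbound TLS calls.
RUN apk add --no-cache ca-certificates
COPY --from=factory /app/dkr-test /app/
# Hypothetical: templates the binary renders at runtime.
COPY --from=factory /app/templates /app/templates
EXPOSE 8090
CMD ["/app/dkr-test"]
```

Each COPY --from line is a deliberate decision about what survives into production, which is exactly the point of the pattern.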
Drum roll please... let's take a look at the size difference between the image built as a single stage and the one built as a multi-stage. From the following, you can see that our multi-stage build saved us a ton of space. Your mileage may vary, but this pattern has been around for a while (since about 2017) and is a great way to trim off some excess deploy time.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
dkr-test-single latest e8acc6a5a691 About an hour ago 813MB
dkr-test-multi latest d9e898fad32e 2 hours ago 20.9MB
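To put a number on it, a quick bit of shell arithmetic on the sizes from the listing above:

```shell
# 813MB single-stage vs 20.9MB multi-stage, per `docker images` above
awk 'BEGIN { printf "%.1f%% smaller\n", (1 - 20.9/813) * 100 }'
# prints "97.4% smaller"
```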
For debugging purposes, Docker has added some helpful options via BuildKit. You don't have to wait for all the stages to complete before testing, or do some magic editing to your Dockerfile. The following command will stop at the stage we named factory. On top of this, there are some other pretty interesting options to aid in multi-stage Docker builds. Check out the Docker docs.
$ docker build --target factory -t dkr-test:latest .