For a long time, I’ve been concerned about deploying my containerized applications without downtime, ideally using Docker. While tools like Docker Swarm and various plugins offer solutions for this, I decided to explore something new (at least for me): Kubernetes. This post documents my journey into learning and using Kubernetes.
Goal
First, I should make clear that the goal of this proof of concept is to run a simple Node.js API application, not to deploy a complete production environment on Kubernetes, which would typically also include database, cache, queue and pub/sub services.
Prepare for duplication
The first thing I need to do is make sure the application doesn’t fail when several instances of it run at once. Even if you normally run a single instance, there will be more than one during a zero-downtime deploy, since Kubernetes needs to create a new instance, wait for it to be ready, and only then redirect traffic to it. Ideally, check that the following items work well with multiple instances (and possibly multiple versions) of the application running:
- Database connections
- Seeds and migrations
- Scheduled jobs (crons)
- Queues and Pub/Sub services
- Remote logging
Container registry
Besides packaging the application into a container image, we must also publish that image to a container registry. Any vendor works: Docker Hub, AWS, GitHub, etc.
Up to this point, you should be able to manually run your application using Docker.
Local environment for testing
If you want to, you can use a managed Kubernetes service from a cloud provider, but it costs money to keep running. For testing, it’s great to run a local version of the engine, which can be done with Minikube.
Control Plane
The control plane is the set of services responsible for managing all of the nodes in the cluster, which are the machines that actually run your applications. Since we’re using a local demo cluster, the single Minikube host acts as both control plane and node, but in production the control plane generally runs on one or more separate machines or is an endpoint managed by the cloud provider.
Just out of curiosity, I wondered how many resources we should allocate to the Control Plane in a production system. Would 0.5 vCPU and 512 MB of RAM work? I couldn’t find an official recommendation in the Kubernetes documentation, but I found a blog post recommending 2+ vCPUs and 4+ GB of RAM.
Nodes
The nodes are all of the other machines, the ones that are not the Control Plane and that actually run your application. Each node runs the kubelet service, which connects to the Control Plane to receive instructions. Each node must also have a container runtime, such as containerd or CRI-O (Docker Engine also works, through the cri-dockerd adapter).
Kubernetes Deployment
Now that we have a local Kubernetes cluster running, we can finally start playing with it. The first concept to learn in the Kubernetes infrastructure is the Deployment. Deployments are top-level configurations that describe how your application should be deployed.
Just like with Docker and docker-compose, we can either run kubectl commands directly to create each resource or use a .yml file to describe the infrastructure. For a deeper understanding, you can try to create an app using only the commands. For production applications, it’s better to use a .yml file.
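To make this more concrete, here is a minimal sketch of what such a file could look like for the API. The resource name `node-api`, the image `registry.example.com/node-api:1.0.0` and the port 3000 are placeholders I made up for illustration, not values from my real setup.

```yaml
# deployment.yml (hypothetical file name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-api                # hypothetical name
spec:
  replicas: 2                   # number of Pods to keep running
  selector:
    matchLabels:
      app: node-api             # must match the Pod template labels below
  template:                     # Pod template used to create each replica
    metadata:
      labels:
        app: node-api
    spec:
      containers:
        - name: api
          image: registry.example.com/node-api:1.0.0   # placeholder image
          ports:
            - containerPort: 3000                      # assumed application port
```

A file like this can be sent to the cluster with `kubectl apply -f deployment.yml`.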
Pods
A Pod is one or more containers that share storage and networking. For example, a Pod can contain an API and a separate polling app that fetches data and sends it to the API. The containers in a Pod always run on the same node, and share the same IP address and port space.
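As an illustration of the API-plus-poller example, a Pod with two containers might be described roughly like this. The names, images and the `API_URL` variable are hypothetical. In practice, Pods are usually created through a Deployment rather than directly, so treat this only as a sketch of the Pod shape.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-poller         # hypothetical name
  labels:
    app: node-api
spec:
  containers:
    - name: api
      image: registry.example.com/node-api:1.0.0   # placeholder image
      ports:
        - containerPort: 3000                      # assumed application port
    - name: poller
      image: registry.example.com/poller:1.0.0     # placeholder image
      env:
        - name: API_URL                            # hypothetical variable read by the poller
          # Both containers share the Pod network, so localhost reaches the api container
          value: "http://localhost:3000"
```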
Services
Pods are ephemeral. We can have multiple Pods for one application, and they can be deleted and replaced by new ones at any time. For most applications, we need a stable way to connect to a specific part of our application (for example, a web API). To provide a logical grouping for accessing a set of Pods, Kubernetes has Services, which select Pods (by their labels) and provide rules for accessing them. When there are multiple Pods, a Service load-balances traffic across the healthy ones.
Using a .yml file to specify the architecture
To configure each resource in Kubernetes (such as Deployment, Pod, Service, Secret), you can use one or more .yml files. Each file can contain one or multiple resources. When deploying, we can specify the path of a file or a folder, and all of the .yml files it contains will be applied to the cluster.
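A minimal sketch of a Service, assuming the Pods carry a label `app: node-api` and listen on port 3000 (both are placeholders of mine):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-api                # hypothetical name
spec:
  type: ClusterIP               # only reachable from inside the cluster
  selector:
    app: node-api               # traffic goes to Pods carrying this label
  ports:
    - port: 80                  # port exposed by the Service
      targetPort: 3000          # assumed port the container listens on
```

With the cluster DNS enabled, other Pods in the same namespace could then reach the API at `http://node-api`, no matter which Pods are currently behind the Service.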
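For example, a single file can hold several resources separated by a `---` line. The sketch below puts a Secret and a Deployment that reads it in the same file. All names, the image and the connection string are placeholders.

```yaml
# app.yml (hypothetical file name): two resources in one file
apiVersion: v1
kind: Secret
metadata:
  name: node-api-secrets        # hypothetical name
type: Opaque
stringData:                     # written as plain text here, stored base64-encoded
  DATABASE_URL: "postgres://user:password@db.example.com:5432/app"   # placeholder
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: node-api
  template:
    metadata:
      labels:
        app: node-api
    spec:
      containers:
        - name: api
          image: registry.example.com/node-api:1.0.0   # placeholder image
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:                          # read the value from the Secret above
                  name: node-api-secrets
                  key: DATABASE_URL
```

Both resources are created with `kubectl apply -f app.yml`. Pointing `-f` at a folder instead applies every manifest file inside it.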
Rolling updates
With Services, Pods and Deployments configured, we can publish an update as a rolling update, which gradually replaces Pods running the old version with Pods running the new one. This ensures that traffic is never routed to containers that are not running. We can configure the maximum number of Pods that can be unavailable during the rolling update.
It’s important to note that the number of replicas matters here. With two replicas, one can be taken down and replaced while the other keeps serving traffic. With a single replica, that is only possible if Kubernetes is allowed to start the new Pod before stopping the old one (a “surge”); if the only Pod may be taken down first, there is downtime.
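These limits live in the `strategy` section of the Deployment. The excerpt below is a sketch with example values, not the exact numbers I used:

```yaml
# Excerpt of a Deployment manifest (not a complete file)
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1         # at most one Pod may be offline during the update
      maxSurge: 1               # at most one extra Pod may be created above "replicas"
```

With these values and two replicas, at least one Pod keeps serving while the other is being replaced.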
Standalone deployment and K3s
After configuring everything, I tried running my application and accessing it on port 80, only to discover that my Service of type LoadBalancer was stuck in the “Pending” state forever. Kubernetes doesn’t come with a Layer 4 load balancer, and the idea is that you can use any load balancer provider you want to, or use one that is integrated with the cloud provider you’re using (such as AWS, GCP or Azure). It makes sense to not include a load balancer since, in general, people will use Kubernetes in a big cloud provider with an integrated load balancer that has an interface to Kubernetes.
Since I wanted to have my service exposed using a LoadBalancer service, I searched for solutions to make it work and found K3s, a “Lightweight Kubernetes” with a built-in L4 load balancer. Using it, I can just create a LoadBalancer and expose it on port 80, right? No. In fact, K3s even comes with an L7 (HTTP) load balancer called Traefik Proxy that listens on port 80 by default. That also makes sense, because generally you’ll have multiple web applications running in the same cluster. I changed my LoadBalancer’s exposed port to an unused one (such as 8081) and configured Traefik Proxy, and then it finally worked.
Both the load balancer and the proxy (which is also a load balancer) can be configured using .yml files.
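As a rough sketch of that setup, the file below exposes a LoadBalancer Service on port 8081 and routes HTTP traffic from Traefik to it through a standard Ingress resource. This is one possible way to configure Traefik, and it assumes the ingress class is named `traefik`. The host name, resource names and ports are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-api-lb             # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: node-api
  ports:
    - port: 8081                # unused port, since Traefik already holds port 80
      targetPort: 3000          # assumed application port
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: node-api
spec:
  ingressClassName: traefik     # assumption: class name of the bundled Traefik
  rules:
    - host: api.example.com     # placeholder host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: node-api-lb
                port:
                  number: 8081
```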
Application warmup and shutdown delay
If you’re used to Express apps in Node.js, you know they can take some time to shut down (closing database connections and waiting for pending HTTP requests) and also to become ready to receive requests (opening database connections, registering the HTTP routes, etc.). With plain Docker, replacing the running container causes some downtime on every deploy. To make sure that doesn’t happen in the Kubernetes cluster, we must wait until the new version is ready before redirecting traffic to it, and give the old version time to finish its work before it is stopped.
For bigger applications, startup takes even longer, since the application has to open more database connections, populate the in-memory cache, etc.
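In a Deployment, these waits can be expressed with a readiness probe for the warmup and a preStop hook plus a termination grace period for the shutdown. The excerpt below is a sketch: the `/health` endpoint, the port and all timings are assumptions to adapt, and the preStop command requires a `sleep` binary inside the image.

```yaml
# Excerpt of a Deployment manifest (not a complete file)
spec:
  minReadySeconds: 10                     # a new Pod must stay ready this long before it counts as available
  template:
    spec:
      terminationGracePeriodSeconds: 30   # total time allowed for shutdown, including the preStop hook
      containers:
        - name: api
          image: registry.example.com/node-api:1.0.0   # placeholder image
          readinessProbe:                 # no traffic is sent until this check succeeds
            httpGet:
              path: /health               # hypothetical health endpoint
              port: 3000                  # assumed application port
            initialDelaySeconds: 5
            periodSeconds: 5
          lifecycle:
            preStop:                      # runs before the container receives SIGTERM
              exec:
                command: ["sleep", "10"]  # keep serving while traffic is moved away
```

The application itself still needs to handle SIGTERM by finishing pending requests and closing its connections within the grace period.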
Managing a Kubernetes cluster in the real world
- We use the Kubernetes API (or CLI) for talking to an external control plane running in a provider
- We generally do the deployments in a CI/CD pipeline