Container Orchestration Tools Explained
The way we write, ship, and maintain software has changed drastically over the last few years. The way we consume infrastructure to run that software has matured as well: we have moved from bare metal to virtual machines to containers to micro-VMs.
The rise in the adoption of microservices has certainly paved the way for containers to become the primary approach for organizations to package and ship their applications. Amid this evolution, we have seen Docker become almost synonymous with containers and Kubernetes emerge as the gold standard for orchestrating those containers. Some of the primary benefits of this transition include fault isolation, better resource utilization, and easier scaling of workloads, all of which have a direct impact on the business.
In this post, we'll get into the what and why of container orchestration. We'll also take a look at some of the leading tools out there and stack them up against each other with the aim of helping you choose the right tool for the job.
We’ll be covering the following topics:
- What are containers again? Why do we even need them?
- What Exactly is Container Orchestration?
- Popular Container Orchestration Tools
- Which one is right for me?
- Containerized Application Performance
What are containers again? Why do we even need them?
Back in the day, we ran applications on bare metal, which is another way of saying physical, on-premises servers. This was time-intensive, expensive, and error-prone, and it was extremely slow to scale. In came virtual machines to solve these pain points - a layer of abstraction on top of physical servers that lets us run multiple operating systems in isolation on the same hardware, bringing better security, resource utilization, and scalability, along with a significant reduction in costs.
That’s great! But if virtual machines already address these pain points, why are we even talking about containers? Well, containers take it up a notch. You can think of them as lightweight virtual machines that, instead of packaging a full-fledged operating system, share the kernel of the underlying host OS. Because each container carries less overhead than a VM, container-based virtualization typically allows higher application density and better utilization of server resources.
An important distinction between virtual machines and containers is that a VM virtualizes the underlying hardware, whereas a container virtualizes the underlying operating system. Both have their own use cases. Interestingly, many container deployments run on VMs as their hosts rather than directly on bare metal.
Whether you are building a monolith or a microservices-based architecture, if resilience and the ability to scale fast are important to you, for most types of workloads, containerization is your best bet when it comes to packaging your application.
What Exactly is Container Orchestration?
At its core, container orchestration is about managing the lifecycle of containers. Whether you are running a monolith or a bunch of microservices, container orchestration tools can help you streamline container lifecycle management. However, their utility really shines through at scale in complex, dynamic environments. Tools in this space help teams control and automate many tasks, including:
- Recovering from encountered failures, ensuring that your apps are self-healing, robust, and resilient.
- Provisioning and scheduling containers by allocating the required resources based on predefined configurations.
- Scaling services by adding or removing containers, typically based on some metrics.
- Monitoring health of containers and hosts.
- Exposing services to the outside world.
- Load balancing traffic between multiple containers seamlessly.
While containers by themselves are extremely useful, they can become quite challenging to deploy, manage, and scale across multiple hosts in different environments. Container orchestration is another fancy word for streamlining this process.
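To make the "scaling services based on some metrics" task more concrete, here is a minimal sketch of a proportional scaling rule. It is similar in spirit to the rule documented for Kubernetes' Horizontal Pod Autoscaler, but the function name, parameters, and default limits are assumptions made for illustration; real orchestrators add tolerances and cooldown periods that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale the replica count by the ratio of the observed metric to its
    target, then clamp the result to the configured limits."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 containers averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
    # 4 containers averaging 15% CPU against a 60% target: scale in.
    print(desired_replicas(4, current_metric=15, target_metric=60))  # 1
```

The clamp matters in practice: without a maximum, a noisy metric could request far more containers than the cluster can host.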
Most container orchestration tools follow similar mechanisms from a consumer point of view. They allow you to configure your application through configuration files (typically YAML or JSON) that tell the orchestration tool things like where to get container images, how to do networking, how to handle storage volumes, and where to push logs. In most cases, software teams keep these configuration files under version control, often with a variant per environment (development, staging, or production), to make things auditable and reproducible.
These configuration files are handed off to the tool using an interface (typically a CLI). The tool then schedules the deployment and selects the best host on which to place the containers, based on the constraints defined in the configuration. Once containers are up and running, the tool continuously monitors the app by comparing the desired state with the actual state, in addition to querying health checks. If the two states diverge or a failure occurs, it tries to recover automatically. Being able to run these orchestration tools in disparate environments, ranging from a desktop to bare metal servers to cloud-based VMs, is a big selling point.
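The desired-state versus actual-state comparison described above can be sketched as a small reconciliation pass. The `Cluster` class below is a toy in-memory stand-in invented for this example, not the API of any real orchestrator, and it tracks a single application for simplicity.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy in-memory cluster (hypothetical, for illustration only)."""
    running: dict = field(default_factory=dict)  # container id -> healthy?
    _ids: count = field(default_factory=count)   # source of unique suffixes

    def start(self, image: str) -> str:
        cid = f"{image}-{next(self._ids)}"
        self.running[cid] = True
        return cid

    def stop(self, cid: str) -> None:
        del self.running[cid]


def reconcile(cluster: Cluster, image: str, desired: int) -> None:
    """One pass of the control loop: remove unhealthy containers, then start
    or stop containers until the actual count matches the desired count."""
    for cid, healthy in list(cluster.running.items()):
        if not healthy:
            cluster.stop(cid)
    while len(cluster.running) < desired:
        cluster.start(image)
    while len(cluster.running) > desired:
        cluster.stop(next(iter(cluster.running)))


if __name__ == "__main__":
    cluster = Cluster()
    reconcile(cluster, "web", desired=3)   # starts web-0, web-1, web-2
    cluster.running["web-1"] = False       # simulate a failed health check
    reconcile(cluster, "web", desired=3)   # replaces the failed container
    print(sorted(cluster.running))         # ['web-0', 'web-2', 'web-3']
```

A real tool would run a pass like this continuously and would also decide which host each new container lands on; both are omitted to keep the loop itself visible.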
Popular Container Orchestration Tools
When Docker emerged in 2013, containers exploded in popularity. A number of tools have since been developed to make container management easier. Although these tools have been around for years, many consider 2017 to be the year that they came of age. As of today, there are several open-source and proprietary solutions for managing containers.
In the open-source space, Kubernetes, Docker Swarm, Marathon on Apache Mesos, and HashiCorp Nomad are some of the notable players. The proprietary space is dominated by the leading cloud providers; notable examples include Amazon Web Services (AWS) Elastic Container Service, Google Cloud Platform (GCP) Compute Engine & Cloud Run, and Microsoft Azure Container Instances & Web Apps for Containers.
Let's zoom in on some of the most popular ones and stack them up against each other to better understand how they differ.
Kubernetes - The gold standard
Similar to how Docker became the de facto standard for containerization, Kubernetes has come to rule the container orchestration landscape. That's why most major cloud providers now offer managed Kubernetes services as well.
It's open-source software that has become the gold standard for orchestrating containerized workloads in private, public, and hybrid cloud environments. It was initially developed by engineers at Google, who distilled years of experience in running production workloads at scale into Kubernetes. It was open-sourced in 2014 and is now maintained under the CNCF (Cloud Native Computing Foundation). It's often abbreviated as k8s, which is a numeronym (starting with the letter "k" and ending with "s", with 8 other characters in between).
Managing containers at scale is usually quite challenging. Why is that? Running a single Docker container on your laptop may seem trivial, but doing the same for a large number of containers across multiple hosts, in an automated fashion and with zero downtime, is anything but.
Let's take an example of a Netflix-like video-on-demand platform consisting of 100+ microservices resulting in 5000+ containers running atop 100+ VMs of varying sizes. Different teams are responsible for different microservices. They follow a continuous integration and continuous delivery (CI/CD) driven workflow and push to production multiple times a day. The expectation from production workloads is to be always available, scale up and down automatically if demand changes, and recover from encountered failures.
In situations like these, the utility of container orchestration tools really shines. Tools like Kubernetes allow you to abstract away the underlying cluster of virtual or physical machines into one unified pool of resources. Typically they expose an API through which you can specify how many containers you'd like to deploy for a given app and how they should behave under increased load. The API-first nature of these tools allows you to automate the deployment process inside your CI pipeline, giving teams the ability to iterate quickly. Being able to manage this kind of complexity in a streamlined manner is one of the major reasons why tools like Kubernetes have gained such popularity.
Underlying architecture & objects
To understand Kubernetes' view of the world, we need to familiarize ourselves with the cluster architecture first. A Kubernetes cluster is a group of physical or virtual machines that is divided into two high-level components - the control plane and worker nodes.
- Control plane - It acts as the brain of the entire cluster and is responsible for accepting user instructions, health checking all servers, deciding how best to schedule workloads, and orchestrating communication between components. It consists of components like kube-apiserver, etcd, kube-scheduler, kube-controller-manager, and cloud-controller-manager.
- Worker nodes - These are the machines responsible for accepting instructions from the control plane and running containerized workloads. Each machine runs a kubelet, kube-proxy, and a container runtime.
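To illustrate what "deciding how best to schedule workloads" can involve, here is a heavily simplified filter-then-score sketch. The `Node` type, its fields, and the "most CPU left over" scoring rule are assumptions chosen for this example; the real kube-scheduler weighs many more factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu: float  # free CPU cores
    free_mem: int    # free memory in MiB


def pick_node(nodes: list[Node], cpu: float, mem: int) -> Optional[Node]:
    """Filter out nodes that cannot fit the request, then pick the node that
    would have the most CPU left over (a simple 'spread' heuristic)."""
    feasible = [n for n in nodes if n.free_cpu >= cpu and n.free_mem >= mem]
    if not feasible:
        return None  # nothing fits; the workload would stay pending
    best = max(feasible, key=lambda n: n.free_cpu - cpu)
    # Reserve the resources so the next placement sees the updated capacity.
    best.free_cpu -= cpu
    best.free_mem -= mem
    return best


if __name__ == "__main__":
    nodes = [Node("a", 1.0, 2048), Node("b", 4.0, 8192), Node("c", 2.0, 512)]
    chosen = pick_node(nodes, cpu=0.5, mem=1024)
    print(chosen.name if chosen else "pending")  # b
```

Node "c" is filtered out because it lacks memory, and "b" wins over "a" because it keeps more CPU in reserve after placement.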
Now that we have some know-how of the Kubernetes architecture, the next milestone in our journey is understanding the Kubernetes object model. Kubernetes has a few abstractions that make up the building blocks of any containerized workload.
We'll go over a few of the object types available in Kubernetes that you are most likely to interact with:
- Pod - It is the smallest deployable unit of computing in the Kubernetes hierarchy. It can contain one or more tightly coupled containers sharing environment, volumes, and IP space. Generally, users are discouraged from managing pods directly. Instead, Kubernetes offers higher-level objects (Deployment, StatefulSet, and DaemonSet) to encapsulate that management.
- Deployment - High-level object designed to ease the life cycle management of replicated pods. Users describe a desired state in the deployment object and the deployment controller changes the actual state to match the desired state. Generally, this is the object users interact with the most. It is best suited for stateless applications.
- StatefulSet - You can think of it as a specialized deployment best suited for stateful applications like a relational database. It offers ordering and uniqueness guarantees for its pods.
- DaemonSet - You can think of it as a specialized deployment for when you want your pods to run on every node (or on a subset of nodes). It is best suited for cluster support services like log aggregation, security, etc.
- Secret & ConfigMap - These objects allow users to store sensitive information and configuration respectively. They can then be exposed to specific apps, allowing for more streamlined configuration and secrets management.
- Service - This object groups a set of pods together and makes them accessible through DNS within the cluster. Different types of services include NodePort, ClusterIP, and LoadBalancer.
- Ingress - This object allows external access to services in a cluster, typically by routing HTTP and HTTPS traffic based on a hostname or URL path. Additionally, it can provide SSL termination and load balancing.
- Namespace - This object is used to logically group resources inside a cluster.
Note: There are other objects like ReplicationController, ReplicaSet, Job, CronJob, etc. that we have deliberately skipped for simplicity's sake.
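As a rough illustration of how a few of these objects fit together, the sketch below builds a Deployment and a matching Service as plain Python dictionaries and prints them as JSON. The app name `web`, the image `nginx:1.25`, and the port numbers are placeholders chosen for this example; the field names follow the `apps/v1` Deployment and `v1` Service schemas as I understand them, so verify them against the Kubernetes API reference before relying on them.

```python
import json


def deployment(name: str, image: str, replicas: int, port: int) -> dict:
    """Describe a stateless app: N replicas of one container image."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {"name": name, "image": image,
                         "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


def service(name: str, port: int, target_port: int) -> dict:
    """Expose the pods labelled app=<name> inside the cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": name},
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


if __name__ == "__main__":
    for manifest in (deployment("web", "nginx:1.25", 3, 80),
                     service("web", 80, 80)):
        print(json.dumps(manifest, indent=2))
```

Each printed document could be saved to its own file and handed to the cluster through the CLI. The Service finds its pods purely through the shared `app` label, which is why the Deployment repeats that label in its selector and pod template.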
You can find our dedicated blog post on Kubernetes that gets into examples, features, ecosystem, and commonly asked questions here.
Docker Swarm - Lightweight alternative potentially approaching the end of life
Let's differentiate between Docker and Docker Swarm first. Docker is a container runtime comparable with rkt. Docker Swarm, on the other hand, is a cluster management and orchestration tool embedded in the Docker Engine, comparable with Kubernetes and the like. Compared to Kubernetes, it is somewhat less extensible but also less complex, which makes it best suited for people who want an easier path to deploying containers. At a high level, you'll notice a lot of similarities in the architecture of the two tools.
An important thing to note here is that after Mirantis acquired Docker Enterprise in late 2019, it announced that the primary orchestrator going forward would be Kubernetes. Mirantis committed to supporting Swarm for at least two years and to making the transition to Kubernetes easier.
Does this mean Docker Swarm is dead and we shouldn't even talk about it? Not really! As of now, all this means is we won't be seeing many Docker Swarm-as-a-service options out there. However, for simpler use cases, it still is a viable option owing to its lightweight and simple nature.
Underlying architecture
To understand Docker Swarm's view of the world, we need to familiarize ourselves with the cluster architecture first. A swarm is a group of physical or virtual machines divided into two high-level components - manager nodes and worker nodes.
- Manager node - Similar to the Kubernetes control plane, it is responsible for receiving service definitions from the user and dispatching instructions to worker nodes on how to run those services. It also performs the orchestration and management functions necessary to sync the actual state with the desired state of the cluster. Manager nodes elect a single leader to conduct orchestration tasks.
- Worker node - Similar to Kubernetes worker nodes, it receives and executes tasks dispatched from manager nodes. An agent runs on each worker node and reports back to the manager node on the assigned tasks so that the manager can maintain the desired state of each worker.
Now that we have some idea of its architecture, let's get into object-level constructs of Docker Swarm.
- Task - A task carries a Docker container and the commands to run inside the container. It is the atomic unit of scheduling within a swarm. When we declare the desired state of a service, the orchestrator realizes that state by scheduling tasks. If a task fails, the orchestrator removes the task and its container and then creates a new task to replace it, according to the desired state specified by the service.
- Service - A service is the definition of the tasks to be executed on the nodes. When creating a service, the user specifies which image to use and which commands to execute inside the running containers. There are two types of services - replicated and global. Similar to Kubernetes deployments, in the replicated services model, the manager spins up a specified number of replica tasks among the nodes. Similar to a Kubernetes DaemonSet, for global services, the swarm runs one task for the service on every available node.
- Load balancer - The swarm manager uses ingress load balancing to expose services to the outside world. External components like cloud load balancers can access a given service on its published port, while the swarm uses internal load balancing to distribute requests among services within the cluster.
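The difference between replicated and global services can be sketched as a small planning function. This is a toy model rather than Swarm's actual scheduler: the function name, the node names, and the round-robin spreading are assumptions made for illustration.

```python
def plan_tasks(mode: str, nodes: list[str], replicas: int = 1) -> dict[str, int]:
    """Return how many tasks of a service each node would run.

    'global' places exactly one task on every node, while 'replicated'
    spreads the requested number of replicas round-robin (a simplification).
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if mode == "global":
        return {node: 1 for node in nodes}
    if mode == "replicated":
        plan = {node: 0 for node in nodes}
        for i in range(replicas):
            plan[nodes[i % len(nodes)]] += 1
        return plan
    raise ValueError(f"unknown service mode: {mode}")


if __name__ == "__main__":
    print(plan_tasks("replicated", ["n1", "n2", "n3"], replicas=5))
    # {'n1': 2, 'n2': 2, 'n3': 1}
    print(plan_tasks("global", ["n1", "n2", "n3"]))
    # {'n1': 1, 'n2': 1, 'n3': 1}
    # If n2 disappears, re-planning keeps five replicas on the survivors.
    print(plan_tasks("replicated", ["n1", "n3"], replicas=5))
    # {'n1': 3, 'n3': 2}
```

The last call mirrors the task replacement described above: the desired replica count stays fixed and the orchestrator moves work onto the nodes that remain.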
Proprietary offerings by public cloud providers - Let us handle the management overhead
Just like the open-source space, the proprietary space for container orchestration tools is pretty competitive and is mostly dominated by public cloud providers like Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
These cloud providers also offer a managed version of Kubernetes. What that means is that the provider is responsible for managing and maintaining the cluster. This reduces maintenance and management overhead. For the purposes of this section, we'll ignore this and focus on proprietary offerings only.
- AWS Elastic Container Service - ECS is a fully managed container orchestration service from AWS which has deep integrations with other AWS services like Route 53, Secrets Manager, IAM, CloudWatch, etc. It offers two ways to run workloads - on EC2 virtual machines or, more recently, on Fargate, which brings serverless capabilities to ECS. Even though it's AWS's home-grown take on how to manage containers at scale, it has a lot of similarities with Kubernetes and Docker Swarm. This can be seen in how users define the configuration of their applications as task definitions and then provide these definitions as JSON documents through the AWS Console or CLI. Depending on the configuration and the mode selected (EC2 or Fargate), ECS schedules the tasks that make up each service onto the cluster and monitors them constantly to maintain the desired state.
- GCP Cloud Run - Cloud Run is a fully managed serverless container orchestration platform. It abstracts away all infrastructure management by adopting the serverless model for containers. What that means is that your application can scale down to zero, so you don't pay anything when there's no traffic, and it can scale up rapidly to absorb large bursts of requests. This in essence is very close to AWS Fargate’s offering. As a consumer of Cloud Run, all you need to do is provide the platform with your container image and it'll take care of the rest. This is quite convenient as all the complexity has been automated or abstracted away. Under the hood, it is based on Knative, a Kubernetes-based platform for deploying and managing modern serverless workloads. This goes to show the real superpower of Kubernetes - its extensibility, which, when leveraged fully, lets it act as a building block for more developer-friendly platforms.
- Azure Container Instances - Container Instances is Microsoft Azure's answer to running containers on demand in a serverless fashion. It is comparable to AWS Fargate and GCP Cloud Run. Since this serverless model lets you focus on application logic rather than on the underlying infrastructure, a lot of management overhead is cut down. From a consumer's perspective, all you need to do is supply a container image and the necessary configuration, and the platform handles the rest for you, including provisioning resources, scaling containers up and down, networking, and health monitoring, to name a few.
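To give a feel for the JSON task definitions mentioned in the ECS entry above, here is a minimal sketch that assembles one for the Fargate mode. The family name, image, port, and CPU/memory sizes are placeholder values chosen for this example, and the field names reflect my understanding of the ECS task definition format, so check them against the AWS documentation before use.

```python
import json


def fargate_task_definition(family: str, image: str, port: int,
                            cpu: str = "256", memory: str = "512") -> dict:
    """Build a minimal single-container task definition for Fargate."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # assumed to be required for Fargate tasks
        "cpu": cpu,               # task-level CPU units, given as a string
        "memory": memory,         # task-level memory in MiB, given as a string
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": port, "protocol": "tcp"}],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(fargate_task_definition("web", "nginx:1.25", 80), indent=2))
```

The shape is close to the manifests used by the open-source tools: a named unit, the image to run, the resources it needs, and the ports it exposes.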
Which one is right for me?
There's no one-size-fits-all answer when it comes to container orchestration tools. Choosing the right tool for the job depends heavily on your use case.
- Pay per use, minimum management overhead - If you want to run containerized apps with straightforward needs and don't want to deal with any management overhead, perhaps your best bet is to use one of the serverless container offerings out there like AWS Fargate, Google Cloud Run, or Azure Container Instances.
- Fine-grain control, flexibility with little management - If you need fine-grain control, customization, and flexibility, perhaps a managed version of Kubernetes in the form of AWS EKS, GCP GKE, or Azure AKS is better suited to your use case, as it reduces the overhead of provisioning and running a Kubernetes cluster while integrating smoothly with the provider's other cloud services.
- Fine-grain control, flexibility within given constraints - If your use case has strict data residency and sovereignty constraints and you are required to run containerized workloads in an on-premises or private cloud setting, perhaps self-managed Kubernetes is the front-runner among all container orchestration tools.
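The three recommendations above can be condensed into a tiny decision helper. The function and its two yes/no questions are a deliberate simplification invented for this example; a real evaluation would weigh cost, team skills, and existing infrastructure as well.

```python
def recommend(needs_fine_grained_control: bool, must_run_on_premises: bool) -> str:
    """Map two yes/no questions onto the three options discussed above."""
    if must_run_on_premises:
        return "self-managed Kubernetes"
    if needs_fine_grained_control:
        return "managed Kubernetes (e.g. EKS, GKE, AKS)"
    return "serverless containers (e.g. Fargate, Cloud Run, Container Instances)"


if __name__ == "__main__":
    print(recommend(needs_fine_grained_control=False, must_run_on_premises=False))
    print(recommend(needs_fine_grained_control=True, must_run_on_premises=False))
    print(recommend(needs_fine_grained_control=True, must_run_on_premises=True))
```

The on-premises check comes first because a residency or sovereignty constraint rules out the hosted options regardless of how much control you need.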
Containerized Application Performance
Monitoring, in general, is evolving into observability these days, a superset of traditional monitoring. If we try to see through the hype, observability is, at its core, a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. It cuts across the gathering, visualizing, and analyzing of metrics, events, logs, and traces to establish a contextual understanding of a system's operation.
In a world of distributed systems dominated by microservices, monitoring your systems and applications has become a complex endeavor. Understanding how different components of your system interact with each other is the key to understanding the bigger picture. Part of that means having an insight into how well your application is performing. That's where Application Performance Monitoring (APM) tools like ScoutAPM can help in bringing to light the gaps in application performance so that you can get to the bottom of the exact cause of N+1 database queries, sluggish queries, memory bloat, and other performance abnormalities.
Want to try Scout for yourself? Contact our team to schedule a demo now!