An Introduction to Kubernetes and Its Uses

It's easy to get lost in today's continuously changing landscape of cloud native technologies. The learning curve from a beginner's perspective is quite steep, and without proper context it becomes increasingly difficult to sift through all the buzzwords. If you have been developing software, chances are you have heard of Kubernetes by now. Before we jump into what Kubernetes is, it's essential to familiarize ourselves with containerization and how it came about.

In this guide, we are going to paint a contextual picture of how deployments have evolved, what containerization promises, where Kubernetes fits into the picture, and the common misconceptions around it. We'll also cover the basic architecture of Kubernetes, its core concepts, and some examples. Our goal is to lower the barrier to entry and equip you with a mind map to navigate this landscape more confidently.

Evolution of the deployment model

This evolution can be divided into three rough phases: traditional, virtualized, and containerized deployments. Let's briefly touch upon each to see how we got here.

Bare metal

Some of us are old enough to remember the archaic days when the most common way of deploying applications was on in-house physical servers. Cloud wasn't a thing yet, so organizations had to plan server capacity in order to budget for it. Ordering new servers was a slow and tedious task that took weeks of vendor management and negotiations. Once the shiny new servers did arrive, they came with the overhead of setup, deployment, uptime, maintenance, security, and disaster recovery.

From a deployment perspective, there was no good way to define resource boundaries in this model. Multiple applications deployed on the same server would interfere with each other because of the lack of isolation. That forced applications onto dedicated servers, which ended up being an expensive endeavor and made poor resource utilization the biggest inefficiency.

In those days, companies had to maintain their own in-house server rooms and deal with prerequisites like air conditioning, uninterrupted power, and internet connectivity. Even with such high capital and operating expenses, they were limited in their ability to scale as demand increased: adding capacity to handle increased load meant installing new physical servers. "Write once and run everywhere" was a utopian dream; these were the days of "works on my machine".

Virtual machines

Enter virtualization: a layer of abstraction on top of physical servers that allows multiple virtual machines to run on any given server. It enforces a level of isolation and security, allowing better resource utilization, scalability, and reduced costs.

It allows us to run multiple apps, each in a dedicated virtual machine offering complete isolation. If one goes down, it doesn't interfere with the other. Additionally, we can specify resource budgets for each. For example, allocate 40% of physical server resources to VM1 and 60% to VM2.

Okay, so this addresses the isolation and resource utilization issues, but what about scaling with increased load? Spinning up a VM is way faster than adding a physical server; however, scaling VMs is still bound by the available hardware capacity.

This is where public cloud providers come into the picture. They streamline the logistics of buying, maintaining, running, and scaling servers in exchange for a rental fee, which means organizations no longer have to plan for capacity beforehand. This significantly brings down both the capital expense of buying servers and the operating expense of maintaining them.

Containers

If we have already addressed the issues of isolation, resource utilization, and scaling with virtual machines, then why are we even talking about containers? Containers take it up a notch. You can think of them as mini virtual machines that, instead of packaging a full-fledged operating system, leverage the underlying host OS for most things. Container-based virtualization allows for higher application density and maximum utilization of server resources.

An important distinction between virtual machines and containers is that a VM virtualizes the underlying hardware, whereas a container virtualizes the underlying operating system. Both have their use cases; in fact, many container deployments use VMs as their hosts rather than running directly on bare metal.

The emergence of the Docker Engine accelerated the adoption of this technology. It has now become the de facto standard for building and sharing containerized apps, from the desktop to the cloud. The shift towards microservices as a superior approach to application development is another important factor that has fueled the rise of containerization.

Demystifying container orchestration

While containers by themselves are extremely useful, they can become quite challenging to deploy, manage, and scale across multiple hosts in different environments. Container orchestration is simply a fancy term for streamlining this process.

As of today, there are several open-source and proprietary solutions out there for managing containers.

Open-source landscape

If we look at the open-source landscape, some notable options include:

Proprietary landscape

On the other hand, if we look at the proprietary landscape, most of it is dominated by the major public cloud providers, all of whom came up with their own home-grown solutions for managing containers. Some of the notable mentions include:

Gold standard

Similar to how Docker became the de facto standard for containerization, the industry has settled on Kubernetes to rule the container orchestration landscape. That's why most major cloud providers have started to offer managed Kubernetes services as well. We'll learn more about them later in the ecosystem section.

What exactly is Kubernetes?

Kubernetes is open-source software that has become the de facto standard for orchestrating containerized workloads in private, public, and hybrid cloud environments.

It was initially developed by engineers at Google, who distilled years of experience in running production workloads at scale into Kubernetes. It was open-sourced in 2014 and has since been maintained by the CNCF (Cloud Native Computing Foundation). It's often abbreviated as k8s, a numeronym: the name starts with "k", ends with "s", and has 8 characters in between.

Managing containers at scale is commonly described as quite challenging. Why is that? Running a single Docker container on your laptop may seem trivial (we'll see this in the example below), but doing the same for a large number of containers, across multiple hosts, in an automated fashion that ensures zero downtime isn't nearly as trivial.

Let's take the example of a Netflix-like video-on-demand platform consisting of 100+ microservices resulting in 5,000+ containers running atop 100+ VMs of varying sizes. Different teams are responsible for different microservices. They follow a continuous integration and continuous delivery (CI/CD) driven workflow and push to production multiple times a day. Production workloads are expected to always be available, to scale up and down automatically as demand changes, and to recover from failures when they are encountered.

In situations like these, the utility of container orchestration tools really shines. Tools like Kubernetes allow you to abstract away the underlying cluster of virtual or physical machines into one unified blob of resources. Typically, they expose an API through which you can specify how many containers you'd like to deploy for a given app and how they should behave under increased load. The API-first nature of these tools allows you to automate deployment processes inside your CI pipeline, giving teams the ability to iterate quickly. Being able to manage this kind of complexity in a streamlined manner is one of the major reasons why tools like Kubernetes have gained such popularity.
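To give a small taste of what that API-driven control looks like in practice with Kubernetes (we'll walk through a proper example later; the deployment name my-app below is just a placeholder), scaling an app or attaching autoscaling behavior to it is a one-liner against that API:

> kubectl scale deployment my-app --replicas=5

> kubectl autoscale deployment my-app --min=3 --max=10 --cpu-percent=80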

Kubernetes Architecture

To understand Kubernetes' view of the world, we need to familiarize ourselves with cluster architecture first. A Kubernetes cluster is a group of physical or virtual machines divided into two high-level components: the control plane and the worker nodes. It's okay if some of the terminology mentioned below doesn't make much sense yet.

The key takeaway here is that the control plane is the brain, responsible for accepting user instructions and figuring out the best way to execute them, while the worker nodes are the machines responsible for following those instructions and running the containerized workloads.
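If you already have kubectl pointed at a cluster, you can see this split for yourself; on most clusters the ROLES column of the output distinguishes control-plane nodes from workers:

> kubectl get nodes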

Kubernetes Objects

Now that we have some know-how of Kubernetes architecture, the next milestone in our journey is understanding the Kubernetes object model. Kubernetes has a few abstractions that make up the building blocks of any containerized workload.

We'll go over a few different types of objects available in Kubernetes that you are most likely to interact with:

Note: There are other objects like ReplicationController, ReplicaSet, Job, CronJob, etc. that we have deliberately skipped for simplicity's sake.
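To give these abstractions a concrete shape before we move on: every Kubernetes object is declared with the same basic skeleton of apiVersion, kind, metadata, and spec. A minimal Pod, for instance, looks roughly like this (the name and image below are placeholders chosen purely for illustration):

# A minimal Pod object (illustrative only)
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
spec:
  containers:
    - name: hello
      image: nginx:alpine
      ports:
        - containerPort: 80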

Show me by example

Now that we have touched upon some of the most commonly used Kubernetes objects that act as the building blocks of containerized workloads, let's put them to work. In this example, we'll write a simple Node.js app, containerize it with Docker, push the image to a public registry, declare the desired workload using Kubernetes objects, and deploy it to a local cluster.

Setup

This guide assumes that you are on macOS and have Docker Desktop installed and running. Docker Desktop ships with a standalone, single-node Kubernetes cluster, which is an excellent choice these days for running Kubernetes locally. Additionally, you must have kubectl installed; it is the command-line tool that lets us run commands against Kubernetes clusters.

The easiest way to get up and running on Mac is to use Homebrew package manager like so:

> brew install --cask docker

> brew install kubectl
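Once Docker Desktop's bundled Kubernetes is enabled, you can sanity-check the installation; assuming kubectl's current context points at the Docker Desktop cluster (we'll set that explicitly later), these commands print the client version and the address of the local control plane:

> kubectl version --client

> kubectl cluster-info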

Setup instructions and source code for this exercise can be found here.

Sample hello world app

Here we have a very simple hello world app written in Node.js. It creates an HTTP server, listens on port 3000, and responds with "Hello World".

[Embed gist] https://gist.github.com/sarmadsaleem/3490c64a0c6d911be3e9f841b7e83881
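The gist isn't reproduced inline here, but a minimal version of such a server looks roughly like the sketch below (the server.js file name is our assumption; the actual code lives in the gist above):

// server.js: respond to every request with a plain-text "Hello World"
const http = require('http');

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('Hello World');
});

// Listen on port 3000, as described above
server.listen(3000, () => {
  console.log('Server listening on port 3000');
});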

Containerize sample app

To dockerize our app, we'll need to create a Dockerfile. It describes how to assemble the image. Here's what our Dockerfile looks like:

[Embed gist] https://gist.github.com/sarmadsaleem/3be62d461b0f2759da2cd6848e072331
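Again, the gist isn't shown inline, so here is a rough sketch of what a Dockerfile for an app like this typically looks like (the node:alpine base image and the server.js entry point are assumptions, not necessarily what the gist uses):

# Sketch of a Dockerfile for the sample app
# Assumptions: a Node base image and server.js as the entry point
FROM node:alpine
WORKDIR /app
# Install dependencies first so this layer is cached between builds
COPY package*.json ./
RUN npm install
# Copy the application code and document the port it listens on
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]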

Let's use Docker CLI to build and test the image using the Dockerfile above:

> docker build -t sarmadsaleem/scout-apm:node-app .

> docker run -p 3000:3000 -it --rm sarmadsaleem/scout-apm:node-app

> curl http://localhost:3000

Now that we have verified that our container works fine locally, let's push the Docker image to a public registry. For this example, we'll be using a public repository on Docker Hub.
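One caveat if you are following along: the repository name above belongs to the author, so to push the image yourself you would first log in to Docker Hub and re-tag the image under your own account (the <your-username> placeholder is your Docker Hub username):

> docker login

> docker tag sarmadsaleem/scout-apm:node-app <your-username>/scout-apm:node-app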

> docker push sarmadsaleem/scout-apm:node-app

At this point, we have packaged our Node app into a Docker image and made it public. Anyone can pull it from our public repository and run it anywhere.

Define workload specification using Kubernetes objects

With our dockerized sample app ready, the only thing left to do is declare the desired state of our workload using Kubernetes objects. We'll be dealing with the Namespace, Deployment, and Service objects in this example.

Typically, Kubernetes manifests are declared in YAML files that describe the desired state, which is then passed to the Kubernetes API using kubectl.

[Embed gist] https://gist.github.com/sarmadsaleem/271b53c6a6507ecdc6874d4e0dfa2cbb

It's okay if some of the things in this manifest don't make sense yet. The key takeaway is that we have declared the specification for our containerized workload using Kubernetes objects. The Namespace object is just a wrapper to group things together. The Deployment object does the heavy lifting of creating pods (which hold the containers), maintaining a specified number of replicas, and managing their lifecycle. The Service object streamlines network access to all of those pods.
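To make that concrete, a trimmed-down manifest along those lines might look roughly like this. It is a sketch rather than the exact contents of the gist; the replica count, labels, and NodePort service type are illustrative choices:

# Sketch of a manifest declaring a Namespace, a Deployment, and a Service (illustrative values)
apiVersion: v1
kind: Namespace
metadata:
  name: scout-apm
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app
  namespace: scout-apm
spec:
  replicas: 2                      # desired number of pod replicas
  selector:
    matchLabels:
      app: node-app
  template:
    metadata:
      labels:
        app: node-app
    spec:
      containers:
        - name: node-app
          image: sarmadsaleem/scout-apm:node-app
          ports:
            - containerPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: node-app
  namespace: scout-apm
spec:
  type: NodePort                   # expose the service on a port of the node
  selector:
    app: node-app
  ports:
    - port: 3000
      targetPort: 3000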

Now that we have the manifest ready, let's select the correct context for kubectl and apply the manifest:

> kubectl config use-context docker-desktop

> kubectl apply -f path/to/app.yaml

That was fast. What just happened? Kubernetes accepted our declared manifest and set about executing it. In doing so, it created a namespace, a bunch of pods, a replica set, a deployment, and a service. We can verify all of them like so:

> kubectl get pod,replicaset,deployment,service -n scout-apm

Does this mean our sample app is running? Yes! We can look up the service's port and then reach our Node app:

> kubectl get service -n scout-apm

> curl http://localhost:<service-port>

Let's clean up after ourselves:

> kubectl delete -f path/to/app.yaml

In a more production-ready setup, you'll probably have to deal with considerations like optimizing your Docker images, automating deployments, setting up health checks, securing the cluster, managing incoming traffic over SSL, instrumenting your apps for observability, and so on. The goal of this simple exercise was to jump from theory into a practical playground where we can see Docker in action, learn how to interact with Kubernetes, and deploy a sample workload.

Basic features

Let's touch upon some of the major Kubernetes features:

Kubernetes ecosystem

In the past few years, the Kubernetes ecosystem has grown exponentially. The popularity of Kubernetes has led to greater adoption, inspiring innovation in different verticals like the following:

Common Questions 

Let's answer some common questions and clear up some misconceptions around Kubernetes in this section.

Is Kubernetes free?

The open-source version of Kubernetes itself is free to download, build, extend, and use for everyone, so there are no costs associated with the software itself. Organizations typically run their Kubernetes clusters in public, private, hybrid, or multi-cloud environments; in those cases, they have to pay for the underlying resources.

Who created Kubernetes?

Kubernetes originated at Google and distilled years of experience in running production workloads at scale. It was founded by Joe Beda, Brendan Burns, and Craig McLuckie, who were quickly joined by other Google engineers in their endeavor. It was later donated to the Cloud Native Computing Foundation (CNCF) and is now maintained by the foundation, along with the open-source community, under the Apache 2.0 license.

What is the difference between Docker and Kubernetes?

Docker is a container runtime meant to run on a single node whereas Kubernetes is a container orchestration tool meant to run across a cluster of nodes. They are not opposing technologies. They complement one another. We have covered this topic in detail in one of our earlier blog posts here.

How do you upgrade Kubernetes?

Upgrading the Kubernetes version is a common practice for keeping up with the latest security patches, new features, and bug fixes. The process is typically dictated by the tool you used to provision the cluster. If it's a managed control plane, the cloud provider exposes an API to trigger the upgrade. If it's a self-managed control plane, bootstrapping tools like kops and kubeadm simplify the workflow.
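For instance, on a self-managed cluster bootstrapped with kubeadm, the control plane upgrade is roughly a two-step affair run on a control-plane node (the version below is just a placeholder, and the kubelet on each node is upgraded separately afterwards):

> kubeadm upgrade plan

> kubeadm upgrade apply v1.28.0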

Is Kubernetes the best way to run containers in production today?

This can be a controversial one. Kubernetes is certainly the most feature-complete container orchestration tool, with a vibrant community and a buzzing ecosystem. Is it the best way to run containers in production today? That depends on your use case. In some cases, you can get by with a PaaS solution like Heroku. Alternatively, you may be able to leverage serverless container services like Google Cloud Run or AWS Fargate. In other cases, where you want complete control and flexibility over your workloads, Kubernetes may be the front-runner among container orchestration tools.

Observability in Cloud Native Applications

Observability can be considered a superset of monitoring. Although it is a relatively new buzzword being thrown around these days, it isn't a new concept at all. It comes from engineering and control theory; at its core, it is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.

In modern software development, observability cuts across gathering, visualizing, and analyzing metrics, events, logs, and traces to establish a contextual understanding of a system's operation. Part of establishing this context means having insight into how well your application is performing. Enter Application Performance Monitoring (APM) tools: they help uncover gaps in application performance, allowing you to focus on solving customer experience issues and not just application health issues.

This is where ScoutAPM can help. Scout is an application performance monitoring product that helps developers drill down into the fine-grained details of app performance and stability issues. With Scout, you can get to the bottom of the exact cause of N+1 database queries, sluggish queries, memory bloat, and other performance abnormalities. 

Want to try Scout for yourself? Contact our team to schedule a demo now!