Extending applications on Kubernetes with multi-container pods


TL;DR: In this article, you will learn how you can use ambassador, adapter, sidecar and init containers to extend your apps in Kubernetes without changing their code.

Kubernetes offers an immense amount of flexibility and the ability to run a wide variety of applications.

If your applications are cloud-native microservices or 12-factor apps, chances are that running them in Kubernetes will be relatively straightforward.

But what about running applications that weren’t explicitly designed to be run in a containerized environment?

Kubernetes can handle these as well, although it may be a bit more work to set up.

One of the most powerful tools that Kubernetes offers to help is the multi-container pod (although multi-container pods are also useful for cloud-native apps in a variety of cases, as you’ll see).

Why would you want to run multiple containers in a pod?

Multi-container pods allow you to change the behaviour of an application without changing its code.

This can be useful in all sorts of situations, but it’s especially convenient for applications that weren’t originally designed to be run in containers.

Securing an HTTP service

Elasticsearch was designed before containers became popular (although it’s pretty straightforward to run in Kubernetes nowadays) and can be seen as a stand-in for, say, a legacy Java application designed to run in a virtual machine.

Let’s use Elasticsearch as an example application that you’d like to enhance using multi-container pods.

The following is a very basic (not at all production-ready) Elasticsearch Deployment and Service:
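
Here is a minimal sketch of what those manifests could look like (the image tag is illustrative, and resource requests, storage and security settings are omitted for brevity):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3  # illustrative tag
          env:
            # Let Elasticsearch start as a single node instead of trying to form a cluster
            - name: discovery.type
              value: single-node
          ports:
            - containerPort: 9200
              name: http
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      name: http
```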

The discovery.type environment variable is necessary to get it running with a single replica.

You can confirm that the pod works by running another pod in the cluster and curling the elasticsearch service:
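
For example, you could start a temporary pod with curl in it (the pod name and image are just one option):

```bash
# Query the Service by its DNS name from inside the cluster
kubectl run -it --rm testcurl --image=curlimages/curl --restart=Never \
  --command -- curl http://elasticsearch:9200
```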

Now let’s say that you’re moving towards a zero-trust security model and you’d like to encrypt all traffic on the network.

How would you go about this if the application doesn’t have native TLS support?

Recent versions of Elasticsearch support TLS, but it was a paid extra feature for a long time.

Your first thought might be to do TLS termination with an nginx ingress, since the ingress is the component routing external traffic into the cluster.

But that won’t meet the requirements, since traffic between the ingress pod and the Elasticsearch pod could go over the network unencrypted.

  • The external traffic is routed to the Ingress and then to Pods.

  • If you terminate TLS at the ingress, the rest of the traffic is unencrypted.

A solution that will meet the requirements is to tack an nginx proxy container onto the pod that will listen over TLS.

The traffic flows encrypted all the way from the user to the Pod.

  • If you include a proxy container in the pod, you can terminate TLS in the Nginx container.

  • Compared to the previous setup, the traffic is now encrypted all the way to the Elasticsearch container.

So requests from outside the pod will go to Nginx on port 9200 over HTTPS and then be forwarded to Elasticsearch on port 9201.
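
Here is a sketch of how that could be wired up (the ConfigMap name, the nginx-proxy container name, the image tags and the elasticsearch-tls Secret holding tls.crt and tls.key are all illustrative; the Service keeps pointing at port 9200, which nginx now serves over TLS):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-proxy-config
data:
  nginx.conf: |
    events {}
    http {
      server {
        listen 9200 ssl;
        ssl_certificate     /certs/tls.crt;
        ssl_certificate_key /certs/tls.key;
        location / {
          # Forward decrypted traffic to Elasticsearch in the same pod
          proxy_pass http://localhost:9201;
        }
      }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
spec:
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3  # illustrative tag
          env:
            - name: discovery.type
              value: single-node
            - name: http.port
              value: "9201"              # move Elasticsearch off port 9200
        - name: nginx-proxy
          image: nginx:1.25              # illustrative tag
          ports:
            - containerPort: 9200
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
            - name: certs
              mountPath: /certs
      volumes:
        - name: nginx-config
          configMap:
            name: nginx-proxy-config
        - name: certs
          secret:
            secretName: elasticsearch-tls   # hypothetical Secret with the TLS key pair
```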

You can confirm it’s working by making an HTTPS request from within the cluster.
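
For example, reusing the temporary curl pod from before:

```bash
kubectl run -it --rm testcurl --image=curlimages/curl --restart=Never \
  --command -- curl -k https://elasticsearch:9200
```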

The -k option is necessary for self-signed TLS certificates. In a production environment, you’d want to use a trusted certificate.

A quick look at the logs shows that the request went through the Nginx proxy:
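
For instance, you could inspect the proxy container's logs (nginx-proxy is the container name used in the sketch above):

```bash
kubectl logs deploy/elasticsearch -c nginx-proxy
```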

You can also check that you’re unable to connect to Elasticsearch over unencrypted connections:
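
A plain HTTP request to port 9200 should now fail, since nginx only serves TLS on that port:

```bash
# This request is expected to fail or return an error from nginx
kubectl run -it --rm testcurl --image=curlimages/curl --restart=Never \
  --command -- curl http://elasticsearch:9200
```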

You’ve enforced TLS without having to touch the Elasticsearch code or the container image!

Proxy containers are a common pattern

The practice of adding a proxy container to a pod is common enough that it has a name: the Ambassador Pattern.

All of the patterns in this post are described in detail in an excellent paper from Google.

Here are a few other things you can do with the Ambassador Pattern:

How do multi-container pods work?

Let’s take a step back and tease apart the difference between pods and containers on Kubernetes to get a better picture of what’s happening under the hood.

A “traditional” container (e.g. one started by docker run) provides several forms of isolation:

  • Resource isolation: limits on how much memory and CPU the process can use.

  • Process isolation: the container has its own process table and can’t see other processes on the host.

  • Filesystem isolation: the container has its own view of the filesystem.

  • Network isolation: the container has its own network interfaces and IP address.

There are a few other things that Docker sets up, but those are the most significant.

The tools that are used under the hood are Linux namespaces and control groups (cgroups).

Control groups are a convenient way to limit resources such as CPU or memory that a particular process can use.

As an example, you could say that your process should use only 2GB of memory and one of your four CPU cores.
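
In Kubernetes you declare those limits on the container, and the kubelet translates them into cgroup settings on the node. A minimal sketch (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: limited-app
spec:
  containers:
    - name: app
      image: nginx:1.25    # illustrative image
      resources:
        limits:
          memory: 2Gi      # enforced through the memory cgroup
          cpu: "1"         # enforced through the CPU cgroup (CFS quota)
```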

Namespaces, on the other hand, are in charge of isolating the process and limiting what it can see.

As an example, the process can only see the network packets that are directly related to it.

It won’t be able to see all of the network packets flowing through the network adapter.

Or you could isolate the filesystem and let the process believe that it has access to all of it.

  • Since kernel version 5.6, there are eight kinds of namespaces and the mount namespace is one of them.

  • With the mount namespace, you can let the process believe that it has access to all directories on the host when in fact it does not.

  • The mount namespace is designed to isolate resources — in this case, the filesystem.

  • Each process gets its own view of the filesystem, while still being isolated from the others.

If you need a refresher on cgroups and namespaces, here’s an excellent blog post diving into some of the technical details.

On Kubernetes, a container provides all of those forms of isolation except network isolation.

In other words, each container in a pod will have its own filesystem, process table, etc., but all of them will share the same network namespace.

Let’s play around with a straightforward multi-container pod to get a better idea of how it works.
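
Here is a minimal sketch of such a pod. The pod name (podtest), the first container (c1) and the /shared mount match the commands that follow; the second container's name (c2), the image and the sleep command are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: podtest
spec:
  containers:
    - name: c1
      image: busybox:1.36          # illustrative image
      command: ['sleep', '3600']   # keep the container running
      volumeMounts:
        - name: shared
          mountPath: /shared
    - name: c2
      image: busybox:1.36
      command: ['sleep', '3600']
      volumeMounts:
        - name: shared
          mountPath: /shared
  volumes:
    - name: shared
      emptyDir: {}                 # volume shared by both containers
```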

You can see that the volume is mounted on the first container by using kubectl exec:
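
For example, assuming the pod sketched above:

```bash
# Open a shell in the c1 container of the podtest pod
kubectl exec -it podtest -c c1 -- sh

# Inside the container, list the mounted filesystems
mount | grep shared
```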

The command attached a terminal session to the container c1 in the podtest pod.

As you can see, a volume is mounted on /shared — it’s the shared volume we created earlier.
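
Next, create one file on the shared volume and one on the container's own filesystem, then check what the second container can see:

```bash
# Still inside c1
touch /shared/hello /tmp/hello

# From the second container (c2 in the sketch above), list both locations
kubectl exec podtest -c c2 -- ls /shared /tmp
```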

As you can see, the file created in the shared directory is available on both containers, but the file in /tmp isn’t.

This is because, apart from the shared volume, the containers’ filesystems are entirely isolated from each other.

Now let’s take a look at networking and process isolation.

A good way of seeing how the network is set up is to use the command ip link, which shows the Linux system’s network devices.
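
For example, run it in both containers and compare the interfaces and MAC addresses (this assumes the container image ships the ip tool, as BusyBox does):

```bash
kubectl exec podtest -c c1 -- ip link
kubectl exec podtest -c c2 -- ip link
```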

Since MAC addresses are supposed to be globally unique, seeing the same address in both containers is a clear indication that they share the same network device.
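
To also see process isolation in action, you could start a simple TCP listener in the first container (the flags assume a BusyBox-style nc; run this in one terminal and leave it open):

```bash
# Pipe a stream of timestamps into a listener on port 5000
kubectl exec podtest -c c1 -- sh -c 'while sleep 1; do date; done | nc -l -p 5000'
```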

The command starts a listener on port 5000 and prints the output of the date command to any TCP client that connects.

Now you can verify that the second container can connect to the network listener, but cannot see the nc process:
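
For example (telnet and ps are both available in BusyBox images):

```bash
# Connect to the listener from the second container
kubectl exec -it podtest -c c2 -- telnet localhost 5000

# List the processes visible inside c2
kubectl exec podtest -c c2 -- ps aux
```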

Connecting over telnet, you can see the output of date, which proves that the nc listener is working, but ps aux (which shows all processes in the container) doesn’t show nc at all.

This is because containers within a pod have process isolation but not network isolation.

Coming back to the earlier example: the proxy container that receives external traffic on behalf of the application is the “ambassador”, hence the name of the pattern.

One crucial thing to remember, though: because the network namespace is shared, multiple containers in a pod can’t listen on the same port!

Let’s have a look at some other use cases for multi-container pods.

Exposing metrics with a standard interface

Let’s say you’ve standardized on using Prometheus for monitoring all of the services in your Kubernetes cluster, but you’re using some applications that don’t natively export Prometheus metrics (for example, Elasticsearch).

Can you add Prometheus metrics to your pods without altering your application code?

For the Elasticsearch example, let’s add an “exporter” container to the pod that exposes various Elasticsearch metrics in the Prometheus format.

This will be easy, because there’s an open-source exporter for Elasticsearch (you’ll also need to add the relevant port to the Service):
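
Here is a sketch of the extra container to add alongside the existing elasticsearch container, plus the extra Service port. The image follows the open-source elasticsearch_exporter project and the tag is illustrative:

```yaml
# Extra container in the Deployment's pod template
# (alongside the existing elasticsearch container):
        - name: prometheus-exporter
          image: quay.io/prometheuscommunity/elasticsearch-exporter:v1.7.0  # illustrative tag
          args:
            - '--es.uri=http://localhost:9200'   # scrape Elasticsearch over localhost
          ports:
            - containerPort: 9114
              name: metrics

# Extra port in the Service spec:
    - port: 9114
      name: metrics
```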

Once this has been applied, you can find the metrics exposed on port 9114:
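
For example:

```bash
kubectl run -it --rm testcurl --image=curlimages/curl --restart=Never \
  --command -- curl http://elasticsearch:9114/metrics
```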

Once again, you’ve been able to alter your application’s behaviour without actually changing your code or your container images.

You’ve exposed standardized Prometheus metrics that can be consumed by cluster-wide tools (like the Prometheus Operator), and have thus achieved a good separation of concerns between the application and the underlying infrastructure.

Tailing logs

Next, let’s take a look at the Sidecar Pattern, where you add a container to a pod that enhances an application in some way.

The Sidecar Pattern is pretty general and can apply to all sorts of different use cases (and you’ll often hear any containers in a pod past the first referred to as “sidecars”).

Let’s first explore one of the classic sidecar use cases: a log tailing sidecar.

In a containerized environment, the best practice is to always log to standard out so that logs can be collected and aggregated in a centralized manner.

But many older applications were designed to log to files, and changing that can sometimes be non-trivial.

Adding a log tailing sidecar means you might not have to!

Let’s return to Elasticsearch as an example, which is a bit contrived since the Elasticsearch container logs to standard out by default (and it’s non-trivial to get it to log to a file).

The logging configuration file is a separate ConfigMap that’s too long to include here.
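
Here is a sketch of the relevant part of the pod template. An emptyDir volume is shared between the two containers; the sidecar container's name (logs), the image tag and the log file path are illustrative, and the actual file name depends on the logging configuration:

```yaml
# Excerpt from the Deployment's pod template
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3  # illustrative tag
          env:
            - name: discovery.type
              value: single-node
          volumeMounts:
            - name: logs
              mountPath: /usr/share/elasticsearch/logs
        - name: logs
          image: busybox:1.36
          # Stream the log file to standard out; -F keeps retrying until the file exists
          # (assumes a BusyBox-style tail and an illustrative file name)
          command: ['sh', '-c', 'tail -F /logs/elasticsearch.log']
          volumeMounts:
            - name: logs
              mountPath: /logs
      volumes:
        - name: logs
          emptyDir: {}
```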

The Elasticsearch container writes logs to that volume, while the logs container just reads from the appropriate file and outputs it to standard out.

You can retrieve the log stream by specifying the appropriate container with kubectl logs:
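
For example, with the container names used in the sketch above:

```bash
kubectl logs deploy/elasticsearch -c logs
```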

The great thing about using a sidecar is that streaming to standard out isn’t the only option.

If you needed to switch to a customized log aggregation service, you could just change the sidecar container without altering anything else about your application.

Other examples of sidecars

There are many use cases for sidecars; a logging container is only one (straightforward) example.

Here are some other use cases you might encounter in the wild:

Preparing for a pod to run

All of the examples of multi-container pods this post has gone over so far involve several containers running simultaneously.

Kubernetes also provides the ability to run init containers, which are containers that run to completion before the “normal” containers start.

This allows you to run an initialization script before your pod starts in earnest.

Why would you want your preparation to run in a separate container, instead of (for instance) adding some initialization to your container’s entrypoint script?

The Elasticsearch docs recommend setting the vm.max_map_count sysctl in production-ready deployments.

This is problematic in containerized environments since there’s no container-level isolation for sysctls and any changes have to happen on the node level.

How can you handle this in cases where you can’t customize the Kubernetes nodes?

One way would be to run Elasticsearch in a privileged container, which would give Elasticsearch the ability to change system settings on its host node, and alter the entrypoint script to add the sysctls.

The problem with that approach: if the Elasticsearch service were ever compromised, an attacker would have root access to its host node.

You can use an init container to mitigate this risk somewhat:
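
Here is a sketch of the relevant part of the pod template; the init container's name and image are illustrative, and 262144 is the value recommended in the Elasticsearch docs:

```yaml
# Excerpt from the Deployment's pod template
      initContainers:
        - name: sysctl
          image: alpine:3.19                 # illustrative image
          command: ['sysctl', '-w', 'vm.max_map_count=262144']
          securityContext:
            privileged: true                 # required to change a node-level sysctl
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.9.3  # illustrative tag
          env:
            - name: discovery.type
              value: single-node
```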

The pod sets the sysctl in a privileged init container, after which the Elasticsearch container starts as expected.

You’re still using a privileged container, which isn’t ideal, but at least it’s extremely minimal and short-lived, so the attack surface is much lower.

Using a privileged init container to prepare a node for running a pod is a fairly common pattern.

For instance, Istio uses init containers to set up iptables rules every time a pod runs.

Another reason to use an init container is to prepare the pod’s filesystem in some way.

Another init container use case

If you’re using something like HashiCorp Vault for secrets management instead of Kubernetes secrets, you can retrieve secrets in an init container and persist them to a shared emptyDir volume.
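
Here is a sketch of what that could look like. The Vault address, the secret field name and the images are placeholders, and authentication is omitted entirely; a real setup would log in via Vault's Kubernetes auth method before reading the secret:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  initContainers:
    - name: get-secret
      image: hashicorp/vault:1.15               # illustrative tag
      env:
        - name: VAULT_ADDR
          value: https://vault.example.com:8200 # placeholder address
      # Fetch a field from secret/my-secret and write it to the shared volume
      # (authentication omitted for brevity)
      command:
        - sh
        - -c
        - vault kv get -field=password secret/my-secret > /secrets/password
      volumeMounts:
        - name: secrets
          mountPath: /secrets
  containers:
    - name: myapp
      image: busybox:1.36                       # stand-in for your application image
      command: ['sh', '-c', 'cat /secrets/password && sleep 3600']
      volumeMounts:
        - name: secrets
          mountPath: /secrets
  volumes:
    - name: secrets
      emptyDir: {}
```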

Now the secret/my-secret secret will be available on the filesystem for the myapp container.

This is the basic idea of how systems like the Vault Agent Sidecar Injector work.
However, they’re quite a bit more sophisticated in practice (combining mutating webhooks, init containers, and sidecars to hide most of the complexity).

Even more init container use cases

Here are some other reasons you might want to use an init container:

Summary

This post covered quite a lot of ground, so here’s a quick summary of the multi-container patterns covered and when you might want to use them:

  • Ambassador pattern: proxy traffic to or from the main container without changing it, for example terminating TLS with an Nginx proxy.

  • Adapter pattern: translate the main container’s output into a standard interface, for example exposing Prometheus metrics with an exporter container.

  • Sidecar pattern: enhance the main container in some way, for example by tailing its log files to standard out.

  • Init containers: run preparation steps to completion before the main containers start, for example setting sysctls or fetching secrets.

Be sure to read the official documentation and the original container design pattern paper if you want to dig deeper into this subject.
