Running PostgreSQL in Kubernetes

May 4, 2019

Running a PostgreSQL instance in Kubernetes is totally possible – but is it something that you want to do? That depends…

Advantages and Disadvantages

There are both advantages and disadvantages to running a PostgreSQL instance in Kubernetes. Here’s a quick summary of the major ones.

Advantage: It’s Cheap

First off, the main advantage of running a Postgres instance in Kubernetes is cost. Since the instance runs within a block of compute resources (EC2, Google Compute Engine, your own hardware) that you’re already paying for, it doesn’t cost you anything new.

Advantage: Internal Communication

The second is network communication and security. Since the database is internal to the cluster, services within the cluster can communicate with it without it ever being exposed to the external environment or public web. While it’s always a good idea to encrypt internal cluster traffic, you can still be relatively safe making unencrypted database connections from within the cluster – provided you’re not exposing the database externally (for example with a LoadBalancer service or an Ingress resource).

Disadvantage: Backups

It’s always critical that you back up your database. Even if your code is completely fault tolerant, you never know what kind of freak accident will require you to restore from a backup. When running a PostgreSQL instance in Kubernetes, however, backups can be a bit difficult. What most people do is create regular snapshots of the disk backing the Persistent Volume Claim with their public cloud provider, like Google Cloud Platform (GCP) or Amazon Web Services (AWS). For those who are more used to a pg_dump command, life becomes a bit more difficult. pg_dump has to be run either inside the live database container (for example via kubectl exec) or from another pod that connects to the database over the network – you can’t simply mount the data volume from a second pod, since a ReadWriteOnce Persistent Volume Claim can only be mounted by one node at a time. You can automate this so that it happens regularly – but it’s a little hacky. I tend to stick to snapshotting the Persistent Volume Claim’s disk, which holds the Postgres data directory.
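If your cluster runs a CSI storage driver with snapshot support, the disk-snapshot approach can also be expressed in-cluster as a VolumeSnapshot resource. Here’s a rough sketch – the claim name and snapshot class are placeholders you’d replace with your own, and on older clusters the API group may still be a beta version:

```yaml
# Sketch of an in-cluster disk snapshot of the Postgres PVC.
# Requires a CSI driver with snapshot support; the names below
# (postgres-pvc, my-snapshot-class) are placeholders.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot
spec:
  volumeSnapshotClassName: my-snapshot-class   # placeholder class name
  source:
    persistentVolumeClaimName: postgres-pvc    # your Postgres PVC
```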

Disadvantage: Horizontal Scaling (or lack thereof)

One of the greatest benefits of container orchestration is being able to easily horizontally scale workloads. Unfortunately, a standard out-of-the-box Postgres instance doesn’t have this capability. Only one pod can mount the Postgres data filesystem at any given time – meaning only one pod can run at a time, so the deployment can’t be horizontally scaled. The only option for expanding capacity is to vertically scale the pod, giving it more memory and CPU resources.

How To Install Postgres on Kubernetes

Now that we’ve talked about the advantages and disadvantages, here’s how to install one on your Kubernetes cluster. This tutorial was created using a GKE (Google Kubernetes Engine) cluster. It (and the accompanying repository) creates the Postgres instance, makes it stateful so that the database survives if the pod is killed, and exposes it to the world. Backups have to be handled externally by snapshotting the persistent volume.

Create A Secret

The first thing to do is create a secret with your root database password. This allows both your database and any services that connect to it to securely access the password without placing it in your source code. Don’t commit this file to source control.
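The repository’s manifest isn’t reproduced here, but a minimal secret might look like the following, saved as postgres_kubernetes_secret.yaml – the secret name and key are assumptions, so match them to whatever your deployment references:

```yaml
# Minimal example secret holding the Postgres superuser password.
# Replace the placeholder value; never commit the real password.
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials   # assumed name
type: Opaque
stringData:
  password: change-me          # placeholder password
```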

Run the kubectl apply -f postgres_kubernetes_secret.yaml command to deploy this to your cluster.

Create the Persistent Volume Claim

With the secret created, the next thing to do is create the Persistent Volume Claim. This will provision a physical disk to back your Postgres data volume. If the pod is ever killed, the one that takes its place will mount this disk with all of your data, making the database more-or-less stateful.
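A minimal claim might look like this – the name and size are assumptions, and on GKE the default StorageClass will provision a persistent disk for it:

```yaml
# Example Persistent Volume Claim backing the Postgres data directory.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc           # assumed name
spec:
  accessModes:
    - ReadWriteOnce            # only one node can mount it at a time
  resources:
    requests:
      storage: 10Gi            # adjust to your needs
```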

Same as last time, use the kubectl apply -f command to apply this configuration to your cluster and provision the disk.

Create the Deployment

Next, create the deployment. This will create the actual Postgres instance by pulling the latest PostgreSQL Docker image, mounting the Persistent Volume Claim, and reading the password from the secret created above.
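As a sketch – the labels, secret name, and claim name are assumptions that must match the manifests above – the deployment might look like:

```yaml
# Example single-replica Postgres deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1                  # only one pod can mount the volume
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:latest
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials   # assumed secret name
                  key: password
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
              subPath: data    # avoids lost+found clashing with initdb
      volumes:
        - name: postgres-data
          persistentVolumeClaim:
            claimName: postgres-pvc            # assumed claim name
```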

Apply it with the same kubectl apply -f command as before.

Create The Service (Expose it to the web)

Finally, with the database instance running, you can create the service (type LoadBalancer) which will expose the postgres deployment to the web. Want to keep it internal only? Take out the type: LoadBalancer line.
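A sketch of such a service – the selector must match the deployment’s pod labels:

```yaml
# Example LoadBalancer service exposing Postgres externally.
# Remove "type: LoadBalancer" to keep it cluster-internal (ClusterIP).
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: LoadBalancer
  selector:
    app: postgres              # must match the deployment's pod labels
  ports:
    - port: 5432
      targetPort: 5432
```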

Apply it with the kubectl apply -f command, and that’s pretty much it. You should now have a publicly facing, stateful PostgreSQL instance running within your Kubernetes cluster. It might take a minute for the public IP to be assigned, but you can get it by running the kubectl get svc command.

Hopefully that helps a few people get started with PostgreSQL on Kubernetes. Check out the full repository for more information. If you have any questions, please feel free to send me a message.