October 15, 2020

Kubernetes sucks

Kubernetes is the “hot new thing” that people like to claim is the “future of deployment” or the “future of infrastructure” or any other future-y, fancy-sounding statement (see e.g. this and this). And you know, I get it, I really do. Kube takes care of so many things – rolling upgrades, management of stateful deployments, automatically spreading load across a cluster of servers and rebalancing, … – and so it makes sense that people are attracted to it.

But at least in my experience with it, I consistently run into inexplicable issues that completely ruin its utility. And I get it, all tools – especially deployment tools – have pain points of some sort. But the issues I’ve experienced with Kube are just… beyond ridiculous: volumes disappearing, DNS failing in inexplicable ways, the cluster straight-up breaking to the point of needing a full reinstallation, and more. This has happened on SO MANY versions of Kube, too, from around 1.5 to 1.19, both managed by different cloud providers – most recently DigitalOcean – and self-installed with kubeadm; it’s not like I’m running some bespoke pre-1.0 version or anything.

Volumes disappearing

Having volumes mysteriously shredded by Kube is something I’ve experienced firsthand. It’s how I learned the hard way to never host a database in a Kubernetes cluster, despite there being many seemingly-supported ways of doing so, such as Helm charts for common databases like Cassandra, Elasticsearch, etcd, InfluxDB, MySQL (MariaDB), and PostgreSQL. There’s even a workload type that’s (arguably) made specifically for databases: the StatefulSet.

And yet, every time I’ve attempted it, sooner rather than later I end up dealing with my volumes mysteriously disappearing. The type of volume hasn’t mattered, whether it’s a cloud-provisioned PVC or a slice of a disk on a dedicated server I’ve rented. I don’t trust it anymore, even with solid backups. I’m not really sure why this happens, nor do I have any log files left (or even necessarily accessible) to figure out what happened. This is compounded when using a managed k8s service, where I don’t necessarily have access to the master nodes at all.

Inexplicable DNS failures

Every single time I’ve used Kubernetes, I’ve run into inexplicable DNS failures. Some pods stop being able to resolve internal DNS names, some lose the ability to do external DNS lookups, it might affect all pods or only a subset of them, … It’s gotten to the point where the only results Google turns up for the errors are lines straight out of the Kubernetes source code. Why? No idea! But apparently I have a talent for breaking Kube.

In fact, this particular issue led me to write singyeong specifically to work around DNS failures. Writing what’s effectively a DNS replacement may sound silly, but it solved a very real need of mine at the time: fine-grained message routing combined with a complete lack of reliance on DNS for HTTP requests. This worked entirely because singyeong tracks clients by their IP rather than by their hostname or anything else (and at this point, I’m slightly surprised Kube didn’t obfuscate that somehow).

Clusters straight-up breaking

Yep! This includes, but is not limited to:

The cluster being unable to delete a pod, and therefore unable to free up its claimed PVCs for a replacement.
Not being able to view logs for some pods.
Lots of ghost pods.
Issues like this, where my cluster is just toast.

Frankly, it’s a little ridiculous that this has happened every single time I’ve tried to use k8s for things. I don’t do anything abnormal, either: no weird tinkering with the master node(s) or anything; I just try to use Kube as the application deployment platform it is. But every time I’ve done that, it ends up in an inexplicable state of dysfunction. It’s not even tied to cluster upgrades, which was my first thought. These clusters just… like to break. I don’t know why, I don’t know what I’m doing to cause it, nothing. It just breaks on its own, in ways I don’t know or understand.

YAML sucks

There, I said it. YAML sucks as a format for configuring deployments. And I know there are tools for it: Helm, jsonnet, and probably more that I’m just not aware of. But they all just universally suck. Either I write a fuckton of YAML for vanilla deployments, write a fuckton and then some of templated YAML for Helm, write what’s essentially JSON (jsonnet is a superset of it) for jsonnet, or learn some other special, unique templating / scripting format for yet another tool; I imagine there are PLENTY of them out there.

I’ll be honest, I don’t know what a good replacement format would be. Just please, for the love of God, PLEASE don’t let it be more YAML or JSON. I’m really tired of YAML’s parsing ambiguities effectively forcing me to wrap everything in "quotes" just in case the parser decides to do something queer, and of having to type so many {} for JSON. The latter really irks me: I use an Ergodox EZ, and in my current layout the {} keys are in a really awkward spot that meaningfully throws me off. And yes, I know I can just use the symbol layer or remap the keys, but I’m tired of it either way. I’ve been writing Java and other C-style languages since 2012; I’m ready to move on from the hell that is squiggly brackets.
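To make the quoting complaint concrete: YAML 1.1 implicitly resolves certain unquoted scalars to booleans, so values like yes, no, on, and off stop being strings (the well-known "Norway problem", where the country code NO turns into false). The Python below is a minimal, hypothetical sketch of just that one rule, assuming the YAML 1.1 bool type definition; resolve_plain_scalar and resolve_quoted_scalar are illustrative names, not any real parser's API.

```python
import re

# Hypothetical, deliberately tiny resolver for ONE of YAML 1.1's implicit
# typing rules: unquoted "bool" scalars. Real parsers also guess ints,
# floats, nulls, timestamps, etc., and some (PyYAML, for instance) leave out
# the single-letter y/n forms that the YAML 1.1 bool definition lists.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)
TRUTHY = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text: str):
    """Value an unquoted scalar would get under YAML 1.1's bool rule."""
    if YAML11_BOOL.fullmatch(text):
        # Case variants like "YES" or "On" collapse to the same boolean.
        return text.lower() in TRUTHY
    return text  # everything else stays a plain string in this sketch


def resolve_quoted_scalar(text: str) -> str:
    """Quoted scalars are never implicitly typed; they are always strings."""
    return text


if __name__ == "__main__":
    # The country code for Norway silently becomes a boolean...
    print(resolve_plain_scalar("NO"))    # False
    # ...and so does a value someone meant as the string "on".
    print(resolve_plain_scalar("on"))    # True
    # Wrapping it in quotes is the only way to keep the text as-is.
    print(resolve_quoted_scalar("NO"))   # NO
```

Quoting sidesteps the whole rule, which is why "just quote everything" becomes the defensive habit. YAML 1.2 narrowed booleans to the true/false spellings, but not every parser in a deployment toolchain follows 1.2.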

scream of frustration

So yeah, that’s why I think Kubernetes sucks. And yet, it’s still somehow the best tool for any sort of software deployment at scale. Somehow. Kube does do some really nice things, like avoiding the need for SSH / thinking about individual nodes / …, but so far it just hasn’t been worth it to me. Kube just creates more problems than it solves for me.
