Amazon ECS Fargate or Why I don't use Kubernetes

🚤
tl;dr I haven't had a use-case for Kubernetes that I couldn't solve with AWS ECS Fargate. When I do, I'll look at it.

I had a friend ask me about Kubernetes, and I told them it's one of my operational blind spots. Kubernetes has a lot of traction, but I also knew it had a learning curve that I didn't want to dive into until I needed to. The workloads I've been running have been fairly simple and at a relatively modest scale, and Amazon's EC2 Container Service (ECS) has always been able to deliver.

If you have to work with cloud infrastructure, this is my journey of avoiding Kubernetes. If Kubernetes makes sense for how you operate, by all means use it. I don't have any issues with it. I just don't think any one technology should hold you back from achieving your goals.

How I got to AWS Fargate

I've been using AWS since 2012, when I started as the founding Operations Engineer at Pinterest. We used EC2 instances (think: computers in the cloud). We had fleets of hundreds of servers, and we used bash scripts to scale them up and down on a schedule (our scripts probably pre-dated EC2's auto-scaling feature). Our deploys had to happen in place in order to be quick. We had not yet mastered immutable servers.

When I went to work at a much smaller scale and was tasked with building our AWS footprint, ECS had been released. Managing a small number of ECS containers and deploying via the AWS CLI seemed like a much better idea. So we did that.

graph TD;
  EC2[ECS EC2 Container Instance] -->|hosts| ServiceA[ECS Service A]
  EC2 -->|hosts| ServiceB[ECS Service B]
  ServiceA -->|runs| TaskA1[ECS Task A1]
  ServiceA -->|runs| TaskA2[ECS Task A2]
  ServiceB -->|runs| TaskB1[ECS Task B1]
  ServiceB -->|runs| TaskB2[ECS Task B2]
  TaskA1 -->|contains| ContainerA1a[Container A1a]
  TaskA1 -->|contains| ContainerA1b[Container A1b]
  TaskA2 -->|contains| ContainerA2[Container A2]
  TaskB1 -->|contains| ContainerB1[Container B1]
  TaskB2 -->|contains| ContainerB2[Container B2]

This simplified deployments quite a bit. We launched an instance from an AMI running the ECS container agent and basically never touched it again. Then we would use Amazon's CLI tools to deploy services as needed. We ran our main app, our tools, everything on this container instance.
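A deploy in that era boiled down to a couple of CLI calls. Here's a minimal sketch, with hypothetical cluster/service names and a made-up revision number, of what that looked like:

```bash
# Hypothetical names and revision; the real task definition JSON lived
# alongside our deploy scripts.
aws ecs register-task-definition --cli-input-json file://task-definition.json
aws ecs update-service --cluster main --service web --task-definition web:42
```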

Fargate rolled out, but we already had a working system. There was no point in trying it out, and the pricing wasn't great at the time either.

The next time I was in a green field AWS situation was at Tome. We moved from Heroku to AWS because I knew we'd reach the scale where this was necessary. I had learned quite a bit from Pinterest, from the private equity firm, and from some of my DevOps peers. Fargate seemed like a great fit.

I didn't have to manage an EC2 instance or make sure I was using the right instance size. I didn't have to worry about AMIs; I just needed a task definition and a lot of ❤️.

graph TD;
  Fargate[Fargate] -->|runs| ServiceA[ECS Service A - Fargate]
  Fargate -->|runs| ServiceB[ECS Service B - Fargate]
  ServiceA -->|runs| TaskA1[ECS Task A1 - Fargate]
  ServiceA -->|runs| TaskA2[ECS Task A2 - Fargate]
  ServiceB -->|runs| TaskB1[ECS Task B1 - Fargate]
  ServiceB -->|runs| TaskB2[ECS Task B2 - Fargate]
  TaskA1 -->|contains| ContainerA1a[Container A1a]
  TaskA1 -->|contains| ContainerA1b[Container A1b]
  TaskA2 -->|contains| ContainerA2[Container A2]
  TaskB1 -->|contains| ContainerB1[Container B1]
  TaskB2 -->|contains| ContainerB2[Container B2]

This might look the same, but I don't manage Fargate. I just define a service and a task definition and I'm done.
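For what it's worth, here's roughly what "define a service" means, sketched with the AWS CLI and made-up names; in practice we do this through Terraform (more on that below):

```bash
# Hypothetical cluster, subnet, and security-group IDs. The launch type is
# FARGATE, so there's no container instance to manage.
aws ecs create-service \
  --cluster tome \
  --service-name web \
  --task-definition web:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-aaaa1111],securityGroups=[sg-bbbb2222],assignPublicIp=DISABLED}'
```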

How we create our services and tasks

We put our service definitions in Terraform. These don't change much; we keep a constant fleet because at our current size we aren't saving much by scaling down during our low periods.

💡
At some point we will move toward auto scaling groups that can watch our service health and scale up or down accordingly. There's no pressure to cut our Fargate spending, and we're currently over-provisioned so we can handle future growth.
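A rough sketch of what that could look like, using placeholder names. Since Fargate has no instances to put in an Auto Scaling group, this uses ECS service auto scaling via Application Auto Scaling to adjust the desired task count:

```bash
# Placeholder cluster/service names. Track average CPU and let ECS adjust the
# service's desired task count between 2 and 10.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/tome/web \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 --max-capacity 10

aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/tome/web \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name web-cpu-target \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    }
  }'
```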

Task definitions are trickier. Tasks define the Docker containers that make up your service. Our containers are a Node.js server, a log router, and our Datadog agent. These change because that Node.js server is our app. We push a new container image to a Docker registry on each commit to our main branch. We then need to update our task definition with the new image version and tell CodeDeploy to roll out our change.
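The "update our task definition" step is basically: pull the current revision, swap in the new image tag, and register the result. A sketch of that with placeholder family and container names; $REGISTRY and $GIT_SHA would come from CI:

```bash
# Placeholder task family ("web") and container ("app") names.
# 1. Pull the current task definition.
aws ecs describe-task-definition \
  --task-definition web \
  --query 'taskDefinition' > current-taskdef.json

# 2. Swap in the freshly pushed image and drop the read-only fields that
#    register-task-definition won't accept.
jq --arg image "$REGISTRY/app:$GIT_SHA" \
  '.containerDefinitions |= map(if .name == "app" then .image = $image else . end)
   | del(.taskDefinitionArn, .revision, .status, .requiresAttributes,
         .compatibilities, .registeredAt, .registeredBy)' \
  current-taskdef.json > new-taskdef.json

# 3. Register the new revision.
aws ecs register-task-definition --cli-input-json file://new-taskdef.json
```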

CodeDeploy spins up a replacement set of tasks and uses ✨ and a load balancer to shift traffic over to the new version.
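Kicking that off is one more CLI call that hands CodeDeploy an AppSpec pointing at the new task definition and the load-balanced container. The application and deployment-group names below are hypothetical:

```bash
# $TASK_DEF_ARN is the ARN of the revision registered above; the container
# name and port are placeholders.
cat > revision.json <<EOF
{
  "revisionType": "AppSpecContent",
  "appSpecContent": {
    "content": "{\"version\":0.0,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"$TASK_DEF_ARN\",\"LoadBalancerInfo\":{\"ContainerName\":\"app\",\"ContainerPort\":3000}}}}]}"
  }
}
EOF

aws deploy create-deployment \
  --application-name tome-ecs \
  --deployment-group-name tome-ecs-dg \
  --revision file://revision.json
```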

Conclusion

Our clusters are fairly simple. We manage about three services, and only one runs code that we develop; the other two are metrics collection and a PgBouncer instance. This eliminates the need for anything Kubernetes might give us and lets us rely on AWS infrastructure instead.

There might come a time when we try to achieve something that can't be served by the systems we're using for containers right now, or when managing our clusters becomes exceedingly painful. That will likely be a good time to re-evaluate the landscape and see whether Kubernetes, or the next thing, is right for us.


If you manage large-scale cloud operations, containers, or whatnot, hit me up at k8s@33mail.davedash.com. I regularly meet people in real life as part of ☕🔧 CoffeeOps and virtually through 📧 mailing lists.