What actually goes on during a deploy

Previously, I covered, at a high level, how our builds work and which tools we use. Here I want to explain exactly what we do during a deploy, with an emphasis on tracking state. It’s one area where there aren’t a lot of good off-the-shelf tools that can just “do it for you.”

A build can be deployed to a set of canary hosts, or to our entire fleet. We record this state in Zookeeper.

“Canarying” to a few hosts gives us time to validate that everything is working as planned. For the most part this involves looking at charts, error logs, error aggregators, and clicking around on the site. All very manual.

On each server sits a fairly robust Zookeeper client called deployd. It listens to either the enabled or the enabled/canary Zookeeper node, depending on its role (which is also defined in Zookeeper). When the node it watches changes, deployd runs through a few steps:

  • The node checks that it’s serving the correct build.
  • If it’s not, it downloads the correct build and extracts it.
  • It atomically flips a symlink on the node to point at the new build.
  • It does any post-install steps, like installing dependencies. For example, we use pip to manage our Python requirements, so we run something like pip install -r requirements.txt after we’ve installed a new build.
  • The service is restarted in the most graceful way.
  • Current status of the node is reported back to Zookeeper.
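To make the "check the build, then flip the symlink" steps concrete, here is a minimal sketch in Python. It is not deployd's actual code: the function names (current_build, flip_symlink, ensure_build) and the idea of a "current" symlink pointing at a build directory are assumptions for illustration. The flip creates a temporary symlink next to the real one and renames it over the old link, which on POSIX replaces the link in a single step.

```python
import os


def current_build(link_path):
    """Return the build directory the symlink points at, or None if unset."""
    try:
        return os.readlink(link_path)
    except OSError:
        # Missing link (or not a symlink): treat as "no build installed".
        return None


def flip_symlink(link_path, target):
    """Point link_path at target without a window where the link is missing."""
    # Hypothetical temp name; it lives next to the real link so the rename
    # stays on one filesystem.
    tmp_link = "{}.tmp.{}".format(link_path, os.getpid())
    if os.path.lexists(tmp_link):
        os.unlink(tmp_link)
    os.symlink(target, tmp_link)
    try:
        # rename(2) replaces the old symlink itself in one step on POSIX.
        os.replace(tmp_link, link_path)
    except OSError:
        os.unlink(tmp_link)
        raise


def ensure_build(link_path, wanted_build_dir):
    """Flip the symlink only if we are not already serving the wanted build.

    Returns True if a flip happened, False if nothing needed to change.
    """
    if current_build(link_path) == wanted_build_dir:
        return False
    flip_symlink(link_path, wanted_build_dir)
    return True
```

In a real client, the download and extract step would run before ensure_build, and the post-install, restart, and status report steps would run only when it returns True.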

Graceful restarts are somewhat complicated. We employ different strategies for different services. In the most advanced form, multiple copies of the same service run on a machine, and through iptables we can turn off traffic to a few instances while we restart them. For most services we define a restart concurrency: the number of nodes that may restart at any given time. In some cases we can restart almost all the nodes at once with no user impact.
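The restart concurrency idea can be sketched in a few lines of Python. This is only an illustration under assumptions: our real coordination happens across machines through Zookeeper, while this sketch uses a single process and a thread pool, and rolling_restart and the restart callable are hypothetical names. The pool size is what caps how many restarts are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


def rolling_restart(nodes, concurrency, restart):
    """Restart every node, with at most `concurrency` restarts in flight.

    `restart` is any callable that takes a node and blocks until that node
    is back up. Results come back in the same order as `nodes`.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every restart and re-raises the first failure.
        return list(pool.map(restart, nodes))
```

A concurrency of 1 gives a strict one-at-a-time rolling restart. Setting it close to the number of nodes corresponds to the "restart almost everything at once" case.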

We can then monitor our deploy in our state-of-the-art deploy monitoring tool.

I’d like to point out that I’m no expert at curses, but had the help of Erik Rose’s blessings module which makes light work of the terminal.

Ping me on Twitter (@davedash) if you want to talk about the strategies you’ve developed to deploy software at your organization.