Configuring Machines in the Cloud: Our Approach

We’ve done a lot of work recently to revamp the way we deploy machines in the cloud, and I wanted to share a bit about how we approach this at a fairly low level. Our software and processes are cloud agnostic, but we mostly work with Amazon Web Services because we feel they offer the best solution for most of our clients at this time.

We maintain two base Linux images as part of our cloud toolkit. The only difference between the two is their architecture: one is 64-bit and the other is 32-bit. The images are minimal: they have just enough software and configuration to get a new instance off the ground and configured. We keep copies of the images in each Amazon region, but when it comes to maintenance and upgrades we really only deal with the two master images. All of the machines that we deploy in EC2 come from these two images.
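As a rough illustration of what maintaining regional copies involves, here is one way an updated master image could be pushed out to another region with the AWS command line tools. The image ID, name, and region names are placeholders, and this is a sketch rather than our actual tooling:

    # Hypothetical: copy the 64-bit master image from its home region
    # into a second region. The IDs and region names are placeholders.
    aws ec2 copy-image \
        --source-region us-east-1 \
        --source-image-id ami-12345678 \
        --region us-west-1 \
        --name base-linux-64bit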

The base image by itself is not very useful. When a machine is instantiated from one of the images, our toolkit combines it with our Puppet repository and some instance-specific configuration. The Puppet repository contains the Puppet manifests for how we deploy software; it is where we store our collective knowledge about deploying software successfully. The instance-specific configuration is crafted by the development and operations teams, who pick and choose the appropriate pieces from the Puppet repository and provide the very specific details of how to deploy the server and the application that will run on it. As the instance boots, it configures itself, installing the software and making the changes required to bring it into service.
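To make that concrete, here is a minimal sketch of what a first-boot script along these lines might look like. The repository URL, file paths, and the use of EC2 user data are assumptions for illustration, not our exact toolkit:

    #!/bin/sh
    # Hypothetical first-boot script: fetch the shared Puppet repository
    # and the instance-specific configuration, then let Puppet bring the
    # machine into its desired state. All URLs and paths are placeholders.
    set -e

    # Pull down the shared Puppet manifests (the collective deployment knowledge).
    git clone git://example.com/puppet.git /etc/puppet/repo

    # Fetch the instance-specific configuration; on EC2 this could come
    # from the instance metadata service's user-data endpoint.
    curl -s http://169.254.169.254/latest/user-data > /etc/puppet/node.pp

    # Apply the manifests: Puppet installs packages, writes configuration,
    # and starts services until the instance matches the description.
    puppet apply --modulepath /etc/puppet/repo/modules /etc/puppet/node.pp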

This is all pretty low level, but it provides some capabilities that make our solution very flexible:

  • With only two images to maintain, keeping software up to date is simple. We anticipate releasing new images about once a quarter to capture updates to the packages in the base system.
  • Everything is version controlled. It is easy for us to see what a machine looked like on a specific date or to understand the changes that have been made to how software is configured on an instance (see the sketch after this list).
  • The instances are very self-sufficient. There is no single point of failure that would prevent instances from starting correctly.
  • This is all very portable. With just a little bit of work we can deploy things in a different Amazon region. Our Puppet code and instance-specific configurations can also work in more places than just Amazon: with a little work to recreate the base images on another platform, we can consistently and predictably recreate infrastructure anywhere, giving our clients the ability to choose the right solution for them.
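To illustrate the version-control point from the list above, here is roughly how the question “what did this machine look like on a given date?” could be answered, assuming the repository lives in Git; the date, commit ID, and file path are made up:

    # Hypothetical: find the last commit as of a given date, then view the
    # instance's manifest exactly as it existed at that point in time.
    git log --until="2011-04-01" -n 1 --format=%H
    git show 1234abcd:manifests/webserver.pp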

That last point about portability is something that should be on everyone’s mind (especially considering the outage at Amazon last week). As Steve said last week, everything fails, and you need to design your infrastructure and applications around that. A process for redeploying your infrastructure in another AWS region or on a different cloud is an important part of building a very reliable service in the cloud. It is hard to say what the next cloud failure will look like, but with a process like ours we can be ready to deal with an outage when it happens.