{"id":109,"date":"2008-01-11T23:09:41","date_gmt":"2008-01-12T03:09:41","guid":{"rendered":"http:\/\/www.phildev.net\/phil\/newblog\/?p=109"},"modified":"2008-01-11T23:09:41","modified_gmt":"2008-01-12T03:09:41","slug":"","status":"publish","type":"post","link":"https:\/\/www.phildev.net\/phil\/blog\/?p=109","title":{"rendered":"Spine, Provision, and Onall"},"content":{"rendered":"<p>A few weeks ago, we at Ticketmaster silently open-sourced our internal system configuration software, which consists of 3 pieces of software. I led the effort to get them open-sourced, so I&#8217;m particularly happy about and proud of this. In addition, I develop on all 3 pieces of software, and am the primary author of one of them.<\/p>\n<p>Now, there are already many pieces of open-source system configuration software out there: cfengine and puppet are the two best known. So why release our own? Well, the software out there didn&#8217;t meet our needs &#8211; not just on a feature level, but on a design level. Several years ago, we wrote our own, and have been developing it ever since. 
Now that it has matured, we felt this was something the community could benefit from.<\/p>\n<p>The 3 pieces of software we released were:<\/p>\n<ul>\n<li><strong><a href=\"http:\/\/spine-mgmt.googlecode.com\/\" target=\"_blank\">spine<\/a><\/strong> &#8211; A scalable, hierarchical, template-able, and pluggable\/extensible Linux configuration management framework<\/li>\n<li><strong><a href=\"http:\/\/sysprovision.googlecode.com\/\" target=\"_blank\">provision<\/a><\/strong> &#8211; A pluggable\/extensible system for provisioning DNS, NFS storage, Virtual Machines, and other things needed to get a system up and running (spine runs on a given machine once it&#8217;s up, so think of provision as the bootstrap for getting a system up so it can run spine).<\/li>\n<li><strong><a href=\"http:\/\/onall.googlecode.com\/\" target=\"_blank\">onall<\/a><\/strong> &#8211; A small utility for parallel ssh-based administration, similar to but more scalable than clusterssh.<\/li>\n<\/ul>\n<p>With these three pieces of software, Ticketmaster manages a global infrastructure of more than 3000 machines. In fact, the majority of those machines (almost 3000) are run by a team of 8 people. The team recently grew to help account for the many new projects we&#8217;re taking on, but when I joined, 4 people (including myself) were managing ~2000 Linux systems running a variety of configurations and software on a variety of hardware, in a variety of locations. What this means is <strong>automation<\/strong> and <strong>high server-to-admin ratios<\/strong>.<\/p>\n<p>&#8220;Sure,&#8221; you say, &#8220;but it assumes you&#8217;re running an infrastructure like Ticketmaster&#8217;s!&#8221; Well, that&#8217;s not true. The last several months of development time were dedicated purely to making the way these tools work more abstract and configurable. That isn&#8217;t to say there aren&#8217;t assumptions &#8211; there are. 
However, the assumptions are not specific to Ticketmaster, and every assumption made is documented and either on the list to remove or justified.<\/p>\n<p>So, let&#8217;s have a look at what each of these tools does and why you want to use it.<\/p>\n<h2>Spine<\/h2>\n<p>I mentioned that Spine is a configuration management <em>framework<\/em>. Unlike other configuration management systems, it&#8217;s not designed to, for example, know how a DNS server should be configured. Instead, it&#8217;s designed for you to show it how you <em>want<\/em> a DNS server to be configured, and to be able to configure that server 100% reproducibly. Further, it&#8217;s designed to be able to account for, say, slight variations among your DNS servers, but still allow you to have a single definition of what the configuration files should look like. This is done through a combination of a <strong>hierarchical data model<\/strong> and <strong>templates<\/strong>.<\/p>\n<p>Spine, like most extensible tools, is broken into a set of plugins. You have a PackageManager plugin which makes calls to your package manager (yum, apt, etc. &#8211; it currently only supports apt-rpm, but that&#8217;s in the process of changing). PackageManager makes sure the software you need installed is installed and, further, that <strong>only<\/strong> the software you need installed is there. It does this by walking the dependency tree for the allowed list of software and removing anything else installed. This <strong>prevents bit rot<\/strong> and <strong>enforces consistency<\/strong> &#8211; addressing two common problems in large environments.<\/p>\n<p>Spine also has a SystemHarden plugin to perform various hardening tasks like stripping setuid\/setgid bits from anything that&#8217;s not supposed to have them. There&#8217;s a Startup plugin to ensure the services that are supposed to be running are running and configured to start at boot. 
It also restarts services whose configuration has been changed by spine.<\/p>\n<p>These are all the basics that are required for a configuration system, but they&#8217;re not unique. What is unique is that every configuration file you put into the spine configuration is template-ized and evaluated on that box using the configuration data that applies to that box. Before we can see how powerful this is, we need to take a look at the data model.<\/p>\n<p>The data model for spine is hierarchical. This hierarchy is configurable, but let&#8217;s make up an example hierarchy. We might have a network level for every network &#8211; for example, if spine is running on a machine with the IP address 192.168.1.1\/22, it&#8217;ll descend the configuration for the \/network\/192.168.0.0-22 part of the tree. Next up, we might want to descend the business unit for that machine (which at Ticketmaster is determinable by the hostname, but spine can get this information from a variety of places). I work in a group called &#8216;websys,&#8217; so spine would descend \/websys. Here things like which sysadmins have access will get added to the configuration. Next up, we may want configurations based on the type of environment: dev, qa, staging, prod, etc., descended <strong>under the business unit<\/strong>. So, if this is a machine in our dev environment, spine would add configuration data for \/websys\/dev. Of course, we could also configure spine to descend some global \/dev area too. Next, it would make sense to descend the configuration for the product in dev. For example, \/websys\/dev\/front_web, and so on, until there&#8217;s nothing left to descend.<\/p>\n<p>Of course, hierarchy is great, but sometimes you want to bypass it. For example, all machines of product &#8220;&lt;product&gt;&#8221; might need a common config regardless of being in dev, or qa, or prod, or anywhere else. So, each level can include other parts of the tree. 
These other parts of the tree that are not inherently in your hierarchy are called &#8220;config groups&#8221;&#8230; they can be stored anywhere, but at Ticketmaster, we store them at the business-unit level since that makes the most sense for us.<\/p>\n<p>So, let&#8217;s take an example. Let&#8217;s say, at your global level, you have a key called &#8220;packages&#8221; which is what the PackageManager plugin uses to determine what should be installed. At the global level you have the base packages for your whole company. Then, at your business unit, you add a few packages your team uses. Keys are cumulative, so at your business unit, you just define the extras. Of course, dev will need compilers, debuggers and profilers, so you just add those at the dev level. And so on. Now you have a complete package list that&#8217;s dynamic per class of machine.<\/p>\n<p>Using this hierarchical data, you can define templates written in TemplateToolkit for configuration files such as <em>\/etc\/hosts<\/em>, <em>\/etc\/fstab<\/em>, <em>httpd.conf<\/em>, <em>odbc.ini<\/em>, etc. While we currently have a special plugin for handling authentication (\/etc\/passwd, authorized_keys, \/etc\/shadow, etc.), you could even write templates to handle these (and we did for years).<\/p>\n<p>What this means is having <strong>one<\/strong> \/etc\/hosts for your entire business unit (or for your entire company if that makes sense). Just update the data so that the template can be rendered correctly for each machine. TemplateToolkit is a fully-featured (if somewhat odd) programming language. If you write your templates correctly, you rarely have to change them &#8211; you just check in new config data and your templates adjust themselves. This makes managing thousands of hosts a lot like managing 10 hosts.<\/p>\n<p>Plugins are easy to write. Spine is written in perl and has a standard API for new plugins.<\/p>\n<h2>Provision<\/h2>\n<p>Provision was written by yours truly to handle a gap. 
Let&#8217;s say spine knows how to configure webservers in your webfarm, and let&#8217;s say you need to add 10 to one of your farms. Well, once a machine is up and running, spine will do the rest. But getting from nothing to &#8216;running&#8217; isn&#8217;t handled. This is where provision comes in. Provision will find an IP in the right network, allocate DNS for it (forward and reverse), allocate NFS storage (well, assuming you have NetApp filers, but a plugin could easily be written for anything else), and optionally create a VMware virtual machine for it, and start that machine booting and kickstarting. If this machine needs to be built on real hardware, the only step it won&#8217;t cover is starting the kickstart process.<\/p>\n<p>Provision assumes that your reverse DNS zones are the authoritative location for IP allocations, and that RCS is a valid locking mechanism. Like spine, most of provision&#8217;s work happens in plugins: the DNS plugin, the filer plugin, and the VM plugin. As such, you can choose what sorts of allocations will happen. And since spine is so good at writing configuration files, you can have it write out the config file for provision.<\/p>\n<p>Provision&#8217;s configuration comes from an easy and flexible configuration file format. Using this file, it&#8217;s easy to represent a very wide variety of configurations so that provision can properly provision things in your environment. However, if the configuration has a limitation that prevents you from expressing your environment, you can write a plugin hooking into the necessary callbacks to change how provision will act.<\/p>\n<p>Plugins for provision are also very easy to write. In fact, there are two kinds: provisioning plugins, and local decision plugins. A provisioning plugin is something you write to provision some resource. The DNS plugin is an example of this. Local decision plugins are what I mentioned in the previous paragraph. 
Provision provides a series of callbacks for local decision plugins so you can hook into provision to make complicated decisions. While provision is highly configurable, there&#8217;s no way to predict the kind of configurations every shop will have &#8211; so provision was designed with callbacks throughout all of its logic for maximum flexibility.<\/p>\n<h2>Onall<\/h2>\n<p>Onall is a simple perl utility to ssh to multiple machines in parallel and execute some commands &#8211; or alternatively, to copy a script to the remote machines, run it, and remove it. It&#8217;s configurable &#8211; you can choose how many machines to do in parallel, what timeouts to set, how long to wait between parallel sets, and more.<\/p>\n<p>Unlike clusterssh, onall doesn&#8217;t open a window to each machine. Onall is for non-interactive work, so don&#8217;t go throwing away clusterssh &#8211; it&#8217;s an invaluable tool for interactive parallel work. However, clusterssh doesn&#8217;t scale for highly parallel work &#8211; especially when interactivity isn&#8217;t needed.<\/p>\n<p>Onall can dump the output of your work to your terminal window, or to a directory &#8211; one file per system. If you let it go to your terminal, you can choose to let onall buffer it and try to group it into readable segments per host, or to just dump it as it comes. Onall takes a list of hostnames on standard input, and the commands to run as command-line arguments. While we have an internal tool to generate a list of hosts for us (called nhs), it&#8217;s highly Ticketmaster-specific and releasing it would have done no one any good. 
We assume you have or can write a similar tool.<\/p>\n<h2>Strengths and Weaknesses<\/h2>\n<p>I covered a few of the strengths of these tools in their descriptions, but let&#8217;s go ahead and list the big strengths and weaknesses of these tools here.<\/p>\n<p>Our strengths are&#8230;<\/p>\n<ul>\n<li>Reproducibility &#8211; You know you can always make the same machine again (from a configuration perspective).<\/li>\n<li>Scalability &#8211; Using templates and a hierarchical data model means that your sysadmins will scale further than ever before.<\/li>\n<li>Consistency &#8211; Lose bit-rot. Need to install &#8216;strace&#8217; in prod? Fine, it&#8217;ll be gone on the next spine run!<\/li>\n<li>Accountability &#8211; The configuration backend for spine is subversion, so everything is revision controlled and logged. Further, spine logs its actions to syslog.<\/li>\n<li>Extensibility &#8211; With simple perl APIs, it&#8217;s easy for a sysadmin to add their own logic, features, and needs quickly and reliably.<\/li>\n<li>Safety &#8211; This code is GPLv3. Unlike proprietary solutions, this code can never disappear because the vendor went out of business.<\/li>\n<li>Proven track record &#8211; We&#8217;re releasing this because it&#8217;s worked well for us with a large number of systems and a small number of admins for over 3 years.<\/li>\n<\/ul>\n<p>Our weaknesses are&#8230;<\/p>\n<ul>\n<li>Assumes Red Hat-like OS (being worked on)<\/li>\n<li>Requires subversion<\/li>\n<li>Getting your configs into the spine configuration can be time-consuming and requires a reasonable amount of technical skill and a deep understanding of your current environment<\/li>\n<li>Currently, you cannot query data that doesn&#8217;t apply to the host spine is running on (being worked on)<\/li>\n<li>Only supports apt-rpm<\/li>\n<li>Spine can&#8217;t remove files. 
Once you remove a template or file from spine, it no longer knows about it (being worked on)<\/li>\n<li>Some errors are not tracked and reported as well as they should be (being worked on)<\/li>\n<\/ul>\n<h2>Things coming<\/h2>\n<p>We are constantly working on improvements. Coming very soon are:<\/p>\n<ul>\n<li>A new data publisher that&#8217;s faster, better integrated and can publish to a variety of formats (sqlite, isofs, etc.)<\/li>\n<li>A new data dictionary system<\/li>\n<li>A new abstraction layer for package management that will allow us to support more than just apt-rpm<\/li>\n<\/ul>\n<p>Those are the big ones, but there are other smaller features always being worked on and a variety of bugs always being fixed.<\/p>\n<h2>Getting the most out of these tools<\/h2>\n<p>If you decide to use these tools, here are a few things to keep in mind:<\/p>\n<ul>\n<li><strong>Regular Spine runs<\/strong><br \/>\nSpine can&#8217;t keep your systems in sync if it&#8217;s not run regularly. Depending on your environment, your ability to perform regular maintenance may differ, but I recommend trying to never let a system go more than a month without a spine run. If you have a highly redundant environment where clusters can be taken out of service easily, leap-frogging spine runs between clusters may be a good choice for you as well. Running spine regularly also helps enforce the next rule.<\/li>\n<li><strong>Everything goes in spine<\/strong><br \/>\nMake sure <em>all<\/em> your configurations are in spine. That doesn&#8217;t mean every config file has to be in spine. But any config file you ever need to change must go into spine. If, for example, you&#8217;ve never changed \/etc\/inittab, don&#8217;t put it in spine. But the first time you need to modify it, it&#8217;s time to write a template for it. Even if that template is just a copy of the file with one little IF statement. 
If you don&#8217;t do this, then there&#8217;s no guarantee of your changes sticking around &#8211; and your system fails to be reproducible. Also, regular spine runs can enforce this because the spine run will remove (some) configurations it doesn&#8217;t know about.<\/li>\n<li><strong>Always provision with your provisioning tool<\/strong><br \/>\nIf you use provision with spine (I say if, because people are more likely to have this piece automated already), make sure you <em>always<\/em> use provision. Provision is designed to have everything provisioned through it. If you know what you&#8217;re doing, you can do things manually, but provision does not attempt to sort its zonefiles or do other things to make manual editing particularly easy &#8211; and this is very much on purpose. If you don&#8217;t use provision, human errors like not provisioning reverse DNS are liable to happen. If this happens, provision now has inaccurate data. While provision is very good at detecting this, such a state is probably not one you want to be in.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A few weeks ago, we at Ticketmaster silently open-sourced our internal system configuration software, which consists of 3 pieces of software. I led the effort to get them open-sourced, so I&#8217;m particularly happy about and proud of this. 
In addition, I develop on all 3 pieces of software, and am the primary author of one [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5],"tags":[],"_links":{"self":[{"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=\/wp\/v2\/posts\/109"}],"collection":[{"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=109"}],"version-history":[{"count":0,"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=\/wp\/v2\/posts\/109\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=109"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=109"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.phildev.net\/phil\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=109"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}