Nikolay Sturm's Blog

Musings about Development and Operations

Reducing Template Complexity in Chef Cookbooks


At work we have a pretty heterogeneous server setup: physical machines with public IPs, machines on private networks, and some more nodes in the cloud. When I set up Nagios monitoring for these servers, it led to pretty complicated templates with nested conditionals, because Chef's node object carries lots of information but does not abstract away the specifics of the node. So, to get the public IP of an EC2 instance, you would use something like

node['cloud']['public_ipv4']

whereas on a physical machine, this would be

node['ipaddress']

To complicate matters, in certain situations, you might want to use the host’s local IP address, for instance when EC2 instances are communicating amongst themselves.

When I had to integrate a new feature, I figured it was time to get rid of all this incidental complexity and abstract away the details. I ended up with simple wrapper classes in plain Ruby around the node objects, all adhering to the same interface and tailored to the templates that use them.

To give an example, let’s look at our Nagios hosts.cfg.erb file.

<% @nodes.each do |n| %>
    define host {
        use server
        <% if n.run_list.roles.include?('datacenter_x') %>
            address 1.2.3.4
            host_name <%= n['hostname'] %>-dcx
            alias <%= n['fqdn'] %>
            hostgroups dcx
        <% else %>
            address <%= n['ipaddress'] %>
            <% if n['cloud'] %>
                host_name <%= n.name %>
                alias <%= n['cloud']['public_hostname'] %>
            <% else %>
                host_name <%= n['hostname'] %>
                alias <%= n['fqdn'] %>
            <% end %>
        <% end %>
        <% if n.run_list.roles.nil? || n.run_list.roles.empty? %>
            hostgroups all
        <% else %>
            hostgroups <%= n.run_list.roles.to_a.join(',') %>
        <% end %>
    }
<% end %>

Pretty complicated, isn’t it? After wrapping each node object, the template changes to this:

<% @nodes.each do |n| %>
    define host {
        use server
        address <%= n.nagios_ip %>
        host_name <%= n.hostname %>
        alias <%= n.alias %>
        hostgroups <%= n.hostgroups %>
    }
<% end %>

This is how templates should look: no logic, no complexity.

To achieve this template simplicity, the logic has to move elsewhere. In this case I created a class for each kind of node we have and wrapped the node objects in the recipe:

nodes = nagios_nodes(search(:node, "*:*"))

nagios_nodes() is defined in a library file in the cookbook:

def nagios_nodes(nodes)
  nodes.map do |node|
    if ...
      DatacenterNode.new(node)
    elsif ...
      CloudNode.new(node)
    else
      raise "Unexpected kind of node: #{node.name}"
    end
  end
end
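
The wrapped nodes are then handed to the template resource as usual. A minimal sketch (the target path and source name are assumptions, not taken from the post):

template '/etc/nagios3/conf.d/hosts.cfg' do
  source 'hosts.cfg.erb'
  # exposed as @nodes inside hosts.cfg.erb
  variables(nodes: nodes)
end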

The CloudNode could then look something like this:

class CloudNode
  def initialize(node)
    @node = node
  end

  def alias
    @node['cloud']['public_hostname']
  end

  ...
end
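
For illustration, a complete wrapper might look something like the following. The method bodies are assumptions derived from the cloud branch of the original template, not code from the post:

class CloudNode
  def initialize(node)
    @node = node
  end

  def nagios_ip
    @node['ipaddress']   # the original template addressed cloud nodes by their local IP
  end

  def hostname
    @node.name
  end

  def alias
    @node['cloud']['public_hostname']
  end

  def hostgroups
    roles = @node.run_list.roles
    if roles.nil? || roles.empty?
      'all'
    else
      roles.to_a.join(',')
    end
  end
end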

I am not following the Chef community much, so if you happen to know alternative approaches, please let me know in the comments.

Getting Started With Test-kitchen


Test-kitchen is an end-to-end testing framework for chef cookbooks. It manages VMs, runs your cookbook under test and then verifies the cookbook brought about the changes you expected.

Test-kitchen is moving quickly and documentation is sparse, so I found it pretty hard to get started with test-kitchen, vagrant-lxc and chef-zero. After lots of reading I finally arrived at the following setup.

VM Management

Test-kitchen has a plugin architecture that supports different virtualization backends. One of them is the kitchen-vagrant backend. It uses vagrant, which I think of as an abstraction layer on top of the actual virtualisation technique. As some of my colleagues are on Mac OS X, this should make it a little simpler for them.

To install vagrant, you have to download a package from http://downloads.vagrantup.com/ and install it manually.

By default, vagrant uses VirtualBox as its virtualization backend. This is a pretty decent default for a cross-platform tool, but it is not the optimal solution for my use case.

Doing end-to-end testing with virtual machines will be slow as hell, even more so when you are used to unit testing where runtimes are measured in milliseconds. The fastest virtualization technique on Linux is, to the best of my knowledge, Linux Containers (lxc).

Luckily vagrant also has a plugin architecture, so switching to lxc is mostly a matter of installing prerequisite packages and the lxc plugin.

$ sudo apt-get install lxc lxc-templates cgroup-lite redir
$ vagrant plugin install vagrant-lxc

Fábio Rehm, the author of vagrant-lxc, has put together a blog post explaining how to set up an lxc base box. At the end of that post he suggests adding your own stuff to the base container. For our intended setup, we need to install a few packages.

Log into the container with user and password vagrant, then do this:

$ sudo apt-get -y install curl build-essential
$ cd /tmp && curl -O https://opscode-omnibus-packages.s3.amazonaws.com/ubuntu/12.04/x86_64/chef_11.6.0-1.ubuntu.12.04_amd64.deb
$ sudo dpkg -i chef_11.6.0-1.ubuntu.12.04_amd64.deb && rm chef_11.6.0-1.ubuntu.12.04_amd64.deb
$ sudo apt-get clean
$ sudo /opt/chef/embedded/bin/gem install chef-zero knife-essentials
$ sudo halt

Installing chef and chef-zero into the base box speeds up test-kitchen runs considerably. If you don't mind spending the time to install them on every test run, feel free to leave them out of the base box.

Finally create the base box according to Fábio’s blog post. I extracted this shell script from his post.

Installing test-kitchen

I manage all my gems with bundler, so the first step to installing test-kitchen is to set up a Gemfile in the chef repository's root.

source 'https://rubygems.org'

gem 'berkshelf', '~> 2.0.0'
gem 'chef', '~> 11.6.0'
gem 'chef-zero'
gem 'json', '1.7.7'                       # needed for conflict resolution
gem 'kitchen-vagrant'
gem 'test-kitchen', '1.0.0.beta.3'

We will use berkshelf to manage cookbook dependencies (see below). We have to list chef and chef-zero here (although we already installed them above), so that test-kitchen will be able to load them.

As described in the comment, the explicit json dependency is required for conflict resolution. Otherwise bundler will fail with varying errors.

Running bundle install should install all gems without errors.

Using test-kitchen

To get started we initialise an empty cookbook.

$ mkdir cookbooks
$ knife cookbook create my_sample -o cookbooks
$ cd cookbooks/my_sample

and create test-kitchen's config file, .kitchen.yml, in the cookbook's directory.

Let’s have a look at some of the specifics of this file.
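
Assembled from the fragments discussed below, a minimal .kitchen.yml could look like this; the run_list entry is my addition, everything else appears further down in this post:

driver_plugin: vagrant
provisioner: chef_zero

driver_config:
  http_proxy: http://10.0.3.1:8123
  https_proxy: https://10.0.3.1:8123

platforms:
- name: ubuntu-12.04

suites:
- name: default
  run_list:
  - recipe[my_sample]
  data_bags_path: ../../data_bags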

driver_plugin: vagrant
provisioner: chef_zero

First we set up our toolchain: vagrant as the driver and chef-zero as the provisioner.

driver_config:
  http_proxy: http://10.0.3.1:8123
  https_proxy: https://10.0.3.1:8123

Using an http proxy is a nice performance optimisation. Just run a local proxy like polipo and make sure it listens on the lxc interface.

platforms:
- name: ubuntu-12.04

In the platforms section we set up the different environments test-kitchen will use to test our cookbooks.

suites:
- name: default
  data_bags_path: ../../data_bags

Finally, the suites define different test suites for the cookbook. For instance, you could have different test suites for client and server recipes.

By default, chef-zero uses roles, data bags and nodes from your cookbook, e.g. cookbooks/my_sample/data_bags. You can override these paths with roles_path, data_bags_path, and nodes_path if you prefer putting test data elsewhere or even using production data for your tests.

To run test-kitchen, call kitchen test and see it iterate over all platforms and suites. Nice!

Adding Tests

Depending on your preferences there are different ways of writing tests for test-kitchen. All of them (AFAIK) are based on minitest. I am used to rspec, so I prefer using minitest/spec.

If you google around, you can find different ways of providing tests to test-kitchen that work in one or another scenario. Here, I’ll describe the simplest way I could find for making it work with minitest/spec.

First we create a Berksfile to manage cookbook dependencies.

metadata
cookbook 'minitest-handler', github: 'btm/minitest-handler-cookbook'

With the metadata option we delegate dependency specification to the cookbook's metadata.rb file. This way we don't duplicate that information. In the Berksfile we only add test-related cookbooks explicitly.
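
For reference, the corresponding metadata.rb could look like this; the depends line is just a hypothetical example of a runtime dependency that Berkshelf would pick up through the metadata keyword:

# cookbooks/my_sample/metadata.rb
name    'my_sample'
version '0.1.0'

# runtime cookbook dependencies
depends 'apt'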

There were a few important bugfixes for the minitest-handler cookbook, but no new release yet, so we have to use the github master.

Berkshelf will automatically install all necessary cookbooks in our VMs.

Now that we have the testing framework installed, it’s time to write an actual test. The easiest way is to put the test file in the cookbook’s files/default/test folder. Other people use specific test-cookbooks but the reason for doing so is beyond me.

In my sample test I utilize chef’s data bag search, which only works after including the proper dsl module. Expect a similar setup for anything non-trivial.
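
As an illustration, a test along those lines might look like the following. This is a hedged sketch, not the post's actual test: the recipe name, data bag, and file paths are hypothetical, and the describe_recipe, file, and must_exist helpers come from minitest-chef-handler, which the minitest-handler cookbook installs.

# files/default/test/default_test.rb
describe_recipe 'my_sample::default' do
  include Chef::DSL::DataQuery   # the dsl module needed for data bag search

  it 'creates a config file for every item in the settings data bag' do
    data_bag('settings').each do |item_name|
      file("/etc/my_sample/#{item_name}.conf").must_exist
    end
  end
end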

With the tests in place, running kitchen test will hopefully fail and only succeed after you have added a proper recipe to the cookbook.

If you have any improvements to this setup, let us know in the comments.

Happy testing!

Context for Agile Practices


When programming, I often have a hard time following the rules of the trade. Instead of doing test-first TDD, I only start testing after creating some bits of code. Instead of pair programming, I prefer sitting down with the code in solitude.

Why is that? Why do I have such a hard time doing as the gurus tell us? And why, then, are there situations where I am happy pair programming and doing test-first TDD?

One way I came to think about this is in light of the Cynefin framework. Cynefin is a sensemaking framework that can be used to identify different contexts and help you choose appropriate strategies.

The Cynefin Framework of Sensemaking

Let me introduce the framework with a few examples from my field of work, web development.

Let’s say I have to change some copy on a static website. Everything is clear, the solution is self-evident. I apply best practices and finish the task. In terms of the Cynefin framework, this task can be considered simple and the way to deal with it is sense, categorize, respond.

More often than this, I have to change some functionality on the website. The task is small enough, but I still have to figure out how to go about the change. How exactly does the old code work? Where are the few lines of code I need to change? In other words the solution requires expert knowledge and is not self-evident. There might even be several proper solutions. Cynefin calls this kind of problem complicated and we have to sense, analyze, respond. We still apply good practice, which means we adapt to the situation at hand.

If the change to our website is bigger, the solution space opens up even more and we enter the complex domain. An important effect in this domain is unknown unknowns: we no longer understand which questions we have to ask. We cannot anticipate which parts of the application will interact with our new feature and in what way exactly. In this case, solutions emerge from code spikes (or safe-to-fail experiments in Cynefin terms). We can only increase our knowledge by interacting with the system, that is we probe, sense, respond.

The last of the four basic domains of Cynefin is the chaotic domain. This is a transitory domain which we find ourselves in when, for example, an emergency production bug shows up. We don't have time to analyze the problem, but have to act immediately and get the site back up. Cynefin calls this approach act, sense, respond. However, by acting successfully, we move the problem to the complex domain and buy ourselves the time needed to solve it properly.

Application to Agile Software Development

What does all this have to do with agile practices? Well, I would argue that different practices serve different needs in the picture I painted above.

Pair Programming

Let’s start with pair programming. While the driver is working on the code at hand, the navigator is thinking strategically and reviewing the driver’s code. For simple problems, there is no need to think strategically as problems are just too small. We might still want some code review, but that can easily be done after the fact. I consider pair programming wasteful for the navigator in this situation.

When facing complicated problems, the navigator is helpful in routing us through the solution space. As we write non-trivial production code, direct code review is helpful. Solutions might otherwise get big enough to overwhelm reviewers.

If we look at complex problems, there is a need to interact with the system in order for solutions to emerge. We could do this in pairs; however, I wonder whether developer time is really best spent pairing on one solution, or whether it would be better to explore different approaches in parallel. Parallel exploration would also be more in line with the idea of multiple safe-to-fail experiments.

Finally, for chaotic situations, the goal is to get them under control as soon as possible. If talking helps you solve problems, pair. Otherwise, don’t.

In a nutshell, I consider pair programming a technique that is mainly useful in complicated situations. It might have some application in complex or chaotic situations, but I certainly wouldn’t prescribe it.

TDD

Let's have a look at another technique, test-first TDD. TDD's value lies in design feedback and assurance that the code functions properly.

I have a hard time thinking of any simple programming task. Simple problems in my work almost always deal with changes I don’t test automatically.

Complicated tasks usually involve changing or extending some existing functionality. After analyzing existing code, it is usually pretty clear how to proceed and thus we apply test-first TDD.

Complex tasks usually come in the form of adding new functionality to an application. We might have different ideas about how to approach the task, and Cynefin suggests spiking them in order to interact with the system and let a sensible solution emerge. While spiking a solution, we hardly care about code design. Instead we explore different options in order to see how they work in our application. This code also needn't be correct; seeing interactions is usually good enough. Therefore I argue that TDD doesn't provide much value in this context. Instead we should focus on exploring the solution space as quickly as possible.

There is another interesting aspect to tests. Dave Snowden talks about a constraint-based definition of Cynefin's domains. In this view, problems range from highly constrained (simple) to unconstrained (chaotic). Adding tests to code could be considered increasing the constraints and thus moving a problem from one domain to another. I see this idea realized in Dan North's technique of spike and stabilize. Spiking is a natural way of working with complex problems; however, to exploit the solution productively, we need to move it into the complicated domain at some point. This is what he calls stabilizing: adding tests.

To sum it up, TDD is a practice that is also mainly helpful in complicated situations to write production grade code. Tests can be used to move exploratory code (complex context) to production grade code (complicated context).

Operations

A similar progression can be observed in operational tasks. With non-trivial software, we often start in the complex domain, poking around to make it work. Once we get it working, we document what we did. This way we constrain the next guy's interaction with the software, so he just has a complicated problem. However, some expert knowledge is still required, as documentation is seldom perfect. When we interact more often with the software, we should at some point automate the interaction. The task then becomes simple, as automation means maximum constraint (for example, think fully automated server deployments).

Conclusion

Coming back to my problems applying TDD and pair programming in my daily work: I see myself confronted with complex problems almost daily. When I begin work on a new feature, I hardly have any idea about the solution. For me, it works best to sit down alone with the code and form a rough picture of a solution. Only then am I able to apply TDD and pair programming in a helpful manner, shifting the solution to the complicated domain.

However, I still wonder if I am missing something, as the gurus of our trade seem to be able to apply these techniques almost always. What do you think?

Ri vs. Bundler


Being a long-time unix user, I love my command line. When it comes to programming in ruby, I am a big fan of ri to look up documentation and generally figure out how stuff works.

Also, I am a big fan of rvm to organize my ruby projects. With rvm, gems are installed into project-specific gemsets. rvm takes care of setting up my environment, so that ri has access to the proper files.

I was quite dumbfounded when I realized that bundler not only doesn't generate ri documentation, but that there isn't any configuration option to enable it. After more than two years, I finally tackled this problem. As so often, the solution turned out to be quite simple.

All it takes is a little shell function that wraps my calls to bundler and generates missing documentation afterwards.

b() {
  bundle "$@" && gem rdoc --all --ri --no-rdoc 2>&1 | grep --color=auto -v -e '^Gem::SourceIndex' -e '^NOTE: Gem::SourceIndex'
}

So what does this do? First it calls bundle, passing all arguments right through. The usual use case is to install gems, of course. If bundle succeeds, gem rdoc creates all missing ri-documentation. This is actually pretty fast, as usually there’s not much to create. Finally I filter some annoying warning messages.

All it takes is to put this function declaration into one of your shell initialization files (I prefer ~/.bash_aliases), open a new shell and start hacking!

HTH

Insights Into TDD


About two years ago I began my journey into professional software development. With it came the urge of mastering TDD, a practice I had long observed as a bystander.

While I found and ingested lots of material about TDD, turning the knowledge into practice was astonishingly hard for me. It was only a couple of weeks ago that it finally clicked and I felt the pieces coming together.

I won't describe any new techniques in this post; everything presented here I read somewhere else. However, I hope this collection of techniques will help others get into TDD faster.

For the remainder of this post, I’ll talk about a small application that scrapes a website for the component companies of a stock index. I developed it using BDD’s approach of outside-in development.

Focus on the Problem Domain

Whenever I wanted to use TDD for a new feature or a completely new side project, I was faced with the problem of having to formulate my first acceptance test. I could never figure out how to get started. How could I formulate a test for code I had no idea about?

When I began writing the stock index scraper, I started with an acceptance test for the scraper class and was blocked as usual. I had no idea about what that code should actually do and with what other classes it would interact.

At that point I reminded myself to focus on the problem domain, not the solution domain. My job was not to write a website scraper; I wanted a program to update the list of stock index components.

With that in mind, I could formulate the first acceptance test.

feature 'Updates the list of tracked equities' do
  scenario 'with all DAX stocks' do
  end
end

The next question became what my application’s entry point should be.

Use Case Classes

A technique I had picked up from Corey Haines and others was use case classes. Usually you name classes after domain nouns, but sometimes it makes more sense to go with a verby name, describing a use case. Here, the acceptance test already provides a sensible use case, so my application’s entry point ended up being UpdatesListOfTrackedEquities.execute().

This class/method should go through all equities in the DAX and save a record to the database. To keep it simple, I just checked the number of records in the database afterwards, and the final acceptance test became:

feature 'Updates the list of tracked equities' do
  scenario 'with all DAX stocks' do
    UpdatesListOfTrackedEquities.execute

    expect(Equity.count).to eq 30
  end
end

Canned Values for Acceptance Tests

Another technique I picked up from Corey (and later J. B. Rainsberger) was getting the test green quickly by using canned values. This is normal practice when doing code katas, but I hadn’t thought about doing it with an acceptance test before.

class Equity
  def self.count
    30
  end
end

Later, when TDDing lower level behaviour, I would eventually get back to Equity.count() and provide a proper implementation.

I find this technique quite motivating, as it provides a green test suite early on. Keeping the acceptance test broken would often distract me from the task at hand.

Discover Interfaces

Next up I had to implement UpdatesListOfTrackedEquities.execute(). Again focussing on the problem domain, I came up with a simple unit test.

There are different stock exchanges in Germany, and the online XETRA platform was the one I wanted to work with. I wanted to track stocks from different indices, so I needed a class for each index on XETRA. This class would represent the stock index, so it made sense to have a method .constituents() that would return the list of companies making up that index.

describe UpdatesListOfTrackedEquities do
  describe '.execute' do
    let(:company1) { double('Company') }
    let(:company2) { double('Company') }

    it 'tracks all DAX companies on XETRA' do
      XETRA::DAX.stub(:constituents).and_return([company1, company2])

      Equity.should_receive(:track).with(company1)
      Equity.should_receive(:track).with(company2)

      UpdatesListOfTrackedEquities.execute
    end
  end
end
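
A minimal implementation satisfying this spec is straightforward; the original post doesn't show it, so treat the following as a sketch:

class UpdatesListOfTrackedEquities
  def self.execute
    XETRA::DAX.constituents.each do |company|
      Equity.track(company)
    end
  end
end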

At this moment I realized that interface discovery was all about thinking in the problem domain.

Only when implementing XETRA::DAX.constituents() did I finally have to create the website scraper, the class that would actually do most of the hard work.

Class Interfaces vs. Library Interfaces

When implementing the website scraper, I had to use external libraries like httpclient or nokogiri. At this point my TDD practice somehow broke down and I didn’t create more intermediate objects like interface classes, but instead I just used those libraries directly from the scraper class. At first I felt bad about it, but then I came to the conclusion that this might actually be sensible.

One piece of advice I often read was to wait for several similar use cases before abstracting an interface. I was always mildly confused as to how this related to interface discovery. At this point in my sample application, I concluded that interface abstraction is the way to go when using an existing class, often in the form of an external library. Interface discovery, on the other hand, is the way to go when creating new domain model classes.

This is in line with one reason for creating library interfaces, which is to encapsulate library usage in a single class. If my website scraper is the only class using, say, httpclient, its usage is already limited to one class. Therefore, I don't gain much by putting an interface in front of it. Once I have two or three users of httpclient, I can see how my application actually uses the library and put an interface in front of it.
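
To make that concrete, here is a hedged sketch of what such a library interface might look like once several classes need HTTP access; the class and method names are mine, not from the post:

require 'httpclient'

class HttpGateway
  def initialize(client = HTTPClient.new)
    @client = client
  end

  # return the response body for a url, hiding the httpclient API from callers
  def body(url)
    @client.get_content(url)
  end
end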

Conclusion

It took me quite some time to wrap my head around TDD. Thinking in the problem domain had the most impact on me. Once I realised that, the other pieces fell into place.

What do you think? Was this all out there and I just didn’t see it? Did I misunderstand anything? Share your ideas in the comments below!