Nikolay Sturm's Blog

Musings about Development and Operations

Ri vs. Bundler

Being a long-time Unix user, I love my command line. When it comes to programming in ruby, I am a big fan of ri for looking up documentation and generally figuring out how stuff works.

Also, I am a big fan of rvm to organize my ruby projects. With rvm, gems are installed into project-specific gemsets. rvm takes care of setting up my environment, so that ri has access to the proper files.

I was quite dumbfounded when I realized that bundler not only did not generate ri documentation, but also offered no configuration option to enable it. After more than 2 years, I finally tackled this problem. As is often the case, the solution turned out to be quite simple.

All it takes is a little shell function that wraps my calls to bundler and generates missing documentation afterwards.

b() {
  # "$@" passes every argument through unchanged, even ones containing spaces
  bundle "$@" && gem rdoc --all --ri --no-rdoc 2>&1 | grep --color=auto -v -e '^Gem::SourceIndex' -e '^NOTE: Gem::SourceIndex'
}

So what does this do? First it calls bundle, passing all arguments straight through. The usual use case is to install gems, of course. If bundle succeeds, gem rdoc creates all missing ri documentation. This is actually pretty fast, as there usually isn’t much to create. Finally, I filter out some annoying warning messages.

All it takes is to put this function declaration into one of your shell initialization files (I prefer ~/.bash_aliases), open a new shell and start hacking!

HTH

Insights Into TDD

About two years ago I began my journey into professional software development. With it came the urge of mastering TDD, a practice I had long observed as a bystander.

While I found and ingested lots of material about TDD, turning the knowledge into practice was astonishingly hard for me. It was only a couple of weeks ago that it finally clicked and I felt the pieces coming together.

I won’t describe any new techniques in this post. Everything presented here is something I read somewhere. However, I hope this collection of techniques will help others get TDD faster.

For the remainder of this post, I’ll talk about a small application that scrapes a website for the component companies of a stock index. I developed it using BDD’s approach of outside-in development.

Focus on the Problem Domain

Whenever I wanted to use TDD for a new feature or a completely new side project, I was faced with the problem of having to formulate my first acceptance test. I could never figure out how to get started. How could I formulate a test for code that I had no idea of?

When I began writing the stock index scraper, I started with an acceptance test for the scraper class and was blocked as usual. I had no idea about what that code should actually do and with what other classes it would interact.

At that point I reminded myself to focus on the problem domain, not the solution domain. My job was not to write a website scraper; what I wanted was a program to update the list of stock index components.

With that in mind, I could formulate the first acceptance test.

feature 'Updates the list of tracked equities' do
  scenario 'with all DAX stocks' do
  end
end

The next question became what my application’s entry point should be.

Use Case Classes

A technique I had picked up from Corey Haines and others was use case classes. Usually you name classes after domain nouns, but sometimes it makes more sense to go with a verby name, describing a use case. Here, the acceptance test already provides a sensible use case, so my application’s entry point ended up being UpdatesListOfTrackedEquities.execute().

This class/method should go through all equities in the DAX and save a record for each to the database. To keep it simple, I just checked the number of records in the database afterwards and the final acceptance test became:

feature 'Updates the list of tracked equities' do
  scenario 'with all DAX stocks' do
    UpdatesListOfTrackedEquities.execute

    expect(Equity.count).to eq 30
  end
end

Canned Values for Acceptance Tests

Another technique I picked up from Corey (and later J. B. Rainsberger) was getting the test green quickly by using canned values. This is normal practice when doing code katas, but I hadn’t thought about doing it with an acceptance test before.

class Equity
  def self.count
    30
  end
end

Later, when TDDing lower level behaviour, I would eventually get back to Equity.count() and provide a proper implementation.

I find this technique quite motivating, as it provides a green test suite early on. Keeping the acceptance test broken would often distract me from the task at hand.

Discover Interfaces

Next up I had to implement UpdatesListOfTrackedEquities.execute(). Again focussing on the problem domain, I came up with a simple unit test.

There are different stock exchanges in Germany and the online XETRA platform was the one I wanted to work with. I wanted to track stocks from different indices, so I needed a class for each index on XETRA. This class would represent the stock index, so it made sense to have a method .constituents() that would return the list of companies making up that index.

describe UpdatesListOfTrackedEquities do
  describe '.execute' do
    let(:company1) { double('Company') }
    let(:company2) { double('Company') }

    it 'tracks all DAX companies on XETRA' do
      XETRA::DAX.stub(:constituents).and_return([company1, company2])

      Equity.should_receive(:track).with(company1)
      Equity.should_receive(:track).with(company2)

      UpdatesListOfTrackedEquities.execute
    end
  end
end

At this moment I realized that interface discovery was all about thinking in the problem domain.
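
A minimal sketch of an implementation that would satisfy this unit test might look like the following. The canned XETRA::DAX and the in-memory Equity (including its tracked helper) are stand-ins assumed for illustration; they are not the code from the actual application, where Equity would save records to the database.

```ruby
# Stand-in for the stock index class; the real one would scrape a website.
module XETRA
  class DAX
    def self.constituents
      ['Company A', 'Company B'] # canned values, purely illustrative
    end
  end
end

# Stand-in for the model; this hypothetical version only keeps a list in memory.
class Equity
  def self.tracked
    @tracked ||= []
  end

  def self.track(company)
    tracked << company
  end

  def self.count
    tracked.size
  end
end

# The use case class only talks to the two interfaces discovered in the test.
class UpdatesListOfTrackedEquities
  def self.execute
    XETRA::DAX.constituents.each { |company| Equity.track(company) }
  end
end
```

The use case class knows nothing about scraping or persistence. It only relies on .constituents() and .track(), the two messages the test asked for.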

Only when implementing XETRA::DAX.constituents() did I finally have to create the website scraper, the class that would actually do most of the hard work.

Class Interfaces vs. Library Interfaces

When implementing the website scraper, I had to use external libraries like httpclient or nokogiri. At this point my TDD practice somehow broke down and I didn’t create more intermediate objects like interface classes, but instead I just used those libraries directly from the scraper class. At first I felt bad about it, but then I came to the conclusion that this might actually be sensible.

One piece of advice I often read was to wait for several similar use cases before abstracting an interface. I was always mildly confused as to how this related to interface discovery. At this point in my sample application, I concluded that interface abstraction is the way to go when using an existing class, often in the form of an external library. Interface discovery, on the other hand, is the way to go when creating new domain model classes.

This is in line with one reason for creating library interfaces, which is to encapsulate library usage in a single class. If my website scraper is the only class using, e.g., httpclient, its usage is already limited to one class. Therefore, I don’t gain much by putting an interface in front of it. Once I have two or three users of httpclient, I see how my application actually uses the library and I can put an interface in front of it.
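
Once a second or third user of the library shows up, such an interface can stay very small. The sketch below is hypothetical: PageFetcher and body_of are made-up names, and it uses Net::HTTP from the standard library as a stand-in for httpclient so that the example is self-contained.

```ruby
require 'net/http'
require 'uri'

# Hypothetical application-specific interface in front of an HTTP library.
# All knowledge of the library lives here; callers only see body_of.
class PageFetcher
  # backend is anything responding to get(uri) and returning the body,
  # which Net::HTTP does. Injecting it keeps the class easy to test.
  def initialize(backend = Net::HTTP)
    @backend = backend
  end

  def body_of(url)
    @backend.get(URI(url))
  end
end
```

A scraper would then call PageFetcher.new.body_of(url) and never mention the HTTP library itself, so swapping the library later touches only this one class.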

Conclusion

It took me quite some time to wrap my head around TDD. Thinking in the problem domain had the most impact on me. Once I realised that, the other pieces fell into place.

What do you think? Was this all out there and I just didn’t see it? Did I misunderstand anything? Share your ideas in the comments below!

Reducing the Noise in Git Diffs

Since we switched our Rails applications to SQL schemas, I have found it disturbing to see line after line of boring autoincrement changes in diffs whenever something touched the database:

-) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
+) ENGINE=InnoDB AUTO_INCREMENT=12833 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

This made it especially hard to identify unexpected changes.

When I lamented about this, a friend suggested git attributes. They allow you to set special options on a per-path basis. One such option is a filter that can be applied to files before they get diffed.

In my case, I want to filter those AUTO_INCREMENT lines from db/structure.sql so the first thing is to specify an attribute for this path:

$ cat .git/info/attributes
structure.sql diff=sql_schema

There are different ways to specify attributes. I chose this file, which is local to my clone, so my change wouldn’t interfere with my colleagues’ setups.

The next step is to configure the filter to use for this attribute:

$ tail -2 .git/config
[diff "sql_schema"]
        textconv = sed -e '/^) ENGINE=InnoDB/s/AUTO_INCREMENT=[0-9]* //'

This filter selects all lines starting with ) ENGINE=InnoDB and removes the AUTO_INCREMENT statement. Now, whenever the schema changes, I only see the actual change, without any noise from my auto increment counters.

If you work with multiple Rails applications, it might make sense to move the sed call to a script and call that instead.
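
Such a script does not have to use sed. As a rough sketch, the same substitution could be written in ruby; the function name strip_auto_increment and the idea of saving it as an executable script are assumptions for illustration. git appends the path of the file to convert to the textconv command, which is why the script reads ARGV[0].

```ruby
# Removes the AUTO_INCREMENT counter from table definition lines,
# mirroring the sed expression above.
def strip_auto_increment(sql)
  sql.each_line.map do |line|
    if line.start_with?(') ENGINE=InnoDB')
      line.sub(/AUTO_INCREMENT=[0-9]* /, '')
    else
      line
    end
  end.join
end

# When run as a script with a file argument, print the filtered file.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  print strip_auto_increment(File.read(ARGV[0]))
end
```

The textconv setting would then point at wherever this script is installed instead of at sed.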

Hope this helps.

A Critical Review of Practical Object-Oriented Design in Ruby (Chapters 1 to 3)

Sandi Metz’ Practical Object-Oriented Design in Ruby is praised as one of the best books on object-oriented design for ruby developers (15 5-star reviews on Amazon as of this writing). When I started reading it, however, I was quickly irritated by some of her advice. I want to take the time to focus on these controversial topics, in the hope of learning what I might have missed or why it all actually makes sense.

This is not a slating review, however. I enjoy reading the book and found some helpful information in the first chapters. It is certainly worth its money.

Chapter 2: Creating Classes that have a Single Responsibility

“If the responsibilities are so coupled that you cannot use just the behaviour you need, you could duplicate the code of interest. This is a terrible idea.”

Although this advice makes sense on first reading, I don’t agree with its tone. I believe in the importance of context for decision making. IMHO, it often makes sense to tolerate a certain amount of code duplication in order for better abstractions to emerge. Abstracting prematurely might make it necessary to rewrite the abstraction later, which is explicitly what good design set out to prevent (see chapter 1).

Hide Instance Variables

In this section, Sandi argues for hiding all instance variables behind getters, even for class internal use. This would simplify future code changes if you ever need a derivative of an instance variable instead of the variable itself.

I totally don’t get her argument. A class’ code is usually kept in a single file, so changing an instance variable to a method call is trivial, even for plain text editors. Wrapping each instance variable in a getter looks more like over-engineering to me.

Furthermore, an instance variable gives a name to a value. When I need a derivative of that value, I would expect the name to change as well.

I prefer keeping my code as simple as possible, using instance variables for simple data and extracting emerging concepts (derivative values) into properly named helper methods through refactoring.
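
As an illustration of that preference, a sketch along these lines uses the instance variables directly and gives each derived value its own name. The class is loosely modelled on the book’s wheel example, but it is made up for this purpose and not taken from the book.

```ruby
class Wheel
  def initialize(rim, tire)
    @rim  = rim   # simple data stays in plain instance variables
    @tire = tire
  end

  # A derived value is a concept of its own and gets a properly named method.
  def diameter
    @rim + (@tire * 2)
  end

  def circumference
    diameter * Math::PI
  end
end
```

If @rim ever needed to become a computed value, the change would be confined to this one small class.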

Overall, I perceive chapter 2 as contradictory. On the one hand, Sandi argues for deferring decisions (e.g. on page 32: “Any decision you make in advance of an explicit requirement is just a guess. Don’t decide; preserve your ability to make a decision later.”), but on the other hand I hear her arguing for writing code in anticipation of future requirements.

Chapter 3: Managing Dependencies

Isolate Vulnerable External Messages

In this section, Sandi tries to reduce coupling to an external interface by extracting a method call to an external object into its own method.

While I do see some value in this technique, I don’t buy her argument. In a big class with many calls to the same external method with many arguments, it might make sense to extract that method call, but I consider having such code bad design that needs to be fixed. I aim for small classes with small interfaces. In case of interface changes, search and replace is simple enough.

To stay with her example, I even consider extracting wheel.diameter bad design, because it hides the information of which diameter we are dealing with. It reduces legibility.

Use Hashes for Initialization Arguments

In order to reduce coupling to method argument order, Sandi proposes the use of argument hashes.

Initializing, for instance, a value object with an attributes hash makes perfect sense to me. However, I cannot follow her in defaulting to argument hashes for all methods. I am with Robert Martin, who basically said less is better. If I have a method with more than 2 arguments, the method smells. The solution in this case is to refactor the interface, so we get by with one or two arguments, in which case the order isn’t much of a problem.

While Sandi describes most problems of her approach, which gives the alert reader a hint that using argument hashes might be more problematic than it looks at first, she misses the important point of key validation. If you use argument hashes and mistype a key, no error will be raised.
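
The missing key validation is easy to demonstrate. In the following sketch, Gear, StrictGear and the ALLOWED_KEYS constant are illustrative names and not code from the book; the point is only that a plain hash lookup returns nil for a mistyped key, while an explicit check raises.

```ruby
class Gear
  attr_reader :chainring, :cog

  def initialize(args)
    @chainring = args[:chainring]
    @cog       = args[:cog]
  end
end

# The typo :cgo goes unnoticed; cog is silently nil.
Gear.new(chainring: 52, cgo: 11).cog # => nil

# One possible remedy: reject keys the initializer does not know about.
class StrictGear
  ALLOWED_KEYS = [:chainring, :cog].freeze
  attr_reader :chainring, :cog

  def initialize(args)
    unknown = args.keys - ALLOWED_KEYS
    raise ArgumentError, "unknown keys: #{unknown.join(', ')}" unless unknown.empty?

    @chainring = args[:chainring]
    @cog       = args[:cog]
  end
end
```

With StrictGear, the same mistyped call fails immediately with an ArgumentError instead of producing a half-initialized object.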

“GearWrapper isolates all knowledge of the external interface in one place and, equally importantly, it provides an improved interface for your application.”

I couldn’t agree more with the idea of defining an application-specific interface on top of an external class to decouple code. I was irritated, however, when I saw Sandi just wrapping object instantiation and justifying it with the use of an argument hash. Why would you stop there and not move on to implementing a complete interface? Why just wrap the initializer? That doesn’t make any sense to me and is another case of seeming self-contradiction.

That’s it for now. If I find more controversial recommendations in the upcoming chapters, I’ll collect them in another blog post. As I don’t consider myself an expert developer, I might have missed some important aspects that shine a different light on Sandi’s recommendations. Feel free to enlighten me in the comments! :)

A Terminal Setup for Complicated Development Environments

When Rails applications grow, they accumulate more and more dependencies. This not only increases load time; when you integrate additional services like elasticsearch, it also becomes ever more complicated to get a local server or test suite running.

When reviewing diffs and running the test suite, I often forgot to start a service and had to rerun the suite after failing half way through. So I finally sat down and configured some tools to ease my developer life.

My base setup

I prefer developing with vim on the command line and use a single full-screen terminal running tmux. Before, I started a rails server on tab 0, ran vim in tab 1 and added tabs dynamically as needed.

Enter the search

When we added a local search engine to one of our Rails applications, life became complicated. Suddenly I had to remember to start the search server before testing the application. As I only worked on it every now and then, I often forgot to start the search service, but still I coped.

When we added a local search engine to another of our Rails applications, life became tedious and I had to change something. The solution was to introduce foreman. It lets you define all the services you need in a Procfile, so you only have to run a single command, foreman start, and everything is up and running.

The new setup was foreman start in tab 0 and vim in tab 1.

Automating startup

I still had to remember to start foreman manually, so the next step was to automate that. For this task I chose tmuxinator, which allows you to start pre-configured tmux sessions with a single command. I settled on starting foreman, a rails console and a shell running git fetch origin, which I would later use for starting vim. The neat thing about running git instead of starting vim directly is that I get an idea of recent changes right at the start.

I said earlier that my terminal already uses tmux as a shell, so I needed a way to circumvent this default behaviour and start tmuxinator instead. This isn’t much of a problem if tmuxinator is installed somewhere in your $PATH. Using rvm, however, tmuxinator is only available after rvm initialization. To handle this, I settled on this workaround:

  • start a shell script from the window manager named after the project, e.g. cms

#!/bin/bash
roxterm -T cms -e mux-cms

  • mux-cms then starts tmuxinator

#!/home/sturm/.rvm/bin/rvm-shell
mux start cms

Adding bells and whistles

One thing that had bothered me for a long time, and that I finally decided to attack as well, was the slow rails load time. For one of our applications, it takes about 20s to load all the gems. I had previous experience with spork to speed up test load times, but I also wanted to speed up rake and service start-up times. So I decided to give zeus a try.

Zeus’ big advantage over spork is that you don’t have to change anything in your application; instead, commands are zeusified. Instead of rails console, you run zeus console. Zeus works only with Rails 3 applications running on Ruby 1.9.3. Lucky for me, we upgraded our last application to Ruby 1.9.3 a few weeks earlier.

Another requirement stated on zeus’ website is a GC-patched Ruby. So far, I haven’t needed it at all. Plain Ruby 1.9.3 works fine for me.

What doesn’t work perfectly yet is orchestrated startup with zeus. The zeus server takes quite long to start and isn’t accessible to clients during that time. The only answer I have come up with so far is adding sleep to process startups in my tmuxinator config.

After reducing startup times considerably, it finally made sense to integrate guard and guard-rspec. I run these in a split pane on the third tab with my editor, having one source/spec combination open per vim tab.

[Screenshot: coding tab example]

Conclusion

The command line is a powerful base and there are many tools to improve a developer’s life. Can you think of further improvements? Let us know in the comments!