
Entries from October 2009.

A blog from a medium-sized enterprise admin

My name is Nikolay Sturm and I work as a system administrator for a medium-sized enterprise named GeNUA.

When I look at websites or blogs dealing with system administration, most of them seem to focus on large enterprises and scalable solutions. Unfortunately, my work environment is totally different. I work with a small team on only a couple of servers, but we have total heterogeneity. Not only do all our servers differ from one another, but even the people on my team all have different responsibilities. The only area where scalability and homogeneity are important to us is the desktop, as we administer more than 100 workstations and have to support all of their users as well.

Since I first stumbled upon agile methods (XP, later Scrum and Lean Manufacturing), I have wondered whether and how they could be applied in my own field. I did some experiments, but nothing fundamentally changed the way my team and I worked.

With this blog I will try to analyze the methods that interest me most and see what we as system administrators can learn from software developers, car manufacturers (Toyota Production System) and who knows who else.

The Toyota Way

My motivation to start this blog stemmed from the desire to question and discuss the applicability of agile and lean methods in system administration. To start this discussion, in the upcoming weeks I will go through the 14 principles that form The Toyota Way and discuss their merit for the field of system administration.

These principles are:

  1. Base your management decisions on a long-term philosophy, even at the expense of short-term financial goals.
  2. Create a continuous process flow to bring problems to the surface.
  3. Use "pull" systems to avoid overproduction (kanban).
  4. Level out the workload (heijunka).
  5. Build a culture of stopping to fix problems, to get quality right the first time (andon).
  6. Standardized tasks and processes are the foundation for continuous improvement (kaizen) and employee empowerment.
  7. Use visual control so no problems are hidden (jidoka).
  8. Use only reliable, thoroughly tested technology that serves your people and processes.
  9. Grow leaders who thoroughly understand the work, live the philosophy, and teach it to others.
  10. Develop exceptional people and teams who follow your company's philosophy.
  11. Respect your extended network of partners and suppliers by challenging them and helping them improve.
  12. Go and see for yourself to thoroughly understand the situation (genchi genbutsu).
  13. Make decisions slowly by consensus, thoroughly considering all options; implement decisions rapidly (nemawashi).
  14. Become a learning organization through relentless reflection (hansei) and continuous improvement (kaizen).

Note that this series of posts is not an introduction to the Toyota Way; it discusses the applicability of its principles to the field of system administration. For an introduction, I strongly suggest reading Jeffrey Liker's The Toyota Way.

The Toyota Way: Principle 1

Base your management decisions on a long-term philosophy, even at the expense of short-term financial goals.

A company needs to make money, and system administrators need to keep their services up and running. This is a necessary foundation for successful work, but it is far from sufficient. To be truly successful, you need to find a purpose that goes beyond these basics: a purpose that helps you and your people motivate yourselves to reach for greater goals. Once this philosophy is found and established, stick to it! From time to time it might cost you, but in the long run it certainly is worth it.

In my case, this long-term philosophy, the basic motto underlying my work, is "committed to excellence". It means not accepting band-aids but always striving for sustainable solutions. This costs some time in the short term when solving complex problems or implementing new services, but in the long term we gain a lot of clarity and simplicity for our operators and achieve a high degree of availability for our users.

The Toyota Way: Principle 2

Create a continuous process flow to bring problems to the surface.

While the first principle was generic enough to apply to many different situations, this second principle is not as easy to apply to system administration. So what is the idea behind it?

In a production environment, according to this principle, pieces of work should flow through the factory, being handed from one worker to the next without interruption. The goal is to link the workers' processes. This way, unproductive idle time (from the point of view of the workpiece) is minimized and any problems surface immediately: they interrupt the flow, so everyone understands that they need to be fixed. How can we translate this idea to system administration?

One difficulty is that in a factory, many identical workpieces are created by following one and the same process over and over. In system administration we seldom have this situation; examples include deploying desktop machines or adding a new user to your system. Often we have to deal with more or less new situations, like deploying a new service or tracking down an unknown problem for our users. What simplifies our situation compared to a factory is that we often work solo on a task and don't have to link up with someone else. Therefore we can deal with all these situations in the same way. Once you commit yourself to a task, work on it until it is done or can be handed off according to the process flow. This minimizes work in progress and task-switching overhead. When you recognize an infrastructure problem along the way, stop and properly deal with it if it relates to your current task. Otherwise, just open a ticket for it so you don't get distracted from your objective.

To sum it up, working in a continuous process flow translates to singletasking until done.
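The singletasking rule can be sketched as a small work loop. This is only a toy model under my own assumptions, not a real ticketing system: the `Task` structure, the flag marking whether a problem relates to the current task, and the functions `work_single_task` and `run` are all hypothetical. Tasks are pulled from a queue one at a time and finished before the next one starts; related problems are fixed on the spot, while unrelated ones only end up as tickets in a backlog.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical model of an admin task: a name and the steps to finish it."""
    name: str
    steps: list[str]
    # Problems noticed while doing a given step, as (description, related_to_task).
    problems: dict[str, list[tuple[str, bool]]] = field(default_factory=dict)


def work_single_task(task: Task, backlog: list[str]) -> list[str]:
    """Work on one task until it is done; return a log of what happened."""
    log = []
    for step in task.steps:
        for problem, related in task.problems.get(step, []):
            if related:
                # Problem relates to the current task: stop and deal with it properly.
                log.append(f"fix: {problem}")
            else:
                # Unrelated problem: just open a ticket and keep going.
                backlog.append(problem)
                log.append(f"ticket: {problem}")
        log.append(f"do: {step}")
    log.append(f"done: {task.name}")
    return log


def run(queue: deque[Task]) -> tuple[list[str], list[str]]:
    """Pull tasks one at a time; the next task starts only after the current one is done."""
    backlog: list[str] = []
    log: list[str] = []
    while queue:
        task = queue.popleft()
        log.extend(work_single_task(task, backlog))
    return log, backlog


# Example: two hypothetical tasks, one of which runs into two problems.
queue = deque([
    Task("deploy mail server", ["install", "configure"],
         {"configure": [("missing MX record", True),
                        ("failing backup job on file server", False)]}),
    Task("add user alice", ["create account"]),
])
log, backlog = run(queue)
```

In this sketch the backup problem only produces a ticket, so the mail server task is finished before the user account is created; in practice the backlog would later be pulled like any other work.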