Entries tagged "lean thinking".

The Toyota Way

My motivation to start this blog stemmed from the desire to question and discuss the applicability of agile and lean methods in system administration. To start this discussion, in the upcoming weeks I will go through the 14 principles that form The Toyota Way and discuss their merit for the field of system administration.

These principles are:

  1. Base your management decisions on a long-term philosophy, even at the expense of short-term financial goals.
  2. Create a continuous process flow to bring problems to the surface.
  3. Use "pull" systems to avoid overproduction (kanban).
  4. Level out the workload (heijunka).
  5. Build a culture of stopping to fix problems, to get quality right the first time (jidoka).
  6. Standardized tasks and processes are the foundation for continuous improvement (kaizen) and employee empowerment.
  7. Use visual control so no problems are hidden (andon).
  8. Use only reliable, thoroughly tested technology that serves your people and processes.
  9. Grow leaders who thoroughly understand the work, live the philosophy, and teach it to others.
  10. Develop exceptional people and teams who follow your company's philosophy.
  11. Respect your extended network of partners and suppliers by challenging them and helping them improve.
  12. Go and see for yourself to thoroughly understand the situation (genchi genbutsu).
  13. Make decisions slowly by consensus, thoroughly considering all options; implement decisions rapidly (nemawashi).
  14. Become a learning organization through relentless reflection (hansei) and continuous improvement (kaizen).

Note that this series of posts is not an introduction to the Toyota Way; it only discusses its applicability to the field of system administration. For an introduction, I strongly suggest reading Jeffrey Liker's The Toyota Way.

The Toyota Way: Principle 1

Base your management decisions on a long-term philosophy, even at the expense of short-term financial goals.

A company needs to make money and system administrators need to keep their services up and running. This is a necessary foundation for successful work but it is far from sufficient. In order to be truly successful, you need to find a purpose that goes beyond banalities. A purpose that helps you and your people motivate yourselves to reach for greater goals. Once this philosophy is found and established, stick to it! From time to time it might cost you, but over the long run, it certainly is worth it.

In my case, the long-term philosophy or basic motto underlying my work is "committed to excellence". It means not accepting band-aids but always striving for sustainable solutions. This costs some time in the short term when solving complex problems or implementing new services. But in the long term we gain a lot of clarity and simplicity for our operators and achieve a high degree of availability for our users.

The Toyota Way: Principle 2

Create a continuous process flow to bring problems to the surface.

While the first principle was generic enough to apply to many different situations, this second principle is not as easy to apply to system administration. So what is the idea behind it?

In a production environment, according to this principle, pieces of work should flow through the factory, being handed from one worker to the next without interruption. The goal is to link the workers' processes. This way unproductive idle time (from the point of view of the workpiece) is minimized and any problems surface immediately: they interrupt the flow, so everyone understands they need to be fixed. How can we translate this idea to system administration?

One difficulty is that in a factory, many identical workpieces are created by following one and the same process over and over. In system administration, we seldom have this situation; examples would include deploying desktop machines or adding a new user to your system. More often we have to deal with more or less new situations, like deploying a new service or dealing with an unknown problem for our users. What simplifies our situation compared to a factory is that we often work solo on a task and don't have to link up with someone else. Therefore we can deal with all these situations in the same way: once you commit yourself to a task, work on it until it is done or can be handed off according to the process flow. This minimizes work in progress and task-switching overhead. When you notice an infrastructure problem along the way, stop and deal with it properly if it relates to your current task. Otherwise just open a ticket for it, so that you don't get distracted from your objective.

To sum it up, working in a continuous process flow translates to singletasking until done.

Kanban in Operations - The Idea

At DevOpsDays I saw Mattias Skarin's talk about Kanban in Operations. The points he made resonated very well with my own ideas and with the problems I faced when experimenting with Scrum and XP in my admin team. So I decided to give Kanban a try and document my experiences here.

We are 4 admins at work, with myself as team leader, plus one or two trainees or interns. One admin is dedicated to dealing with user requests, while the rest of us work on infrastructure projects, each on their own. We have some overlap, though, to allow for holiday cover.

Our current setup poses a couple of problems I hope to tackle with Kanban:

  • everyone feels they have too many things going on in parallel
  • admins only know their parts of the infrastructure
  • many projects take forever to finish
  • it's hard for me to notice problems in delegated projects, as we only do some kind of ad hoc project management
  • urgent mini-projects regularly occur, halting the affected admin's regular projects

Kanban 1.0

In this first post, I want to document the kanban board as I came up with it. I assume that the layout will change after some discussion with my colleagues and some experience in the wild.

[Image: kanban board 1.0]

Our group has two sources of work: user requests and projects. For now I want to keep the separation between helpdesk admin and project admins.

User Requests

We will only manage bigger user requests on the kanban board, meaning those that take at least an hour of work.

There are two lanes: 1st level and 2nd level. The 1st level lane is worked on by a dedicated admin, who will move requests that need attention from one of us project admins to 2nd level. In the spirit of "production has precedence over projects", 2nd level has higher priority than project work.

Open questions:

  • How can we limit 1st level WIP when requests often enter a wait state?

Projects

On the left, there is a prioritized project backlog called Projects. This is where upper management can freely reshuffle and replace cards. Both the project backlog and the number of active projects are limited.

For each project in progress, the team maintains MMFs (minimal marketable features) in the Backlog. The highest-priority MMFs are regularly broken down into individual tasks and moved to the Ready state. From there, tasks move via WIP to Done, and finally the MMF itself is moved to Done. The Ready and WIP states are limited.
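
As a rough sketch of what limited Ready and WIP states mean in practice, the following Python models columns that refuse new cards once they are full. The class and function names, the task names and the limits of 3 and 2 are assumptions for illustration only; the post does not state actual numbers.

```python
class Column:
    """A kanban column with an optional WIP limit (None means unlimited)."""

    def __init__(self, name, limit=None):
        self.name = name
        self.limit = limit
        self.cards = []

    def has_room(self):
        return self.limit is None or len(self.cards) < self.limit


def pull(card, source, target):
    """Move a card from source to target only if target is below its limit."""
    if card not in source.cards:
        raise ValueError(f"{card!r} is not in {source.name}")
    if not target.has_room():
        return False  # limit reached: finish something before starting more
    source.cards.remove(card)
    target.cards.append(card)
    return True


# Hypothetical columns and limits, chosen only for this example.
ready = Column("Ready", limit=3)
wip = Column("WIP", limit=2)
done = Column("Done")
ready.cards = ["task A", "task B", "task C"]
```

With these limits, a third task can only be pulled into WIP after one of the first two has moved on to Done.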

In case of urgency, management has one urgent project slot. This project has higher priority than any other project being worked on, but lower priority than 2nd level operations.
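
The resulting priority order (2nd level requests first, then the urgent project slot, then regular projects) could be written down as a small lookup. This is only a sketch; the `PRIORITY` table, the `pick_next` function and the tuple layout are assumptions made for this example.

```python
# Lower number = picked first. The order follows the text: 2nd level requests,
# then the urgent project slot, then regular project work.
PRIORITY = {"2nd level": 0, "urgent project": 1, "project": 2}


def pick_next(ready_items):
    """Return the (kind, title) tuple to work on next, or None if nothing is ready."""
    if not ready_items:
        return None
    # min() returns the first item with the smallest priority number,
    # so items of equal priority keep their board order.
    return min(ready_items, key=lambda item: PRIORITY[item[0]])
```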

Improving Collaboration

To improve collaboration, all project admins will work together on the same project(s) and we will experiment with pairing.

The Toyota Way: Principle 3

Use "pull" systems to avoid overproduction.

Overproduction is one form of waste the Toyota Production System tries to overcome. It occurs when more items are produced than can be consumed by customers. In a system of successive processes, each process is the customer of the preceding process.

In system administration these chains of processes might not be visible, as often one admin moves with the item from process to process, for instance planning the setup of a new service, setting it up, testing it, and deploying it to production. Overproduction, however, still occurs whenever more work is started than is finished. To overcome it, we have to shift our focus to finishing work by pulling tasks, checking these rules in order and acting on the first one that applies:

  • if there is a tested service, deploy it
  • if there is a set up service, test it
  • if there is a planned service, set it up
  • plan a new service

This automatically minimizes our work in process, permitting maximized focus.
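
The pull rule above can be sketched in a few lines of Python. The stage names follow the list; the dictionary layout, the `next_action` function and the service names in the usage note are assumptions for illustration.

```python
# What to do with a service in a given stage, following the list above.
NEXT_ACTION = {"tested": "deploy", "set up": "test", "planned": "set up"}


def next_action(services):
    """services maps a service name to its stage.

    Returns (action, service) for the item closest to production,
    or ("plan", None) when nothing is waiting to be pulled.
    """
    # Walk the chain backwards so that finishing beats starting.
    for stage in ("tested", "set up", "planned"):
        for name, state in services.items():
            if state == stage:
                return NEXT_ACTION[stage], name
    return "plan", None
```

For example, with a planned mail service and a tested wiki, `next_action({"mail": "planned", "wiki": "tested"})` picks the wiki deployment first; new planning only starts when the board is empty.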