
This one time when I blew up production

Puneet Khanduri

Sometimes our clients invite us to speak to their engineering teams about software engineering best practices.

There is plenty of published content on this broad subject, so it was initially very challenging for us to make the experience distinctly valuable for our intended audience. We asked ourselves what unique nuggets of wisdom we had that people would not be able to find elsewhere.

The honest answer was — None.

Yet, we came up with something that allowed our clients to gain valuable insights while improving the health of their engineering culture.

We call it the “This one time when I blew up production” workshop.

The core idea behind this workshop is to share engineering war stories curated from our own experience and to encourage the client’s engineering teams to share theirs. Despite an abundance of real-life production disasters, there is an acute shortage of published content on the subject. The reason is simple: people love to tell stories about their successes, but telling stories about their failures is hard.

Advertising engineering success stories is very common — it helps you attract good talent by building a strong brand for your engineering culture. But how often do companies blog about massive engineering disasters? Talking about your failures — at least internally — sends out a strong message to your team that failures are part of life and your fear of failure should not reduce your appetite for risk.

As an example, we have heard this same confession many times from many rapid-growth companies:

There is this ugly piece of legacy code that no one understands anymore. We don’t want to touch it because we are afraid we might break something.

Fear is good to the extent that it checks recklessness. It becomes a problem when it starts paralyzing you.

So how does “This one time when I blew up production” help?

It allows your best engineering leaders and individual rockstars to step forward and share their war stories and battle scars. They get to talk about how they once failed miserably and what they learned from that experience. From the audience’s perspective, these personal stories are far more relevant and relatable than any other published material.

But pulling this off is a lot easier said than done. It takes a lot of courage on the part of the speaker to do this: you are exposing yourself to the judgement of your peers. You risk losing their respect and admiration.

As you try to assess this unconventional speaking opportunity, all your insecurities suddenly amalgamate into a very simple question:

Why should I do it? What’s in it for me?

The only rational motivation for taking this risk is that you are willing to sacrifice some admiration from your team to gain something far more valuable: their trust.

People trust you more when you openly talk about your failures. They feel safe around you knowing that you won’t judge them. Ultimately, it is this trust and sense of safety that gives them the courage to be more ambitious and take on more risk.

The workshop starts off with a case study like the ones typically done in MBA programs at business schools, but the context is engineering war stories rather than business strategy. It feels a little bit like the Kobayashi Maru simulation from Star Trek, except you are dealing with services, storage, monitoring tools, and client-side apps instead of spaceships and photon torpedoes. The workshop then moves on to war stories from individuals in the host organization. This is the most intimate part, where engineers candidly share their stories. Finally, the workshop wraps up with comments, questions, and learnings on all the stories shared by everyone. The only contribution we make in this process is to guide and facilitate the experience, because the bulk of the content comes from the audience itself.

The key observation behind these discussions is that the most valuable lessons come from the experiences of the people around you. This works so well because these people are easily accessible to you and you can relate to them. Reading the same lessons in a book or a blog post is a very different experience.