
· 4 min read
Puneet Khanduri

A few days ago, we announced a major update to Diffy and briefly mentioned the introduction of a powerful Request Transformation feature. Before we dive in, here's a bit of context on why we need different kinds of request transformation.

As your APIs evolve, you inevitably introduce breaking changes that make old requests (served by older versions of your APIs) incompatible with newer versions of your APIs. If the new API is genuinely doing something different than the old API then the old requests are no longer useful for any kind of testing and you can safely ignore them.

That said, it is truly unfortunate that you can't use old requests to test newer versions of an API when the underlying purpose of the API hasn't changed. You still want to know that the new API behaves like the old API - only now you need to "touch up" your old requests to turn them into new requests.

Here are the different kinds of touch-ups you can perform with Diffy 2.0:

Header Transformation

Many of our customers mirror production traffic to Diffy to catch API regressions in their services. Replaying production traffic in test environments can cause problems, such as auth tokens that don't work outside production; you need to override those headers for the traffic to be consumable in your test environment. e.g.

request => {
request.headers['auth-token'] = 'test-environment-token'
return request
}

URI Transformation

Sometimes, the new code you want to test is expected to have the same functionality as the old code with a modified URI structure. Modified URI structures in the new code mean that old URI structures sampled from production environments will lead to 404 errors. In order to successfully replay and verify the expected behavior of the new code, you need to transform the old URI structures to new URI structures before sending the request to your new code.

request => {
// transform /api/1/* to /api/2/* e.g.
// /api/1/products/3465 to /api/2/products/3465

request.uri = request.uri.replace('/api/1/', '/api/2/')
return request
}

Body Transformation

Making schema changes to your APIs involves mapping old fields in your old code to new fields in your new code. This requires you to rewrite request bodies from your old schema to your new schema so that the requests become consumable by your new code. Without this necessary transformation, your new code will not be able to consume production traffic.

request => {
// We have just refactored our code such that product objects that were previously consumed as
// { name: string, image_url: string }
// are now consumed as
// { id: string, imageUrl: string }
// i.e. the field 'name' has been renamed to 'id' and
// 'image_url' has been renamed to 'imageUrl'
request.body.id = request.body.name
delete request.body.name
request.body.imageUrl = request.body.image_url
delete request.body.image_url
return request
}

Bonus - Advanced Security and PII Scrubbing

Your production traffic contains sensitive data that cannot be exposed to your test environments. This requires the requests to be scrubbed before they are sent to your test targets.

request => {
// We want to redact an email deep inside the body of the request
// (the path below is illustrative)
request.body.user.contact.email = ''
return request
}
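The pseudocode above only hints at the idea; sensitive fields can be nested at arbitrary depth. Here is a minimal, runnable sketch (our own illustration, not Diffy's built-in API) of a recursive scrubber that blanks out a configurable set of field names wherever they appear:

```javascript
// Deep scrubber: recursively walk the request body and blank out any field
// whose name is in the sensitive set. The field list is an assumption you
// would extend for your own traffic.
const SENSITIVE_FIELDS = new Set(['email', 'ssn', 'phone']);

function scrub(node) {
  if (Array.isArray(node)) {
    node.forEach(scrub);
  } else if (node !== null && typeof node === 'object') {
    for (const key of Object.keys(node)) {
      if (SENSITIVE_FIELDS.has(key)) {
        node[key] = ''; // redact in place
      } else {
        scrub(node[key]);
      }
    }
  }
  return node;
}

// Example: an email nested two levels deep gets blanked out.
const request = { body: { user: { profile: { email: 'a@b.com' }, name: 'x' } } };
scrub(request.body);
```

A real deployment would likely match patterns (e.g. anything that looks like an email address) rather than fixed key names.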

The above examples are simplified pseudocode for readability, but the key takeaway is that, with the power of a scripting language at your fingertips, you can implement pretty much any transformation you can imagine on your requests.
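To make that concrete, here is the complete set of touch-ups from this post combined into one runnable transformation. The token value, URI prefixes, and field names are the placeholders from the examples above, not real configuration:

```javascript
// Combine the header, URI, and body touch-ups into a single transformation.
const transform = (request) => {
  // Header: swap the production auth token for a test-environment one.
  request.headers['auth-token'] = 'test-environment-token';
  // URI: route old /api/1/* requests to the new /api/2/* structure.
  request.uri = request.uri.replace('/api/1/', '/api/2/');
  // Body: rename old-schema fields to the new schema.
  request.body.id = request.body.name;
  delete request.body.name;
  request.body.imageUrl = request.body.image_url;
  delete request.body.image_url;
  return request;
};

const out = transform({
  headers: { 'auth-token': 'prod-token' },
  uri: '/api/1/products/3465',
  body: { name: 'widget', image_url: 'http://img/3465.png' },
});
```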

Request Transformation

With Diffy, you can now inject request transformation logic in 4 places:

All - these transformations will be applied to requests being sent to candidate, primary, and secondary.

Candidate - these transformations will only be applied to the requests received by candidate.

Primary - these transformations will only be applied to the requests received by primary.

Secondary - these transformations will only be applied to the requests received by secondary.
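As a hypothetical illustration (the exact execution order inside Diffy is an assumption here), suppose only the candidate runs new code with the /api/2 URI structure while primary and secondary still serve /api/1. You could attach the header override to All and the URI rewrite to Candidate only:

```javascript
// Hypothetical sketch of the four injection points: the 'all' transform runs
// for every target, while each per-target transform runs only for that copy.
const transforms = {
  all: (req) => { req.headers['auth-token'] = 'test-environment-token'; return req; },
  candidate: (req) => { req.uri = req.uri.replace('/api/1/', '/api/2/'); return req; },
  primary: (req) => req,
  secondary: (req) => req,
};

// Each target gets its own copy of the sampled request; applying 'all' before
// the per-target transform is our assumption for this sketch.
function requestFor(target, original) {
  const copy = JSON.parse(JSON.stringify(original));
  return transforms[target](transforms.all(copy));
}

const sampled = { headers: {}, uri: '/api/1/products/3465' };
const toCandidate = requestFor('candidate', sampled);
const toPrimary = requestFor('primary', sampled);
```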

You can pick the places that you want to inject your transformation via the UI and then write and save the transformation:


That's it! You can now send traffic to your Diffy instance and Diffy will apply your transformations to the requests before sending them off to the targets.

If you have any follow-up questions, please feel free to reach out to us via our Discord server or at

· 3 min read
Puneet Khanduri


Sn126 is proud to announce a major update to Diffy. But first, a quick recap of how we got here.

Diffy was originally built at Twitter in 2014. That's almost 8 years ago! Just let that sink in. There were many technology decisions that were simply the result of being born at Twitter. e.g.

  1. Diffy was written in Scala.
  2. It used Finatra for API design and Finagle for networking.
  3. It was built to test Thrift APIs before it was generalized to test HTTP APIs.
  4. Twitter had just introduced Manhattan - Twitter's proprietary key-value store of choice. Any new services being built were using Manhattan.

Many of these technology choices were innately great. The core difference engine written in Scala could not, and still can't, be written as concisely in Java. Finagle offered a beautiful abstraction for networking that made it possible to implement and merge HTTP support in a matter of days.

After open-sourcing Diffy, we began receiving support requests from other engineering teams and came to understand that the world outside had much more diverse requirements. This eventually led to the birth of Sn126.

To meet the new set of requirements from our customers, we decided to build a more advanced version of Diffy called Isotope. This was when the tech choices we had made at Twitter started turning into tech debt. Scala and Finagle could not catch up with Java and Spring Boot. Scala's build story never got any better, and simple tasks like updating our Finagle dependencies ended up becoming time-consuming work.

Meanwhile, most of our customers were using these other technologies along with Nodejs and Python. As we worked closely with their engineering teams, we experienced their best practices and started adopting them ourselves. Every time we would ship minor changes to Diffy, we would ponder what it would be like to revamp it with the most cutting edge tech.

Finally, the last straw was the uncertainty around the Twitter acquisition and the damage it did to the stability of its open source technologies. With my former colleagues not being sure about having a job in a few months, how could I be sure about the maintenance and support of their open source code at Twitter?

As the case went to trial, we started working on all the changes we wanted to free ourselves from our accumulated tech debt and rewrite significant chunks of Diffy.

  1. We swapped out Finagle with Spring Boot and introduced significant new chunks of Java.
  2. We added MongoDB integration to eliminate all the memory pressure caused by keeping all the difference results in-memory.

Why did Diffy keep everything in-memory? Remember the story about Manhattan from earlier ... if Diffy were to have any storage, it would have to be Manhattan ... and you can't open-source something with a proprietary dependency ... so we decided not to have any storage at all!

  3. We rebuilt the UI from scratch using TypeScript and React, and implemented Material Design.

Coincidentally, by the time the Twitter-Musk deal closed on October 28th, we had already removed every Twitter open-source dependency from the code base.

There are many great features we have built in Isotope that we can now introduce to Diffy. We have already added request transformation capabilities that allow production requests to be sanitized and rewritten before being multicast to test targets. We will talk about that and others in the next post.

In the meantime, feel free to take the new Diffy out for a spin and do share your feedback. We are

· 3 min read
Puneet Khanduri


The most common reason (if not the only one) for alerts is deploying new code to production APIs.

It’s always the new code that makes all the noise and wakes you up in the middle of the night — almost like a newborn baby. By contrast, old code sits quietly in production minding its own business. If your production environment has been humming like a peaceful symphony for the past week, the chance that it will start throwing alerts in the next hour is slim to none.

So how do you make sure that your new code doesn’t break your production symphony?

If you are familiar with Diffy, you already know that comparing the behavior of your new code to your old code against production traffic is the best guarantee you can ask for.

Over the years, Diffy has helped many large tech-heavy engineering teams achieve their quality aspirations. While Diffy users continue to get tremendous value from Diffy’s UI, our larger enterprise customers have been seeking better integration with their existing observability stack.

We did this custom integration for some customers but always wanted something like OpenTelemetry to come along so we would no longer have to build and maintain custom integrations.

There are three aspects to observability (logs, metrics, and traces), and this is how Diffy leverages all of them via OpenTelemetry:


Logs

Diffy logs include important lifecycle events as well as traffic events for every sampled request sent to Diffy. These can now be seamlessly ingested into your favorite sink.



Metrics

One of the most valuable things Diffy does is shred your traffic schema to analyze and aggregate differences observed at various nodes. This creates an interesting opportunity for you to receive schema-based metrics from Diffy.



Traces

Lastly, Diffy now exports traces to your favorite tracing backend for you to inspect all sorts of interesting metadata. e.g. the Diffy UI limits the number of examples it shows to carefully manage storage constraints — you can now observe the request and all 3 responses within the traces.


The logs, metrics, and traces currently exported by Diffy are based on a survey of our existing users. Please feel free to open an issue and let us know if there is anything else you need in any of these dimensions.

Try it out! Check out our Advanced Deployment guide to explore these cool dashboards for yourself. If you have any questions, please feel free to reach out to us for support at

· 4 min read
Puneet Khanduri

Bending space time for the greater good of software quality

Agile workflows require a fast cadence of releases, trying to minimize the time between the inception of a feature in the product manager’s mind and the moment it is in the customers’ hands. As engineers are encouraged to move fast and break things, quality suffers: many features that used to work are suddenly found to be broken — or worse — broken but not found.

Based on our measurements across all the services using Isotope, we have observed that typical backend services have between 15,000 and 25,000 different scenarios that they will see in production. Of these thousands of scenarios, typically, only 10 to 20 will be covered via integration tests and even these suffer from flakiness. While some “core flows” are covered either through these tests or manually, the vast majority of scenarios are never tested. Eventually, bugs are discovered by customers, reported to customer success agents, escalated to product managers, and eventually triaged by engineers who then ask for more details as they have a difficult time reproducing the bug.

Covering a handful of scenarios with tests is like a COVID-19 healthcare worker wearing just a face mask due to a shortage of PPE.

So how do we test all the scenarios we didn’t write tests for?

We clone a parallel universe and deploy our new code to production in the cloned universe. The cloned universe runs all the production scenarios on our new code. If the behavior of our new code looks similar enough to our old code then we are all good and we can deploy the new code in our own universe. If the new code blows up or shows any weird harmful behaviors, then no harm is done to our universe. In either case, we can now discard the cloned universe and create more clones as needed.

The analogy of bending the laws of space-time can be translated to our context with a few key components:

  1. Cloning the universe — Recording traffic from our production environment and replaying it in our test environment. This requires adding an invisible observer to your production environment which can also passively replay the sampled traffic to your test environment.
  2. Comparing the behaviors — You need a difference engine that can understand the schema of your requests and responses and isolate the fields where differences are showing up.
  3. Aggregating comparisons — You need to aggregate the differences observed across thousands of scenarios and group them into buckets (probably) representing the same bug.
  4. De-noising — Some amount of non-determinism in our business logic is inevitable. Typical sources of non-determinism are dependence on time, random numbers, random strings, and fast-changing data that lives outside your service. This non-determinism can lead to a lot of false positives, and you quickly stop trusting the tool that cries wolf too often. Check out this post for a deeper discussion of the quality of Application Quality Monitoring (AQM) systems. To maintain this meta-quality, you need a smart way to de-noise your AQM.
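To make component 2 concrete, here is a toy difference engine (our own sketch, not Isotope's actual implementation) that recursively compares two JSON responses and collects the field paths where they disagree:

```javascript
// Toy difference engine: walk two JSON responses in parallel and report the
// paths of fields whose values differ. Illustrative sketch only.
function diffFields(oldResp, newResp, path = '', out = []) {
  if (oldResp !== null && newResp !== null &&
      typeof oldResp === 'object' && typeof newResp === 'object') {
    const keys = new Set([...Object.keys(oldResp), ...Object.keys(newResp)]);
    for (const key of keys) {
      diffFields(oldResp[key], newResp[key], path ? `${path}.${key}` : key, out);
    }
  } else if (oldResp !== newResp) {
    out.push(path); // primitive mismatch (or missing field) at this path
  }
  return out;
}

// Aggregating these paths across thousands of requests is what lets you group
// likely-identical bugs into the same bucket (component 3 above).
const diffs = diffFields(
  { id: 1, price: { amount: 10, currency: 'USD' } },
  { id: 1, price: { amount: 12, currency: 'USD' } },
);
// diffs → ['price.amount']
```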

At Sn126, we have built these capabilities into Isotope to help teams increase the quality of their software while reducing their burden of investment in written or manual tests. Isotope also goes a few steps further by enabling integration hooks for your existing CI/CT/CD systems. This allows our users to let Isotope monitor their release candidates and receive slack notifications when Isotope catches a bug.

If you would like to try Isotope or would like to see a demo, please don’t hesitate to reach out to us. We look forward to hearing from you.

· 3 min read
Puneet Khanduri
Evaluating Quality Engineering investments and measuring the quality of Application Quality Monitoring systems

Let’s think of some questions we can use to determine how good a job your security guard has been doing.

How many burglars did you catch last quarter?

Obviously, this is not a good way to measure your security guard’s performance. Developers sometimes ask variations of this question like “how many bugs will this tool catch?”, or “how long do I have to wait before it catches one?”, and “how long do I have to wait before that Aha! moment?”

A well-meaning QA Engineer will likely respond with “I hope you never have that Aha! moment” because when it comes to testing and security, boring is good. In order for any quality-related investment to demonstrate ROI (by these standards), the developers really have to screw up often.

How many thefts happened while you were on duty?

This is a much better criterion and sets a lower bound on performance expectations. Good QA Engineers aspire to prevent all bugs from ever reaching prod environments. Good metrics around this criterion are “how many bugs shipped to production last sprint?” or, better yet, “how many sprints since the last bug was shipped to production?”

How many legit visitors did you mistake for burglars?

A paranoid guard who mistakes your regular electrician and plumber for burglars and calls the cops is being sincere but is a bit painful to deal with. He is like the boy who cried wolf — eventually when there is an actual burglar, you won’t believe him. Tools used by QA Engineers can sometimes be noisy and produce false positives. These noisy reports and flaky tests erode trust and teams quickly learn to tune them out.

We are really talking about using precision and recall as metrics for evaluating your QA investments. Just to be clear, we are not talking about code coverage here (why not is an entire separate post). Most teams don’t apply this level of mathematical rigor to evaluate their investments. Eventually, when you start doing this, you realize two things:

  1. You have some blind spots in your investments that make it impossible to catch some classes of bugs.

  2. You also have some redundancy in your investments where the same class of bugs will be caught by multiple investments independently.

Applying quantitative principles to your quality engineering investments allows you to discover optimal tradeoffs and maximize your ROI.
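As a quick sketch of the arithmetic with made-up numbers: suppose a tool raised 40 alerts last quarter, 10 of which were real bugs, while 5 real bugs slipped through unflagged:

```javascript
// Precision/recall for a QA investment, with illustrative (made-up) numbers.
const truePositives = 10;  // alerts that were real bugs
const falsePositives = 30; // noisy alerts (the paranoid guard crying wolf)
const falseNegatives = 5;  // real bugs the tool missed (thefts on duty)

const precision = truePositives / (truePositives + falsePositives); // 0.25
const recall = truePositives / (truePositives + falseNegatives);    // ≈ 0.67
```

In this framing, a blind spot in your investments lowers recall, while redundant investments spend budget without improving either number.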

Stay tuned for a detailed case study!

· 3 min read
Puneet Khanduri


Imagine watching a child fall through thin ice with no one else in sight. There are three basic reactions you can have:

  1. You realize the ice is thin and losing your life will not save the child. You do nothing. This reaction does not require courage, just pragmatism.

  2. You assume the ice is thick and run towards the child thinking nothing will happen to you. This reaction also does not require courage, just stupidity.

  3. You realize there is a very good chance you will die, yet you run towards the child. This is true courage.

The coronavirus is a real threat that has forced many large companies to lose velocity as employees are asked to work remotely and travel is restricted. This is a great opportunity for small startups looking to disrupt large incumbents. Here are two big reasons:

  1. The risk for small teams working in small isolated spaces is much smaller than a large number of people used to working together in large buildings. You can use this to take some calculated risk and maintain your velocity while larger companies slow down.
  2. Take some more calculated risk and travel to get in front of customers. The ones willing to see you are being equally courageous and are, therefore, highly qualified leads that will shorten your sales cycle.

Taking calculated risk means understanding the odds and accepting the worst-case scenario as a real outcome. It also means having a solid grasp of your risk profile.

i.e. There is a real chance that you may get infected with the virus, but what are the odds? There is a real chance that you could die because of the virus, but what are the odds? How does that risk compare to getting hit by a truck while crossing the road? How many busy roads would you be willing to cross on a regular day to get to work or to a client meeting?

This thought exercise should help you assess if the virus threat fits within your risk profile. As always, you won’t find perfect information so you will end up informing your intuition and, ultimately, making a decision with your gut. Try not to think about what everybody else is doing as you make your own decision on how to react to the virus threat.

In my personal view, as a founder you don’t really have a choice because the child struggling to come out of the icy water is your own.

Credit: The example used to demonstrate courage is borrowed from Ivan Sutherland’s famous 1982 essay titled “Technology & Courage”.

· 4 min read
Puneet Khanduri

Sometimes our clients invite us to speak to their engineering teams about software engineering best practices.

There is plenty of published content on this broad subject, so it was initially very challenging for us to make the experience distinctly valuable for our intended audience. We asked ourselves: what unique nuggets of wisdom do we have that people will not be able to find elsewhere?

The honest answer was — None.

Yet, we came up with something that allowed our clients to gain valuable insights while improving the health of their engineering culture.

We call it the “This one time when I blew up production” workshop.

The core idea behind this workshop is to share engineering war stories curated from our own experience and to encourage the client’s engineering teams to share theirs. Despite an abundance of examples of production disasters in real life, there is an acute shortage of published content on the subject. The reason is simple: people love to tell stories about their successes, but telling stories about your failures is hard.

Advertising engineering success stories is very common — it helps you attract good talent by building a strong brand for your engineering culture. But how often do companies blog about massive engineering disasters? Talking about your failures — at least internally — sends out a strong message to your team that failures are part of life and your fear of failure should not reduce your appetite for risk.

As an example, we have heard this same confession many times from many rapid growth companies:

There is this ugly piece of legacy code that no one understands anymore. We don’t want to touch it because we are afraid we might break something.

Fear is good — to the extent that it checks recklessness. It becomes a problem when it starts paralyzing you.

So how does “This one time when I blew up production” help?

It allows your best engineering leaders and individual rockstars to step forward and share their war stories and battle scars. They get to talk about how they once failed miserably and what they learned from that experience. From the audience's perspective, these personal stories are way more relevant and relatable than any other published material.

But pulling this off is a lot easier said than done. It takes a lot of courage on the part of the speaker — you are exposing yourself to the judgement of your peers. You risk losing their respect and admiration.

As you try to assess this unconventional speaking opportunity, all your insecurities suddenly amalgamate into a very simple question:

Why should I do it? What’s in it for me?

The only rational motivation to go through with this risk is that you are willing to sacrifice some admiration from your team to gain something far more valuable — their trust.

People trust you more when you openly talk about your failures. They feel safe around you knowing that you won’t judge them. Ultimately, it is this trust and sense of safety that gives them the courage to be more ambitious and take on more risk.

The format of the workshop starts off with a case study like the ones typically done in MBA programs at business schools, but the context is engineering war stories rather than business strategy. It feels a little bit like the Kobayashi Maru simulation from Star Trek, except you are dealing with services, storage, monitoring tools, and client-side apps instead of spaceships and photon torpedoes. The format then progresses to war stories from individuals in the host organization. This is the most intimate part, where engineers candidly share their stories. Finally, the workshop is wrapped up with comments, questions, and learnings on all the stories shared by everyone. The only contribution we make in this process is to guide and facilitate the experience, because the bulk of the content comes from the audience itself.

The key observation leveraged in these discussions is that the most valuable lessons come from the experiences of others around you. This works so well because these people are easily accessible to you and you can relate to them. This is very different from reading a book or a blog post, where the consumption format for the same lessons is very different.

· 3 min read
Puneet Khanduri

Crying because of testing

Testing is hard. But why?

As developers we like to think of ideas and translate them into code. This process of turning thought into reality is a very empowering experience and forms a strong positive feedback loop that helps us become faster, more concise, stylish, and even more elegant in how we write our code.

The problem with testing, or writing tests, is that it forces us to imagine all the ways current or future versions of our code may be wrong. The more exhaustive we want our test suite to be — the more we have to challenge ourselves and explicitly state all assumptions made in our code. The experience feels like creating and then facing all your worst nightmares.

Without a good amount of masochism in your DNA, this doesn't come naturally to you.

In the best case, at the end of having spent twice the time it took us to write the code being tested, all we have to show is that our code meets expectations. In terms of grading, passing all your tests amounts to a “satisfactory”. There is no way to get an “outstanding” or “excellent” in the testing game. In most cases, though, we find that we screwed up somewhere — so the test really pays for itself when it tells us that we are wrong!

The emotional cost of thinking like a paranoid and the lack of reward make this entire exercise a negative feedback loop. We then want to outsource this thankless burden to anyone willing to take it from us. We thus end up creating armies of manual QAs and Test Engineers responsible for verifying that our implementation conforms to the product specification. In some ways, this is like relying on a third-party auditor to do our accounting, but hey, as long as the auditor doesn't complain or charge too much, we are happy.

At this point, if you are thinking of Test Driven Development, you should ask yourself why you have never actually used it despite having heard of it so many times and perhaps even pretending in an interview that TDD was your style.

Again, the positive feedback loop of translating thought to reality is very addictive, and you will have an easier time quitting cocaine than changing your coding style (ok, maybe not, but you get the point).

A more pragmatic approach to dealing with testing is to invest in tools that reduce or eliminate this burden. Writing tools is extremely rewarding because tools are products themselves. We end up leveraging the positive feedback loop to create elegant solutions that do not rely on others doing this work for us. They give us the feedback we need without making us spend hours. Most of all, the tools we build are generic enough to work across projects, so the investment cost is better amortized than any project-specific testing effort.

A small caveat to the beautiful Tools story is that tools themselves also need to be tested — but that's for another post.