Bending space-time for the greater good of software quality
Agile workflows require a fast release cadence, trying to minimize the time between the inception of a feature in the product manager’s mind and the moment it reaches the customer’s hands. As engineers are encouraged to move fast and break things, quality suffers: many features that used to work are suddenly found to be broken, or worse, broken but not found.
Based on our measurements across all the services using Isotope, we have observed that a typical backend service sees between 15,000 and 25,000 distinct scenarios in production. Of these thousands of scenarios, typically only 10 to 20 are covered by integration tests, and even those suffer from flakiness. While some “core flows” are covered by these tests or exercised manually, the vast majority of scenarios are never tested. Eventually, bugs are discovered by customers, reported to customer success agents, escalated to product managers, and finally triaged by engineers, who then ask for more details because they have a hard time reproducing the bug.
Covering a handful of scenarios with tests is like a COVID-19 healthcare worker wearing just a face mask due to a shortage of PPE.
So how do we test all the scenarios we didn’t write tests for?
We clone a parallel universe and deploy our new code to production in the cloned universe. The cloned universe runs all the production scenarios on our new code. If the behavior of the new code looks similar enough to that of the old code, we are all good and can deploy the new code in our own universe. If the new code blows up or shows any weird, harmful behavior, no harm is done to our universe. In either case, we can then discard the cloned universe and create more clones as needed.
The analogy of bending the laws of space-time can be translated to our context with a few key components:
- Cloning the universe — Recording traffic from our production environment and replaying it in our test environment. This requires adding an invisible observer to your production environment that samples traffic passively and can later replay it against your test environment (a minimal sketch follows this list).
- Comparing the behaviors — You need a difference engine that understands the schema of your requests and responses and can isolate the fields where differences show up (see the diff sketch below).
- Aggregating comparisons — You need to aggregate the differences observed across thousands of scenarios and group them into buckets that (probably) represent the same bug (see the bucketing sketch below).
- De-noising — Some amount of non-determinism in business logic is inevitable. Typical sources are dependence on time, random numbers, random strings, and fast-changing data that lives outside your service. This non-determinism can produce a lot of false positives, and you quickly stop trusting a tool that cries wolf too often. Check out this post for a deeper discussion of the quality of Application Quality Monitoring (AQM) systems. To maintain this meta-quality, you need a smart way to de-noise your AQM (see the de-noising sketch below).
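To make the first component concrete, here is a minimal Python sketch of record-and-replay under some simplifying assumptions: traffic is HTTP with JSON responses, sampling happens in-process, and the `RecordedCall`, `maybe_record`, and `replay` names are hypothetical illustrations rather than Isotope’s actual interfaces.

```python
import random
from dataclasses import dataclass
from typing import Optional

import requests


@dataclass
class RecordedCall:
    """One sampled production request and the response it produced."""
    method: str
    path: str
    headers: dict
    body: Optional[str]
    prod_response: dict


def maybe_record(method, path, headers, body, prod_response, store, sample_rate=0.01):
    """Passively sample a small fraction of production traffic for later replay."""
    if random.random() < sample_rate:
        store.append(RecordedCall(method, path, dict(headers), body, prod_response))


def replay(call: RecordedCall, test_base_url: str) -> dict:
    """Send a recorded production request to the candidate deployment and return its response."""
    resp = requests.request(
        call.method,
        test_base_url + call.path,
        headers=call.headers,
        data=call.body,
        timeout=10,
    )
    return resp.json()
```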
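For the difference engine, the core idea can be sketched as a recursive walk over two JSON-like responses that reports the paths of the fields where they diverge. This is a simplified stand-in for a schema-aware comparison; `diff_responses` is a hypothetical name.

```python
def diff_responses(old, new, path=""):
    """Recursively compare two JSON-like values and yield (path, old, new) for each differing field."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in old.keys() | new.keys():
            yield from diff_responses(old.get(key), new.get(key), f"{path}.{key}")
    elif isinstance(old, list) and isinstance(new, list):
        if len(old) != len(new):
            yield (f"{path}.length", len(old), len(new))
        for i, (o, n) in enumerate(zip(old, new)):
            yield from diff_responses(o, n, f"{path}[{i}]")
    elif old != new:
        yield (path or "(root)", old, new)
```

Putting the two sketches together, a replayed scenario would be compared along the lines of `list(diff_responses(call.prod_response, replay(call, TEST_URL)))`.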
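Aggregation can then group scenarios by the set of field paths that differed, on the assumption that scenarios sharing the same signature probably expose the same bug. Again, a rough sketch with hypothetical names:

```python
from collections import defaultdict


def bucket_diffs(diffs_per_scenario):
    """Group scenario ids by the signature (set of differing field paths) of their diffs.

    diffs_per_scenario maps a scenario id to a list of (path, old, new) tuples.
    """
    buckets = defaultdict(list)
    for scenario_id, diffs in diffs_per_scenario.items():
        signature = tuple(sorted({path for path, _, _ in diffs}))
        if signature:  # scenarios with no differences need no bucket
            buckets[signature].append(scenario_id)
    return buckets
```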
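Finally, one simple way to de-noise is to keep a list of field paths known to be non-deterministic (timestamps, request ids, and so on) and drop differences on those paths before bucketing. The patterns below are hypothetical examples, not Isotope’s built-in rules:

```python
import re

# Hypothetical ignore-list: field paths expected to differ between runs.
NOISY_PATH_PATTERNS = [
    re.compile(r"\.timestamp$"),
    re.compile(r"\.created_at$"),
    re.compile(r"\.request_id$"),
]


def denoise(diffs):
    """Drop differences on fields that are expected to be non-deterministic."""
    return [
        (path, old, new)
        for path, old, new in diffs
        if not any(pattern.search(path) for pattern in NOISY_PATH_PATTERNS)
    ]
```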
At Sn126, we have built these capabilities into Isotope to help teams increase the quality of their software while reducing the burden of investing in written or manual tests. Isotope also goes a few steps further by providing integration hooks for your existing CI/CT/CD systems. This lets our users have Isotope monitor their release candidates and receive Slack notifications when Isotope catches a bug.
If you would like to try Isotope or would like to see a demo, please don’t hesitate to reach out to us. We look forward to hearing from you.