Building Blocks | January 2023

In this edition

Twitter’s microservices a bigger disaster than Musk’s new policies?
Is Chaos Engineering beyond turning off instances?
Kluctl philosophy: Live and let live | AIOps: Journey ahead

Ho! Ho! Ho! Christmas is almost here and it’s time to bid adieu to the year 2022.

Building Blocks’ last edition for the year looks at Elon Musk’s tweet blaming microservices for Twitter being slow in some countries, and at the future state of chaos engineering. Plus, some recommendations for the weekend.

Happy reading!

IN FOCUS

Twitter’s microservices a bigger disaster than Musk’s new policies?

In a recent tweet, Musk claimed that Twitter’s refresh speeds in regions like India and Indonesia are five to ten times slower than in the US. But are microservices to blame?

So what exactly is going on here? Who’s right: the new billionaire owner, or the developer who has, or should we say had, been handling the Twitter stack?

“The three reasons for the app to be slow are – First, it’s bloated with features that get little usage. Second, we have accumulated years of tech debt by trading velocity and features over performance. Third, we spend a lot of time waiting for the network responses.”

Well, if a bunch of other Twitter engineers are to be believed, Musk did get this one wrong. In a later thread of tweets, Eric identified the main reason behind the slow performance of the Twitter app on Android.

Here’s what Joe Beda has to say about how mature distributed systems work:

So is Musk entirely wrong? Well, back in 2021, Twitter’s Senior Staff Engineer, Steve Cosenza, said that the proliferation of microservices had made the Twitter API disjointed and cumbersome. So yes, microservices can sometimes hurt performance, just not in this case!

IN ACTION

Is Chaos Engineering beyond turning off instances?

Chaos engineering has worked wonders at Microsoft, Amazon, and Netflix. But is it something that would be useful for you and your organisation? And is there more to chaos engineering than turning off instances?

Chaos engineering was devised to understand and navigate complex systems (constellations of different database servers, APIs, microservices, and libraries), helping organisations understand how their systems work and check their resilience.

“Chaos engineering is about injecting a controlled and well understood failure into the system, while controlling as many other variables as possible, to confirm that the system reacts in the way that we’re expecting it to.”

In January, Gremlin released the results of a survey tracking how organisations are adopting chaos engineering and the business value it is adding. Companies that frequently run chaos engineering experiments report availability above 99.9%, and most respondents (60%) have run at least one chaos engineering attack.
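To put an availability figure like 99.9% in perspective, it helps to turn it into a yearly downtime budget. The snippet below is a small illustrative calculation, not something taken from the Gremlin survey; the function name and the 365-day year are our own assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(availability_percent):
    """Return the hours per year a service may be down at a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return HOURS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = downtime_hours_per_year(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime a year")
```

At 99.9%, the budget works out to roughly 8.76 hours a year, so each extra "nine" cuts the allowed downtime by a factor of ten.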

Nora Jones, founder and CEO at Jeli, says teams need to understand when and where to experiment. She helped implement the Chaos Automation Platform while still at Netflix. According to her, creating chaos in a random part of the system is not going to be that useful for anyone. There needs to be some sort of reasoning behind it.
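What might an experiment with "reasoning behind it" look like? Here is a minimal sketch in Python, assuming a toy service with a cache fallback. Every name in it (FlakyDependency, fetch_timeline, timeline_with_fallback, run_experiment) is hypothetical and made up for illustration; it is not how Netflix, Gremlin, or AWS implement fault injection. The hypothesis being tested is that when a downstream call fails at a known rate, every request still gets an answer.

```python
import random


class FlakyDependency:
    """Wraps a callable and fails a controlled fraction of calls (hypothetical helper)."""

    def __init__(self, func, failure_rate, seed=0):
        self.func = func
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # fixed seed keeps the experiment repeatable

    def __call__(self, *args, **kwargs):
        # Inject the failure before the real call is attempted.
        if self.rng.random() < self.failure_rate:
            raise ConnectionError("injected failure")
        return self.func(*args, **kwargs)


def fetch_timeline(user_id):
    """Stand-in for a real downstream service call."""
    return [f"tweet-{user_id}-{i}" for i in range(3)]


def timeline_with_fallback(fetch, user_id, cache):
    """System under test: serve cached (possibly empty) data when the dependency fails."""
    try:
        result = fetch(user_id)
        cache[user_id] = result
        return result, "live"
    except ConnectionError:
        return cache.get(user_id, []), "fallback"


def run_experiment(failure_rate, requests=1000):
    """Inject failures at a known rate and count how each request was served."""
    fetch = FlakyDependency(fetch_timeline, failure_rate)
    cache = {}
    outcomes = {"live": 0, "fallback": 0}
    for i in range(requests):
        _, source = timeline_with_fallback(fetch, i % 10, cache)
        outcomes[source] += 1
    return outcomes


if __name__ == "__main__":
    print(run_experiment(failure_rate=0.3))
```

The point of the sketch is the shape of the experiment: one failure mode, one known injection rate, everything else held constant, and a result you can compare against what you expected. If an exception escaped instead of being counted as a fallback, the hypothesis would be disproved.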

Harpreet Singh, co-founder and CTO at Watermelon Software Inc, shares the chaos engineering journey of DBS Bank and how it dispelled the myths of chaos engineering.

Amazon has been using chaos engineering practices for a long time. Last year, Werner Vogels, VP and CTO at Amazon, introduced the company’s chaos-engineering-as-a-service offering, AWS Fault Injection Simulator.

Chaos engineering has come a long way, from prompting the question “Why would you want to do that?” to helping ensure the reliability of top companies worldwide. Maria Korolov, in her blog, has shared insights about the strategic thinking required for implementing chaos engineering and driving system resiliency.