Updated: Aug 30, 2020
The consequences of infrastructure as code
It was a typical, heavily rainy Seattle day on my first day at Microsoft Azure. I was full of energy to apply what I had learned during my Ph.D. studies. The first six months revealed a different challenge. Teams working on cloud-based infrastructure spend a large chunk of their time automating their operations. But Microsoft Azure was able to keep up because we had many teams dedicated to automating much of the needed infrastructure work. After more than three and a half years, I wanted to explore how other companies do that. I decided to join the best of the best in cloud infrastructure.
So, I joined AWS. I worked on its EC2 infrastructure, with teams that were light years ahead of where I came from. I thought then that many companies would have the same ability to move fast and run airtight infrastructure like AWS. Lots of automation was in place. But again, we had an army of engineers working on infrastructure automation. I was under the impression that with so much automation, the rest of the industry should be on the path to innovating faster than before.
That impression changed when I moved to Climate Corp and tried to replicate the model that I saw at Microsoft and AWS. I tried to build as much automation as possible and have a dedicated team for this task. This didn’t work for several reasons. First, although Climate Corp was a medium-sized company, it wasn’t practical to build a large automation team. The company’s core business is building products for its end users. Second, the product and the business were evolving quite fast. It was always a catch-up game, and it caused a lot of frustration.
Observing and experiencing how automation is done from one company to another made me realize that we are heading towards a big problem.
Companies with advanced DevOps still miss a lot
Climate Corp was one of the earliest AWS customers and was considered one of the more advanced companies using cloud infrastructure. I was astonished to see that, despite years of running cloud-based infrastructure, a lot of the basics were missing. My first impression was that the infrastructure area had been neglected. But I was even more surprised when I saw how much they had invested in infrastructure automation before hiring me. A lot of code had been written and a lot of thought had been put into it. I observed the following:
Applied automation became obsolete quite quickly as the business evolved and applications evolved,
The euphoria of building a new system quickly disappeared once engineers realized that they had a ton of infrastructure management code that very quickly became hard-to-manage legacy,
With all that effort, we still missed the basics in security, observability, and basic application lifecycle management,
We didn’t know that we had missed all of these until it was too late to account for them in our initial infrastructure automation code.
My team and I at Climate Corp felt overwhelmed. We were able to initially organize ourselves and improve our working habits. But with the emergence of disruptive technologies, such as Kubernetes, we had to go through that again. It was a painful cycle that we had to repeat, each time adding to the already complex codebase.
Infrastructure as code is NOT the answer
Engineers currently use conventional programming to manage cloud infrastructure, aka infrastructure as code. While this provided a lot of agility, it also significantly increased the complexity that engineers need to deal with every day. Did the cloud make us more secure? Did it really improve the reliability of the typical modern application? I believe that although engineers can move faster, they are not necessarily feeling more fulfilled or freed from the brutal technology catch-up cycle they have to go through to stay relevant and keep their jobs. I’m not talking about large organizations that can hire an army of engineers to handle all those complexities. I’m talking about startups and medium-sized companies that drive a lot of innovation through software.
Engineers are investing heavily in raw and dumb infrastructure as code. We are creating the future legacy code that is more complex and will significantly slow our ability to innovate.
Just imagine if we solved these problems
I’d like to challenge you to think outside the box. How can we embed some intelligence in the next generation of cloud automation tools to elevate our game? Think specifically about these questions:
Instead of instructing our infrastructure about every single change, how about we set high-level goals? For example, instead of adding scalability rules, shouldn’t we set goals around performance, efficiency, and cost savings and let our services and infrastructure take care of the rest to achieve these goals?
Shouldn’t our tools learn how engineers mitigate different issues and apply those mitigations whenever similar, not necessarily identical, situations arise?
Shouldn’t we harvest the data coming out of applications to guide our engineers toward building better code? For example, shouldn’t we get customized insights and possible solutions if certain parts of our code are likely to be vulnerable under a certain infrastructure setup?
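To make the first question a bit more concrete, here is a rough Python sketch of what "set goals, not rules" could look like. It is only an illustration under my own assumptions, not a real tool or any cloud provider's API: the names `Goals`, `Observation`, and `plan_replicas`, and the simple one-step-at-a-time policy, are all hypothetical. The point is that the engineer declares a latency target and a cost ceiling, and the planner decides the replica count, instead of the engineer hand-writing scaling rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goals:
    """High-level targets declared by the engineer (hypothetical names)."""
    max_p95_latency_ms: float
    max_hourly_cost_usd: float


@dataclass(frozen=True)
class Observation:
    """What the platform currently measures for one service (hypothetical)."""
    replicas: int
    p95_latency_ms: float
    cost_per_replica_usd: float


def plan_replicas(goals: Goals, obs: Observation) -> int:
    """Pick the replica count for the next interval from goals alone."""
    # The cost goal caps how many replicas we may run (never fewer than one).
    affordable = max(1, int(goals.max_hourly_cost_usd // obs.cost_per_replica_usd))

    if obs.p95_latency_ms > goals.max_p95_latency_ms:
        # Performance goal is violated: add capacity.
        desired = obs.replicas + 1
    elif obs.p95_latency_ms < 0.5 * goals.max_p95_latency_ms and obs.replicas > 1:
        # Plenty of headroom: shed capacity to save cost.
        desired = obs.replicas - 1
    else:
        desired = obs.replicas

    return min(desired, affordable)


goals = Goals(max_p95_latency_ms=200.0, max_hourly_cost_usd=10.0)
slow = Observation(replicas=3, p95_latency_ms=350.0, cost_per_replica_usd=2.0)
print(plan_replicas(goals, slow))
```

A real system would need far more than this (smoothing, multiple resources, safety limits), but even the toy version shows the shift: the only thing the engineer writes is the `Goals` line.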
Imagine if these problems were solved. How much could you do with a single line of code, or maybe with no code at all?