The development team finished the product three weeks ago, but it’s still not in production. You released a new feature and half the website went down. A significant amount of your day is spent dealing with production outages or issues. No matter what your role is, if you work in a technology organization, you have probably heard the term DevOps. If any of the problems above sound familiar, your company could likely benefit from adopting DevOps practices.
Before we dive deeper into other common problems that DevOps tries to solve, let’s start with a brief history of what DevOps is and where it originated. If you’re looking for a more in-depth understanding of DevOps philosophies, keep an eye out for Part 2 of this series, where we’ll explore the key pillars of DevOps. This article will focus on the common problems that DevOps aims to solve.
DevOps is a philosophy that arose circa 2009 following the success of other frameworks for optimizing software delivery such as Agile, Scrum, and Extreme Programming (XP). All of these frameworks have a similar goal: develop higher-quality software faster. While the aforementioned frameworks concentrate mostly on the development process, DevOps takes a more holistic approach by applying its philosophies to the entire value stream. This includes everything from the development process, to QA, to production releases and beyond. Let’s explore some of the leading indicators that your organization could benefit from DevOps.
This problem can take many forms, but it can be summed up simply: it takes too long for new value-providing features to make their way to production. In other words, there is too much lead time in your processes and you’d like to reduce your time to market. Common scenarios that impact lead time include:
One strong indicator that you lack production metrics is that users are the ones telling you about software problems. If help desk tickets are the norm for detecting production issues, your software is most likely frustrating users on a daily basis. In today’s world, users expect things to just work when it comes to software and the internet, and many users will not take the time to send you a help desk ticket when software doesn’t work. There are many cases of organizations missing out on tens of thousands of dollars in revenue due to a bug they didn’t know about. Mature organizations have found ways to monitor users in production and detect anomalous behavior before the user reports it. This allows the company to proactively fix problems before many users are even aware there is an issue. Even if you aren’t able to fix the problem immediately, you can signal to users that a fix is in progress.
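To make "detect anomalous behavior before the user reports it" concrete, here is a minimal sketch of one possible approach: compare the current error rate against a recent baseline and flag it when it sits far above normal. The function name `is_anomalous`, the sample baseline values, and the threshold of three standard deviations are all assumptions chosen for illustration, not a prescribed method.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations above the mean of `history` (hypothetical rule)."""
    if len(history) < 2:
        # Not enough data to estimate a baseline.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat baseline: any increase counts as unusual.
        return current > mu
    return (current - mu) / sigma > threshold


# Hypothetical error rates (fraction of failed requests) from recent intervals.
baseline = [0.010, 0.012, 0.009, 0.011, 0.010]

print(is_anomalous(baseline, 0.011))  # within the normal range
print(is_anomalous(baseline, 0.080))  # far above the baseline
```

In practice a check like this would run against live metrics and trigger an alert, so the team hears about the problem from monitoring rather than from a help desk ticket.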
Tracking errors in production is important, but errors are only one of the four golden signals. Other important metrics include throughput, latency, and saturation of your applications. Armed with information from these metrics, you can make critical decisions about the health of your production system, thus improving the experience for the end user. For example, latency issues mean everything the user does takes too long, which leads to an unsatisfactory experience. Measuring throughput can help you understand how many users are using your application at any given time and whether your infrastructure can support it. Finally, saturation is a measure of how much of your infrastructure is being utilized; this is important for optimizing the costs within your organization.
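As a rough illustration of how the four signals relate to raw data, the sketch below derives them from a window of request records. The `Request` type, the `golden_signals` function, the choice of the 95th percentile for latency, and CPU cores as the saturation measure are all assumptions made for this example; real systems usually get these numbers from a monitoring tool rather than hand-written code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # how long the request took
    status: int         # HTTP status code returned


def golden_signals(requests, window_seconds, cpu_used_cores, cpu_total_cores):
    """Summarize one time window of traffic (hypothetical helper)."""
    count = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = (95 * count + 99) // 100
    p95 = durations[rank - 1] if count else 0.0
    return {
        "throughput_rps": count / window_seconds,
        "error_rate": errors / count if count else 0.0,
        "latency_p95_ms": p95,
        "saturation": cpu_used_cores / cpu_total_cores,
    }


# 19 successful requests taking 10..190 ms, plus one failed 200 ms request.
sample = [Request(duration_ms=10.0 * i, status=200) for i in range(1, 20)]
sample.append(Request(duration_ms=200.0, status=500))

signals = golden_signals(sample, window_seconds=10,
                         cpu_used_cores=6, cpu_total_cores=8)
print(signals)
```

Even a simple summary like this answers the questions raised above: how many requests are arriving, how many fail, how long users wait, and how close the infrastructure is to its limit.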
It’s that time of the week / month / quarter / year again when you deploy the new features your development team has been diligently working on for the past several sprints. You’ve calculated when there is the least amount of traffic on your website to minimize risk. All the relevant parties are in the room: the operations engineers, QA, developers, and product managers. There is excitement in the air, but also quite a bit of apprehension. These production deployments don’t usually go well and often spiral into hours of overtime and late nights. The operations team starts the deployment process, and within a few minutes you begin to see the warning signs that things aren’t going well. Help desk tickets start to trickle in as things break. Unfortunately, there is no way to reverse this process once it has started, so the only path is forward. You settle in for the long haul, where tempers will run short and emotions high as everyone points the finger at each other. This is a common scenario in organizations that have not adopted DevOps principles; many of these organizations don’t recognize there is a better way.
Companies often have peak traffic times for their applications. For a ticket sales website, this is when a new event goes on sale. For retail companies, this takes place over the holidays and may peak on Black Friday. Many organizations spend a lot of time and money preparing for these peak times. In a DevOps environment, scaling applications to handle a larger workload can be as easy as pressing a button. In sophisticated organizations that run on public or private clouds, scaling can often happen automatically based on traffic. Automating these types of tasks frees up valuable resources to work on projects that deliver business value instead of maintaining infrastructure.
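To show what scaling "based on traffic" can look like, here is a minimal sketch of a proportional scaling rule: size the fleet so that the load on each instance moves toward a target. The function name `desired_instances`, the load unit (requests per second per instance), and the default bounds are assumptions for illustration; cloud platforms and orchestrators provide their own autoscalers with similar rules.

```python
import math


def desired_instances(current_instances, load_per_instance, target_load,
                      min_instances=1, max_instances=20):
    """Return how many instances are needed so that per-instance load
    approaches `target_load` (hypothetical proportional rule)."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    desired = math.ceil(current_instances * load_per_instance / target_load)
    # Clamp to the allowed fleet size to control cost and keep a safe floor.
    return max(min_instances, min(max_instances, desired))


# Normal day: 4 instances at 20 req/s each, target of 100 req/s per instance.
print(desired_instances(4, 20, 100))    # scale in

# On-sale spike: 4 instances at 150 req/s each.
print(desired_instances(4, 150, 100))   # scale out

# Extreme spike: the result is capped at max_instances.
print(desired_instances(10, 500, 100))
```

The upper bound is a deliberate design choice: it keeps a traffic spike or a faulty metric from growing the fleet, and the bill, without limit.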
If your organization is suffering from any of the above, it is likely you could benefit from the implementation of DevOps practices. DevOps can significantly improve your engineering team’s ability to adapt to market changes and release the features that give you an edge against your competition. Adopting a DevOps culture improves company morale and stops the blame game when it comes to effectively releasing software.