In Part 1 of this series we took a look at some common symptoms of organizations that could benefit from adopting DevOps principles. In this post we’ll dig a little deeper into DevOps and answer questions like: Where did DevOps originate? How does DevOps define itself? And what are some of the core tenets of DevOps? Let’s start with the first question: where did DevOps originate?
DevOps as a term originated in 2009 following a talk at the O’Reilly Velocity Conference titled “10+ Deploys per Day: Dev and Ops Cooperation at Flickr.” John Allspaw and Paul Hammond walked through some of the pains in the software development lifecycle of the day, identifying contentious scenarios that had become all too common between development and operations teams: “It’s not our machines, it’s the developers’ code!” and “We can’t test our code because operations can’t get us a production environment!” John and Paul made the case that the only rational way forward was to integrate development and operations into a more cohesive unit. This talk is widely regarded as the birth of the term DevOps and the beginning of a movement in IT that is still very much with us today. For a more complete history of DevOps, check out The Origin of DevOps: What’s in a Name?
Even though the term DevOps was coined over 10 years ago, it has become a confusing buzzword in recent years, and there are many misconceptions about how DevOps is actually defined. There are job postings for DevOps Engineers, a seemingly infinite number of tools in the space, and even companies offering DevOps as a service. With all of the misinformation out there, DevOps can be hard to understand. In fact, it may be easier to start with some common misconceptions before moving on to a more formal definition of DevOps.
DevOps is a young movement, and many organizations are still wrestling with its definition and the value it provides. Consequently, the misconceptions above are common in this field. Let’s take a look at what DevOps actually is and how the DevOps community defines it. While this article provides a good overview of DevOps, I highly encourage anyone looking to continue their journey in this space to read The Phoenix Project and The DevOps Handbook. I consider these books must-reads for any new hire at Callibrity.
The Three Ways are instrumental to any DevOps practitioner and deserve a deeper dive. Let’s break down each of the Three Ways and look at some examples of each one.
The First Way says we need to accelerate the flow of work through the organization. This is often done by visualizing how work gets done, using a practice like value stream mapping. Value stream mapping involves creating a visual representation of how work flows through an organization from beginning to end. In software, we often think of the start as the ideation of a new feature and the end as that feature running in production and available to customers. When creating a value stream map, you should involve all parties that take part in, or have a stake in, the work being done. This helps ensure everyone shares the same baseline understanding and that no critical parts of the work are missed.
The First Way promotes the idea of Systems Thinking, meaning we should look at the entire system when considering solutions, not just an individual part. There is little value in spending an excessive amount of time perfecting the development process if things grind to a halt once we try to deploy to production. In that case, the deployment process is a bottleneck within our value stream. Once a value stream map has been completed, such bottlenecks become much easier to identify. A bottleneck is not always caused by the time spent doing work. Sometimes it is caused by the time pending work spends sitting in a queue after a handoff, or waiting for approval in an overburdened change management process.
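To make the idea concrete, here is a minimal Python sketch of the arithmetic behind a value stream map. The steps and hour figures are made-up example data, not measurements from any real organization, and the `Step` type and helper functions are hypothetical. For each step it records active processing time and the time work waits in a queue before the step begins, then computes total lead time, flow efficiency (the share of lead time spent actually working), and the step that contributes the most elapsed time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One stage of a value stream (hypothetical example structure)."""
    name: str
    process_hours: float  # time spent actively working on the item
    wait_hours: float     # time the item waits in a queue before this step starts

    @property
    def elapsed_hours(self) -> float:
        return self.process_hours + self.wait_hours


def lead_time(steps: list[Step]) -> float:
    """Total elapsed time from ideation to running in production."""
    return sum(s.elapsed_hours for s in steps)


def flow_efficiency(steps: list[Step]) -> float:
    """Fraction of the lead time spent doing actual work."""
    return sum(s.process_hours for s in steps) / lead_time(steps)


def bottleneck(steps: list[Step]) -> Step:
    """The step contributing the most elapsed time, whether working or waiting."""
    return max(steps, key=lambda s: s.elapsed_hours)


# Made-up example data for illustration only.
stream = [
    Step("Ideation", 4, 0),
    Step("Development", 24, 8),
    Step("Code review", 2, 16),
    Step("Change approval", 1, 72),
    Step("Deployment", 3, 24),
]

print(f"Lead time: {lead_time(stream)} hours")            # Lead time: 154 hours
print(f"Flow efficiency: {flow_efficiency(stream):.0%}")  # Flow efficiency: 22%
print(f"Bottleneck: {bottleneck(stream).name}")           # Bottleneck: Change approval
```

With these illustrative numbers, only 34 of 154 hours are spent doing work, and the largest contributor is the change approval queue rather than any hands-on step: exactly the kind of waiting bottleneck described above.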
Another way to improve the flow of work is to automate wherever possible and to reduce Work In Progress (WIP) to manageable levels. Organizations can see great productivity gains by automating tedious, repetitive tasks like building applications, deploying code, and provisioning infrastructure. Reducing WIP allows teams to focus on a smaller number of tasks at a time. This increased focus leads to higher-quality work, decreasing the likelihood that a piece of work is sent backward in the pipeline. Ideally, all work flows from left to right and is never sent backward. Any time work is sent back in the pipeline, someone must context-switch and interrupt the flow of whatever work they currently have in progress.
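WIP limits are often enforced with a pull-based, Kanban-style board: a column may not take on new work until it has free capacity. The Python sketch below models a single column with a WIP limit; the class name, its methods, and the backlog items are hypothetical and only illustrate the rule.

```python
from __future__ import annotations

from collections import deque


class WipLimitedColumn:
    """A board column that refuses new work once its WIP limit is reached (illustrative sketch)."""

    def __init__(self, name: str, wip_limit: int):
        self.name = name
        self.wip_limit = wip_limit
        self.in_progress: list[str] = []

    def has_capacity(self) -> bool:
        return len(self.in_progress) < self.wip_limit

    def pull(self, backlog: deque) -> str | None:
        """Pull the next backlog item if there is capacity; otherwise return None."""
        if not self.has_capacity() or not backlog:
            return None
        item = backlog.popleft()
        self.in_progress.append(item)
        return item

    def finish(self, item: str) -> None:
        """Mark an item as done, freeing capacity for the next pull."""
        self.in_progress.remove(item)


backlog = deque(["feature-a", "feature-b", "feature-c"])
dev = WipLimitedColumn("Development", wip_limit=2)
print(dev.pull(backlog))  # feature-a
print(dev.pull(backlog))  # feature-b
print(dev.pull(backlog))  # None: the WIP limit is reached
dev.finish("feature-a")
print(dev.pull(backlog))  # feature-c
```

The key design choice is that work is pulled when capacity frees up rather than pushed onto a team that is already busy, which keeps the number of concurrent tasks, and therefore context switching, bounded.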
The Second Way focuses on feedback. To keep work flowing through an organization, we need to introduce and amplify feedback loops throughout the value stream. The faster a developer, or any member of the team, receives feedback on broken builds, failing tests, or features that don’t meet requirements, the faster the issue can be rectified and the work can continue to flow. We want to create feedback loops that inform from right to left in the value stream. You will often hear the term ‘shift left’, which means moving feedback as far left, or as early in the value stream, as possible. This is an extremely important concept because the earlier problems are discovered, the less expensive they are to fix.
To create some of these feedback loops, organizations often rely on Continuous Integration (CI) pipelines that provide immediate feedback on problems with new code. This works by running tests and creating builds whenever developers check new code into the repository. Developers are then alerted to the status of their latest check-in. As organizations mature, they can move to Continuous Delivery, in which the code is kept in a deployable state so that a build is ready to deploy at any time. Ultimately, they can move to a Continuous Deployment model, in which code flows from a developer’s machine all the way to production without any manual intervention.
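A CI server handles this in practice, but the core loop is simple enough to sketch. The Python example below runs a list of pipeline stages in order and stops at the first failure, so the developer hears about a broken build or failing test as soon as possible. The stage names and commands are placeholders (each just runs a tiny Python snippet); a real pipeline would invoke your own build and test tooling, and `run_pipeline` is a hypothetical helper, not part of any CI product’s API.

```python
from __future__ import annotations

import subprocess
import sys


def run_pipeline(stages: list[tuple[str, list[str]]]) -> tuple[bool, list[tuple[str, bool]]]:
    """Run (name, command) stages in order, stopping at the first failure.

    Returns whether the whole pipeline passed, plus each executed stage's result.
    """
    results = []
    for name, command in stages:
        completed = subprocess.run(command, capture_output=True, text=True)
        passed = completed.returncode == 0
        results.append((name, passed))
        print(f"[{'passed' if passed else 'FAILED'}] {name}")
        if not passed:
            # Fail fast: later stages are skipped so feedback reaches the developer sooner.
            print(completed.stderr, end="")
            return False, results
    return True, results


if __name__ == "__main__":
    # Placeholder stages standing in for real build, test, and packaging commands.
    stages = [
        ("build", [sys.executable, "-c", "print('compiling...')"]),
        ("unit tests", [sys.executable, "-c", "raise SystemExit('1 test failed')"]),
        ("package", [sys.executable, "-c", "print('packaging...')"]),
    ]
    ok, results = run_pipeline(stages)
    print("Pipeline passed" if ok else "Pipeline failed: notify the committer")
```

Here the “package” stage never runs, because there is no point producing an artifact from code whose tests are failing.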
Feedback loops should exist not only in the workflow pipeline but also in production environments. It is extremely important to have good metrics on how your products behave in production so that you can respond to your customers’ needs. From a software and operations perspective, we are usually concerned with the performance of our production applications. A good set of metrics for monitoring production workloads is the Four Golden Signals: Error Rate, Latency, Traffic, and Saturation. Production monitoring gives you real-time feedback on problems, so you can respond to issues before your customers even notice them.
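As a rough illustration, the Python sketch below derives the four signals from a window of request records. It makes several simplifying assumptions: errors are counted as HTTP 5xx responses, latency is reported as nearest-rank percentiles, traffic is requests per second over the window, and saturation is supplied as a single utilization figure (for example, CPU) measured elsewhere. The `Request` type, the function names, and the sample data are hypothetical; in practice these numbers usually come from a monitoring system rather than hand-written code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    latency_ms: float
    status: int  # HTTP status code


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def golden_signals(requests: list[Request], window_seconds: float,
                   utilization: float) -> dict[str, float]:
    """Compute latency, traffic, errors, and saturation for one time window."""
    latencies = [r.latency_ms for r in requests]
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: only 5xx count as errors
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p99_ms": percentile(latencies, 99),
        "traffic_rps": len(requests) / window_seconds,
        "error_rate": errors / len(requests),
        "saturation": utilization,  # assumption: measured elsewhere, e.g. CPU utilization
    }


# Ten made-up requests observed over a 5-second window.
sample = [
    Request(20, 200), Request(25, 200), Request(30, 200), Request(35, 500),
    Request(40, 200), Request(45, 200), Request(50, 200), Request(60, 200),
    Request(80, 200), Request(900, 503),
]
print(golden_signals(sample, window_seconds=5, utilization=0.65))
```

With this sample, the median latency of 40 ms looks healthy, while the 900 ms p99 and the 20% error rate show that some customers are having a bad experience.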
Left alone, all processes tend toward entropy, and software development is no different. The Third Way calls for creating a culture of continuous experimentation, learning, and improvement within an organization. An organization should look to carve out time for employees to experiment outside of their daily duties. Google, for example, is famous for allowing employees to spend 20% of their time on projects outside their regular work.
Another key tenet of The Third Way is shared responsibility. A healthy culture avoids pointing fingers or playing the blame game when things go wrong. In fact, a DevOps organization treats failures as opportunities for improvement. Organizations like Netflix deliberately increase the frequency of production failures using tools like Chaos Monkey, which randomly terminates production instances in an effort to force improvements in software resiliency. Practices like this help them survive even entire AWS region outages.
The Third Way also subscribes to the idea that repetition leads to mastery. An organization must practice certain scenarios in order to get good at them. Production incidents are one example. All organizations suffer production incidents, but how they respond can vary greatly. The Third Way advocates a practice called Game Days, in which production-like outages are deliberately created so the team can practice responding in an orderly fashion. The goal of this effort is to reduce Mean Time To Resolution (MTTR) for outages, another key metric of high-performing DevOps organizations.
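MTTR itself is simple arithmetic: the average time from when an incident begins to when it is resolved. Teams differ on whether the clock starts at detection or at the start of customer impact; the Python sketch below assumes detection, and the incident timestamps are made up for illustration. The `mean_time_to_resolution` function is a hypothetical helper.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def mean_time_to_resolution(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average (resolved - detected) across incidents; assumes the clock starts at detection."""
    if not incidents:
        raise ValueError("MTTR is undefined without incidents")
    durations = [resolved - detected for detected, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Made-up incidents as (detected, resolved) pairs.
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 10, 30)),   # 90 minutes
    (datetime(2021, 3, 8, 14, 0), datetime(2021, 3, 8, 14, 45)),  # 45 minutes
    (datetime(2021, 3, 15, 2, 0), datetime(2021, 3, 15, 4, 15)),  # 135 minutes
]
print(mean_time_to_resolution(incidents))  # 1:30:00
```

Tracking this number before and after a series of Game Days is one way to check whether the practice is paying off.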
Today we explored the key principles of DevOps, some misconceptions, and its origin. Keep an eye out for Part 3 of this series where we will take a look at some DevOps best practices, discuss how you can begin a DevOps transformation in your organization, and take a look at a case study.