DevOps: Unifying Dev, Ops, and QA

The term DevOps has been around for many years. Small and big companies adopt DevOps concepts for different purposes, e.g. to increase the quality of software. In this blog post, we define DevOps, present its pros and cons, highlight a few concepts and see how these can impact the entire organization.

What is DevOps?

At a high level, DevOps is understood as a technical, organizational and cultural shift in a company to run software more efficiently, reliably, and securely. From this first definition, we can see that DevOps is much more than "use tool X" or "move to the cloud". DevOps starts with the understanding that development (Dev), operations (Ops) and quality assurance (QA) are not treated as siloed disciplines anymore. Instead, they all come together in shared processes and responsibilities across collaborating teams. DevOps achieves this through various techniques. In the section "Implementation", we present a few of these concepts.

Benefits

Benefits of applying DevOps include:

Cost savings through higher efficiency.
Faster software iteration cycles, where updates take less time from development to running in production.
More security, reliability, and fault tolerance when running software.
Stronger bonds between different stakeholders in the organization including non-technical staff.
Enable more data-driven decisions.

Let's have a look how these benefits can be achieved by applying DevOps ideas:

How to implement DevOps

Automation and Continuous Integration (CI) / Continuous Delivery (CD)

Automation refers to a key aspect of the engineering-driven part of DevOps. With automation, we aim to reduce the need for human action, and thus the possibility of human error, as far as possible by sending your software through an automated and well-understood pipeline of actions. These automated actions can build your software, run unit tests, integrate it with existing systems, run system tests, deploy it, and provide feedback on each step. What we are describing here is usually referred to as Continuous Integration (CI) and Continuous Delivery (CD). Adopting CI/CD invests in a low-risk and low-cost way of crossing the chasm between "software that is working on an engineer's laptop" and "software that running securely and reliably on production servers".

CI/CD is usually tied to a platform on top of which the automated actions are run, e.g., Gitlab. The platform accepts software that should be passed through the pipeline, executes the automated actions on servers which are usually abstracted away, and provides feedback to the engineering team. These actions can be highly customized and tied together in different ways. For example, one action only compiles the source code and provides the build artifacts to subsequent actions. Another action can be responsible for running a test-suite, another one can deploy software. Such actions can be defined for different types of software: A website can be automatically deployed to a server, or a Desktop application can be made available to your customers without human interaction.

Besides the fact that CI/CD can be used for all kinds of software, there are other advantages to consider:

The CI/CD pipeline is well-understood and maintained by the teams: the actions that are run in a pipeline can be flexibly updated, extended, etc. Infrastructure as Code can be a powerful concept here.
Run in standardized environments: Version conflicts between tools and configuration or dependency mismatches only have to be fixed once when the pipeline is built. Once a pipeline is working, it will continue to work as the underlying servers and their software versions don't change. No more conflicts between operating systems, tools, versions of tools across different engineers. Pipelines are highly reproducible. Containerization can be a game-changer here.
Feedback: Actions sometimes fail, e.g. because a unit test does not pass. The CI/CD platform usually allows different reporting mechanisms: E-mail someone, update the project status on your repository overview page, block subsequent actions or cancel other pipelines.

The next sections cover more DevOps concepts that benefit from automation.

Multiple Environments

The CI/CD can be extended by deploying software to different environments. These deployments can happen in individual actions defined in your pipeline. Besides the production environment, which runs user-facing software, staging and testing environments can be defined where software is deployed to. For example, a testing environment can be used by the engineering team for peer-reviewing and validating software changes. Once the team agreed on new software, it can be deployed to a staging environment. A usual purpose of the staging environment is to mimic the production environment as closely as possible. Further tests can be run in a staging environment to make sure the software is ready to be used by real users. Finally, the software reaches production-readiness and is deployed to a production environment. Such a production deployment can be designed using a gradual rollout, i.e. canary deployments.

Different environments not only realize different semantics and confidence levels of running software, e.g. as described in the previous paragraph, but also serve as an agreed-upon view on software in the entire organization. Multi-environment deployments make your software and quality thereof easier to understand. This is because of the gained insights when running software, in particular on infrastructure that is close to a production setting. Generally, running software gives much more insights into the performance, reliability, security, production-readiness and overall quality. Different teams, e.g. security experts or a dedicated QA-team (if your organization follows this practice) can be consulted at different software quality stages, i.e. different environments in which software runs. Additionally, non-technical staff can use environments, e.g. specialized ones for demo purposes.

Ultimately, integrating multiple environments structures QA and smoothens the interactions between different teams.

Fail early

No matter how well things are working in an organization that builds software, bugs happen and bugs are expensive. The cost of bugs can be projected to the manpower invested into fixing the bug, the loss of reputation due to angry customers, and generally negative business impact. Since we can't fully avoid bugs, there exist concepts to reduce both the frequency and impact of bugs. "Fail early" is one of these concepts.

The basic idea is to catch bugs and other flaws in your software as early in the development process as possible. When software is developed, unit tests, compiler errors and peer reviews count towards the early and cheap mechanisms to detect and fix flaws. Ideally, a unit test tells the developer that the software is not correct, or, a second pair of eyes reveals a potential performance issue during a code review. In both cases, not much time and effort is lost and the flaw can be easily fixed. However, other bugs might make it through these initial checks and land in testing or staging environments. Other types of tests and QA should be in place to check the software quality. Worst case, the bug outlives all checks and is in production. There, bugs have much higher impact and require more effort by many stakeholders, e.g. the bug fix by the engineering team and the apology to the customers.

To save costs, cheap checks such as running a test suite in an automated pipeline should be executed early. This will save costs as flaws discovered later in the process result in higher costs. Thus, failing early increases cost efficiency.

Rollbacks

DevOps can also help to react quickly to changes. One example of a sudden change is a bug, as described in the last section, which is discovered in the production environment. Rollbacks, for example as manually triggered pipelines, can recover the well-functioning of a production service in a timely manner. This can be useful when the bug is a hard one and needs hours to be identified and fixed. These hours of degraded customer experience or even downtime makes paying customers unhappy. A faster mechanism is desired, which minimizes the gap between a faulty system and a recovered system. A rollback can be a fast and effective way to recover system state without exposing customers to company failure much.

Policies

DevOps concepts impose a challenge to security and permission management as these span the entire organization. Policies can help to formulate authorizations and rules during operations. For example, implementing the following security requirements may be required:

A deployment or rollback in production should not be triggered by anyone but a well-defined set of people in authority.
Some actions in a CI/CD pipeline should always be run while other actions are intended to be triggered manually or only run under certain conditions.
The developers might require slightly different permissions than a dedicated QA team to perform their day-to-day work.
Humans and machine users can have different capabilities but should always have the least privileges assigned to them.

The authentication and authorization tools provided by CI/CD providers or cloud vendors can help to design such policies according to your organizational needs.

Observability

As software is running and users are interacting with your applications, insights such as error rates, performance statistics, resource usages, etc. can help to identify bottlenecks, mitigate future issues, and drive business decisions through data. There exist two major ways to establish different forms of observability:

Logging: Events in text form that software outputs to inform about the application's status and health. Different types of logging messages, e.g. indicating the severity of an error event, can help to aggregate and display log messages in a central place, where it can be used by engineering teams for debugging purposes.
Metrics: Information about the running software that is not generated by the application itself. For example, the CPU or memory usage of the underlying machine that runs the software, network statistics, HTTP error rates, etc. As with logging, metrics can help to spot bottlenecks and mitigate them before they have a business impact. Visualizing aggregated metrics data facilitates communication across technical and non-technical teams and leverages data-driven decisions. Metrics dashboards can strengthen the shared ownership of software across teams.

Logging and metrics can help to define goals and to align a development team with a QA team for example.

Disadvantages

So far, we only looked at the benefits and characteristics of DevOps. Let's have a brief look at the other side of the coin by commenting on the possible negative side effects and disadvantages of adopting DevOps concepts.

The investment into DevOps can be huge as it is a company-wide, multi-discipline, and multi-team transformation that not only requires technical implementation effort but also training for people, re-structuring and aligning teams.
This goes along with the first point but it's worth emphasizing it: The cultural impact on your organization can be challenging due to human factors. While a new automation mechanism can be estimated and implemented reasonably well, tracking the progress of changing people's way of communication, feeling of ownership, aligning to new processes can be hard and might lead to no gained efficiencies, which DevOps promises, short-term. Due to the high impact of DevOps, it is a long-term investment.
The technical backbone of DevOps, e.g. CI/CD pipelines, cloud vendors, integration of authorization and authentication, likely results in increased expenses through new contracts and licenses with new players. However, through the dominance of open source in modern DevOps tooling, e.g. through using Kubernetes, vendor lock-in can be avoided.

Conclusion

In this blog post, we explored the definition of DevOps and presented several DevOps concepts and use-cases. Furthermore, we evaluated benefits and disadvantages. Adopting DevOps is an investment into a low-friction and automated way of developing, testing, and running software. Technical improvements, e.g. automation, as well as increased collaboration between teams of different disciplines ultimately improve the efficiency in your organization long-term.

However, DevOps represents not only technical effort but also impacts the entire company, e.g. how teams communicate with each other, how issues are resolved, and what teams feel responsible for. Finding the right balance and choosing the best concepts and tools for your teams represents a challenge. We can help you with identifying and solving the DevOps transformation in your organization.

Subscribe to our blog via email
Email subscriptions come from our Atom feed and are handled by Blogtrottr. You will only receive notifications of blog posts, and can unsubscribe any time.

Do you like this blog post and need help with Next Generation Software Engineering, Platform Engineering or Blockchain & Smart Contracts? Contact us.