DevOps Challenges
I managed an eight-person team that supported data integration
tools for a Fortune 100 tech company. One of the tools we supported
was adopted by every business unit to integrate a large number of
applications, databases, and information repositories. Over a
period of seven to eight years the number of production integration
applications grew to over 800. During those years of scaling
multiple environments we faced several challenges:
- Managing the large number of servers
- Maintaining performance
- Ensuring high availability
- Keeping up with user support
We hosted the integration platform on Oracle/Solaris servers
that could handle the load of 20 - 30 integration applications
each. The first performance challenge we faced was the integration
platform repository. All integration application deployments were
performed using the integration platform’s administration tool,
which stored each application’s configuration in a database
repository. As the number of applications grew, the performance of
the administrator repository eventually started to impact both the
time to deploy new applications and the time it took to start up
any application. The solution was to break the single domain into
business unit domains, each with a smaller administrator
repository. But this introduced a new problem: a significant
increase in the number of hosts needed to run the multiple
instances of the administrator. When virtualization technology was
introduced into Solaris via Solaris Zones, we were able to reduce
the number of physical hosts by running each domain administrator
instance in a different zone on a single physical host.
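To make the consolidation concrete, here is a minimal sketch, assuming a hypothetical set of business-unit domains, of how one zone per domain administrator could be created on a single Solaris host with the standard zonecfg and zoneadm tools. Our actual build process was more involved; the zone names and paths below are invented for illustration:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: one Solaris zone per business-unit domain
administrator on a single physical host. Zone names and paths are
invented for illustration; run as root on a Solaris host."""
import subprocess

BU_DOMAINS = ["finance", "sales", "manufacturing"]  # hypothetical domains

def create_zone(bu: str) -> None:
    zone = f"bu-{bu}-admin"
    # Define the zone; zonecfg accepts semicolon-separated subcommands.
    subprocess.run(
        ["zonecfg", "-z", zone,
         f"create; set zonepath=/zones/{zone}; set autoboot=true"],
        check=True)
    # Install the zone's files, then boot it. The domain administrator
    # instance would be installed inside the zone afterwards.
    subprocess.run(["zoneadm", "-z", zone, "install"], check=True)
    subprocess.run(["zoneadm", "-z", zone, "boot"], check=True)

if __name__ == "__main__":
    for bu in BU_DOMAINS:
        create_zone(bu)
```

Each zone behaves like an isolated host, so the per-BU administrator repositories stay small without multiplying physical servers.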
The next challenge we faced was upgrading the data integration
platform. To perform an upgrade, the entire production environment
had to be taken down, since the platform would only run if all
nodes ran the same version. To complicate matters, even though the
integration process engines were supposed to be forward compatible
with newer versions of the integration platform, we were required
to have every process engine tested by its owning business unit
before the upgrade. It was impossible to get all the BUs to test
their applications within a timeframe narrow enough that we would
have a completely tested set of production apps when the upgrade
took place.
Finding the right tools
Our workaround was to build out a completely new production
environment on the latest integration platform and migrate apps
from the old environment as BUs tested and cleared their apps for
the newer version. This spread the upgrade cycle out over several
months, was extremely wasteful of hardware resources, and placed a
huge management burden on my team. All of this kept our upgrade
cycles very long: even though the vendor shipped major upgrades
twice a year and monthly patches, we were only able to upgrade
every three years!
Technology kept advancing, and cloud services started appearing.
Our vendor fielded a private cloud solution that included features
specific to the integration platform. I saw several capabilities
that I knew I could leverage to overcome the difficulties we had in
managing and scaling our integration environments.
The cloud product had an auto-restart capability for failed
applications that eliminated the need to run high-availability
pairs of integration processes, which immediately cut my CAPEX by 50%.
That savings more than paid for the cloud product in the first year
of operation.
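The vendor’s restart mechanism was a black box to us, but the idea behind it is easy to sketch. Here is a minimal, hypothetical supervisor loop in Python (the engine path and backoff value are invented) that relaunches a process when it dies, instead of keeping a hot standby running:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of auto-restart: supervise one integration
process and relaunch it on failure. Command and backoff are invented."""
import subprocess
import time

CMD = ["/opt/integration/bin/process-engine", "--domain", "finance"]
BACKOFF_SECONDS = 5  # crude fixed backoff; real products do better

def supervise(cmd: list[str]) -> None:
    while True:
        proc = subprocess.Popen(cmd)
        rc = proc.wait()  # block until the process exits
        if rc == 0:
            break         # clean shutdown: do not restart
        print(f"process exited with rc={rc}; restarting in {BACKOFF_SECONDS}s")
        time.sleep(BACKOFF_SECONDS)

if __name__ == "__main__":
    supervise(CMD)
```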
Another feature of the cloud product was the ability to deploy the
integration platform and integration processes into containers. The
great thing about this was that each logical machine could run a
completely independent stack of the integration platform components
deployed in an environment. Gone was the requirement that every
node in an environment run the same version of the component stack.
Now upgrades could be done container by container, with no need to
field additional hardware, which significantly simplified upgrades
and reduced their cost.
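To illustrate what a container-by-container upgrade looks like, here is a hedged sketch that uses Docker as a stand-in for the vendor’s container runtime (we did not use Docker at the time); the container names, image tag, and settle time are all invented:

```python
#!/usr/bin/env python3
"""Hypothetical rolling upgrade, one container at a time, with Docker
standing in for the vendor's runtime. Names and tags are invented."""
import subprocess
import time

NEW_IMAGE = "integration-platform:9.5"
CONTAINERS = ["integration-node-1", "integration-node-2"]

def upgrade(name: str, image: str) -> None:
    # Replace one container; the others keep serving on the old version.
    subprocess.run(["docker", "rm", "-f", name], check=True)
    subprocess.run(["docker", "run", "-d", "--name", name, image], check=True)
    time.sleep(10)  # crude settle time; a real check would poll the platform
    state = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True, text=True, check=True)
    assert state.stdout.strip() == "true", f"{name} failed to come up"

if __name__ == "__main__":
    for node in CONTAINERS:
        upgrade(node, NEW_IMAGE)
```

Because each container carries its own copy of the platform stack, a failed upgrade affects only the node being replaced.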
We also took advantage of script-driven automation tools to create
automated deployment processes. All a developer had to do was email
their integration process artifact, along with a descriptor file,
to a mail-driven pipeline that deployed the artifact to the target
non-production environment and domain. Production deployments were
a little different: instead of deploying the artifact
automatically, the pipeline staged it and generated a ticket
requesting that my team perform the deployment.
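The routing rule at the heart of that pipeline is simple to sketch. The following is a hypothetical illustration (the descriptor format, staging path, and helper functions are all invented) of the split between auto-deployed non-production artifacts and staged-plus-ticketed production ones:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the deployment-routing rule: non-production
artifacts deploy automatically; production artifacts are staged and
ticketed. Descriptor format, paths, and helpers are invented."""
import json
import shutil
from pathlib import Path

STAGING_DIR = Path("/var/deploy/staging")  # invented staging area

def handle_submission(artifact: Path, descriptor: Path) -> None:
    meta = json.loads(descriptor.read_text())
    env, domain = meta["environment"], meta["domain"]
    if env == "production":
        # Stage the artifact and ask the ops team to deploy it.
        STAGING_DIR.mkdir(parents=True, exist_ok=True)
        shutil.copy(artifact, STAGING_DIR / artifact.name)
        open_ticket(f"Deploy {artifact.name} to production domain {domain}")
    else:
        deploy(artifact, env, domain)

def deploy(artifact: Path, env: str, domain: str) -> None:
    print(f"deploying {artifact.name} to {env}/{domain}")  # stand-in

def open_ticket(summary: str) -> None:
    print(f"ticket opened: {summary}")  # stand-in for the ticketing API

if __name__ == "__main__":
    # Tiny self-contained demo with dummy files.
    Path("orders-flow.ear").write_text("artifact bytes")
    Path("orders-flow.json").write_text(
        json.dumps({"environment": "qa", "domain": "finance"}))
    handle_submission(Path("orders-flow.ear"), Path("orders-flow.json"))
```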
The automation provided a huge boost to productivity: development
teams didn’t have to wait for my team to deploy their apps before
they could begin testing, QA, or UAT cycles. My team also saved
significant time by not having to manually configure and deploy 40+
apps per week. We also noticed another benefit of automated
deployments: an almost complete elimination of deployment failures.
Before automation, my team had to manually configure each
deployment; eliminating that step also eliminated the errors we
made when re-keying application configuration parameters.
DevOps solved the challenges
Not long after we fielded the cloud platform and automated
processes, I started hearing the DevOps buzzword. As I learned what
DevOps entailed, I saw the potential to use its technologies and
tools to further improve how we ran all the middleware my team was
responsible for. The further I explored, the more I realized the
full impact that adopting DevOps could have on an IT organization.
In addition to increasing productivity and cutting costs by
eliminating many manual processes, DevOps could also:
- Automate deployment and configuration of infrastructure
- Make infrastructure available through self-service
- Provide greater infrastructure stability through consistent
builds
- Produce higher-quality code through automated testing
- Reduce service outages by eliminating the main sources of
failures
- Provide faster feedback to developers, reducing the time and
cost of debugging
- Make deployments and upgrades seamless, eliminating the need to
perform them on nights and weekends
- Improve coordination and communication between dev and ops
teams
- Allow IT to rapidly meet new business objectives
It was gratifying to see, even back before DevOps was called
DevOps, the huge impact these technologies were having on my
colleagues’ productivity. I was sold on the benefits of adopting
DevOps, and I’m not the least bit surprised at how quickly it has
become a major movement in the IT industry.
A quick survey of DevOps tool providers will turn up a hundred
companies, and the list continues to grow. If I were doing the
same project today, I’d be using Linux and a cloud provider like
AWS, and tools like Docker and Kubernetes.
It was gratifying to be one of the early adopters of
containerization and automated deployments, and I can honestly say
they worked like a charm on the very first project we used them on
-- even though it was a very complex and mission-critical set of
enterprise systems. Sometimes you just know you picked the right
technology. The only thing I can’t fathom is how I survived with my
sanity intact after so many years as an IT operations manager
without DevOps.