DevOps Challenges
I managed an eight-person team that supported data integration
tools for a Fortune 100 tech company. One of the tools we supported
was adopted by every business unit to integrate a large number of
applications, databases, and information repositories. Over a
period of seven to eight years the number of production integration
applications grew to over 800. During those years of scaling
multiple environments we faced several challenges:
- Managing the large number of servers
- Maintaining performance
- Ensuring high availability
- Keeping up with user support
We hosted the integration platform on Oracle/Solaris servers
that could handle the load of 20 - 30 integration applications
each. The first performance challenge we faced was the integration
platform repository. All integration application deployments were
performed using the integration platform’s administration tool,
which stored each application’s configuration in a database
repository. As the number of applications grew, the performance of
the administrator repository eventually started to impact both the
time to deploy new applications and the time it took to start up
any application. The solution was to break the single domain into
business unit domains, each with a smaller administrator
repository. But this introduced a new problem: a significant
increase in the number of hosts needed to run the multiple
instances of the administrator. When virtualization technology was
introduced into Solaris via Solaris Zones, we were able to reduce
the number of physical hosts by running each domain administrator
instance in a different zone on a single physical host.
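To make the consolidation concrete, here is a minimal sketch, assuming a hypothetical set of business-unit domains, of how one zone per domain administrator could be created on a single Solaris host with the standard zonecfg and zoneadm tools. Our actual build process was more involved; the zone names and paths below are invented for illustration:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: one Solaris zone per business-unit domain
administrator on a single physical host. Zone names and paths are
invented for illustration; run as root on a Solaris host."""
import subprocess

BU_DOMAINS = ["finance", "sales", "manufacturing"]  # hypothetical domains

def create_zone(bu: str) -> None:
    zone = f"bu-{bu}-admin"
    # Define the zone; zonecfg accepts semicolon-separated subcommands.
    subprocess.run(
        ["zonecfg", "-z", zone,
         f"create; set zonepath=/zones/{zone}; set autoboot=true"],
        check=True)
    # Install the zone's files, then boot it. The domain administrator
    # instance would be installed inside the zone afterwards.
    subprocess.run(["zoneadm", "-z", zone, "install"], check=True)
    subprocess.run(["zoneadm", "-z", zone, "boot"], check=True)

if __name__ == "__main__":
    for bu in BU_DOMAINS:
        create_zone(bu)
```

Each zone behaves like an isolated host, so the per-BU administrator repositories stay small without multiplying physical servers.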
The next challenge we faced was upgrading the data integration
platform. To perform an upgrade, the entire production environment
had to be taken down, since the platform would only run if all
nodes ran the same version. To complicate matters, even though the
integration process engines were supposed to be forward compatible
with newer versions of the integration platform, we were required
to have every process engine tested by its owning business unit
before the upgrade. It was impossible to get all the BUs to test
their applications within a timeframe narrow enough that we would
have a completely tested set of production apps when the upgrade
took place.
Finding the right tools
Our workaround was to build out a completely new production
environment on the latest integration platform and migrate apps
from the old environment as BUs tested and cleared their apps for
the newer version. This spread the upgrade cycle out over several
months, was extremely wasteful of hardware resources, and placed a
huge management burden on my team. All of this kept our upgrade
cycles very long: even though the vendor shipped major upgrades
twice a year and monthly patches, we were only able to upgrade
every three years!
Technology kept advancing, and cloud services started appearing.
Our vendor fielded a private cloud solution that included features
specific to the integration platform. I saw several capabilities
that I knew I could leverage to overcome the difficulties we had in
managing and scaling our integration environments.
The cloud product had an auto-restart capability for failed
applications that eliminated the need to run high-availability
pairs of integration processes, which immediately cut my CAPEX by 50%.
That savings more than paid for the cloud product in the first year
of operation.
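The vendor’s restart mechanism was a black box to us, but the idea behind it is easy to sketch. Here is a minimal, hypothetical supervisor loop in Python (the engine path and backoff value are invented) that relaunches a process when it dies, instead of keeping a hot standby running:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of auto-restart: supervise one integration
process and relaunch it on failure. Command and backoff are invented."""
import subprocess
import time

CMD = ["/opt/integration/bin/process-engine", "--domain", "finance"]
BACKOFF_SECONDS = 5  # crude fixed backoff; real products do better

def supervise(cmd: list[str]) -> None:
    while True:
        proc = subprocess.Popen(cmd)
        rc = proc.wait()  # block until the process exits
        if rc == 0:
            break         # clean shutdown: do not restart
        print(f"process exited with rc={rc}; restarting in {BACKOFF_SECONDS}s")
        time.sleep(BACKOFF_SECONDS)

if __name__ == "__main__":
    supervise(CMD)
```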
Another feature of the cloud product was the ability to deploy the
integration platform and integration processes into containers. The
great thing about this was that each logical machine could run a
completely independent stack of the integration platform components
deployed in an environment. Gone was the requirement that every
node in an environment run the same version of the component stack.
Now upgrades could be done container by container, with no need to
field additional hardware, which significantly simplified upgrades
and reduced their cost.
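To illustrate what a container-by-container upgrade looks like, here is a hedged sketch that uses Docker as a stand-in for the vendor’s container runtime (we did not use Docker at the time); the container names, image tag, and settle time are all invented:

```python
#!/usr/bin/env python3
"""Hypothetical rolling upgrade, one container at a time, with Docker
standing in for the vendor's runtime. Names and tags are invented."""
import subprocess
import time

NEW_IMAGE = "integration-platform:9.5"
CONTAINERS = ["integration-node-1", "integration-node-2"]

def upgrade(name: str, image: str) -> None:
    # Replace one container; the others keep serving on the old version.
    subprocess.run(["docker", "rm", "-f", name], check=True)
    subprocess.run(["docker", "run", "-d", "--name", name, image], check=True)
    time.sleep(10)  # crude settle time; a real check would poll the platform
    state = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True, text=True, check=True)
    assert state.stdout.strip() == "true", f"{name} failed to come up"

if __name__ == "__main__":
    for node in CONTAINERS:
        upgrade(node, NEW_IMAGE)
```

Because each container carries its own copy of the platform stack, a failed upgrade affects only the node being replaced.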
We also took advantage of script-driven automation tools to create
automated deployment processes. All a developer had to do was email
their integration process artifact, along with a descriptor file,
to a mail-driven pipeline that deployed the artifact to the target
non-production environment and domain. Production deployments were
a little different: instead of deploying the artifact
automatically, the pipeline staged it and generated a ticket
requesting that my team perform the deployment.
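The routing rule at the heart of that pipeline is simple to sketch. The following is a hypothetical illustration (the descriptor format, staging path, and helper functions are all invented) of the split between auto-deployed non-production artifacts and staged-plus-ticketed production ones:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of the deployment-routing rule: non-production
artifacts deploy automatically; production artifacts are staged and
ticketed. Descriptor format, paths, and helpers are invented."""
import json
import shutil
from pathlib import Path

STAGING_DIR = Path("/var/deploy/staging")  # invented staging area

def handle_submission(artifact: Path, descriptor: Path) -> None:
    meta = json.loads(descriptor.read_text())
    env, domain = meta["environment"], meta["domain"]
    if env == "production":
        # Stage the artifact and ask the ops team to deploy it.
        STAGING_DIR.mkdir(parents=True, exist_ok=True)
        shutil.copy(artifact, STAGING_DIR / artifact.name)
        open_ticket(f"Deploy {artifact.name} to production domain {domain}")
    else:
        deploy(artifact, env, domain)

def deploy(artifact: Path, env: str, domain: str) -> None:
    print(f"deploying {artifact.name} to {env}/{domain}")  # stand-in

def open_ticket(summary: str) -> None:
    print(f"ticket opened: {summary}")  # stand-in for the ticketing API

if __name__ == "__main__":
    # Tiny self-contained demo with dummy files.
    Path("orders-flow.ear").write_text("artifact bytes")
    Path("orders-flow.json").write_text(
        json.dumps({"environment": "qa", "domain": "finance"}))
    handle_submission(Path("orders-flow.ear"), Path("orders-flow.json"))
```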
The automation provided a huge boost to productivity: development
teams didn’t have to wait for my team to deploy their apps before
they could begin testing, QA, or UAT cycles. My team also saved
significant time by not having to manually configure and deploy 40+
apps per week. We also noticed another benefit of automated
deployments: an almost complete elimination of deployment failures.
Before automation, my team had to manually configure each
deployment; eliminating that step also eliminated the errors we
made when re-keying application configuration parameters.
DevOps solved the challenges
Not long after we fielded the cloud platform and automated
processes, I started hearing the DevOps buzzword. As I learned what
DevOps entailed, I saw the potential to use its technologies and
tools to further improve how we ran all the middleware my team was
responsible for. The further I explored, the more I realized the
full impact that adopting DevOps could have on an IT organization.
In addition to increasing productivity and cutting costs by
eliminating many manual processes, DevOps could also:
- Automate deployment and configuration of infrastructure
- Make infrastructure available through self-service
- Provide greater infrastructure stability through consistent
builds
- Produce higher-quality code through automated testing
- Reduce service outages by eliminating the main sources of
failures
- Provide faster feedback to developers, reducing the time and
cost of debugging
- Make deployments and upgrades seamless, eliminating the need to
perform them on nights and weekends
- Improve coordination and communication between dev and ops
teams
- Allow IT to rapidly meet new business objectives
It was gratifying to see, even back before DevOps was called
DevOps, the huge impact these technologies were having on my
colleagues’ productivity. I was sold on the benefits of adopting
DevOps, and I’m not the least bit surprised at how quickly it has
become a major movement in the IT industry.
A quick survey of DevOps tool providers will turn up a hundred
companies, and the list continues to grow. If I were doing the
same project today, I’d be using Linux and a cloud provider like
AWS, and tools like Docker and Kubernetes.
It was gratifying to be one of the early adopters of
containerization and automated deployments, and I can honestly say
they worked like a charm on the very first project we used them on
-- even though it was a very complex and mission-critical set of
enterprise systems. Sometimes you just know you picked the right
technology. The only thing I can’t fathom is how I survived with my
sanity intact after so many years as an IT operations manager
without DevOps.