People used to think software development was “done” when the
code was written and passed all its tests. But modern IT systems
aren’t done until they are online, running, and integrated with
their data feeds, storage, networks, and administration systems.
This can involve very elaborate operations steps, such as
dynamically creating a whole array of virtual servers, storage
devices, accounts, software configurations, and network
configurations.
These days, almost every
company wants DevOps to automate such powerful infrastructure,
speed up release cycles, and improve reliability and uptime. As an
ultra-high-value and rapidly changing industry, Financial
Technology lives near the cutting edge of innovation. Its DevOps
can include continuous integration and continuous deployment
(CI/CD), automated testing, containerization (as with Docker and
Kubernetes), system monitoring, use of cloud platforms (like AWS,
Azure, or a private cloud), virtual private clouds, extensive
firewalls, advanced network security, and more.
FinTech IT projects have
a lot at stake, and wise engineers will hand-pick DevOps priorities
to match the project’s objectives and exposures. Let’s look first
at what FinTech overall should expect from DevOps, and then at how
different subfields should emphasize additional, specialized DevOps
requirements.
What every FinTech solution needs from DevOps
Compared to other
industries, FinTech places an unusually high priority
on:
- Maintainability: improved analyses, features, and data
integrations must roll out very frequently, with very low
latency
- Quality: an uncaught mistake or security hole quickly runs to
millions of US dollars in cost, sometimes much more
- Data integration: FinTech applications are fundamentally about
digesting a never-ending stream of new information, and the more
feeds (or the more atomic inputs) that can be handled, the
better.
DevOps is
maintainability’s best friend. As far back as 2013, here at FP
Complete, we were releasing large upgrades to major server
applications every few weeks. These days, at large Internet
companies it is routine to see daily release cycles, and faster is
quite doable.
FP Complete recommends a
fully automated Continuous
Integration and Continuous Deployment (CI/CD) system, including automated builds, an
automated test suite, immutable containerized servers, and
post-deployment health checks with rollback capability (blue-green
deployment). If you haven’t implemented containers yet, you almost certainly
should.
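Below is a minimal Python sketch of the health-check-and-rollback step. It assumes a hypothetical `Router` object that stands in for whatever load balancer or Kubernetes Service decides which environment ("blue" or "green") receives live traffic. It also assumes caller-supplied `deploy` and `healthy` functions. None of these names come from a real tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical traffic pointer: which environment currently serves live traffic."""
    live: str = "blue"


def blue_green_deploy(router: Router,
                      deploy: Callable[[str], None],
                      healthy: Callable[[str], bool],
                      checks: int = 3) -> bool:
    """Deploy to the idle environment, switch traffic only after it passes every
    health check, and roll back the switch if the post-switch check fails."""
    idle = "green" if router.live == "blue" else "blue"
    previous = router.live
    deploy(idle)                       # build a fresh environment; never patch the live one
    if not all(healthy(idle) for _ in range(checks)):
        return False                   # pre-switch checks failed: live traffic never moved
    router.live = idle                 # switch traffic to the new environment
    if not healthy(router.live):       # post-deployment health check
        router.live = previous         # roll back: the old environment is still running
        return False
    return True
```

The old environment keeps running until the new one has proven itself. Rollback is therefore just a pointer flip rather than a redeploy.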
FP Complete also
recommends a formal Quality Assurance
system.
Data integration can be
quite application-specific, but FP Complete recommends choosing a
very small number of supported data formats, and having a clean
layer providing these formats after data ingestion, normalization,
and cleansing from a more diverse set of inputs. A service-oriented
architecture (SOA) can make it easy to add new data-feed parsers in
a completely language-independent manner, ensuring system
extensibility. (See my comments on “Modular Design.”) Automated deployment and
monitoring let you have more services running as separate
processes, by eliminating the need to manually examine each
server’s status constantly.
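The shape of such a normalization layer can be sketched in Python. The sketch is an in-process stand-in for what the paragraph describes as separate, language-independent services. Each feed format gets a small parser, and everything downstream sees only one canonical record type. The `Quote` schema, the feed field names (`sym`, `px`, `symbol`, `price`), and the cleansing rule are all hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Quote:
    """The single canonical record downstream services consume (hypothetical schema)."""
    symbol: str
    price: Decimal
    source: str


# Registry of feed parsers: supporting a new feed means registering one more function.
PARSERS: Dict[str, Callable[[str], Iterable[Quote]]] = {}


def parser(feed_format: str):
    def register(fn):
        PARSERS[feed_format] = fn
        return fn
    return register


@parser("json")
def parse_json(raw: str) -> Iterable[Quote]:
    for row in json.loads(raw):
        yield Quote(row["sym"].strip().upper(), Decimal(str(row["px"])), "json")


@parser("csv")
def parse_csv(raw: str) -> Iterable[Quote]:
    for row in csv.DictReader(io.StringIO(raw)):
        yield Quote(row["symbol"].strip().upper(), Decimal(row["price"]), "csv")


def ingest(feed_format: str, raw: str) -> List[Quote]:
    """Parse and normalize a raw feed, then cleanse it by dropping non-positive prices."""
    return [q for q in PARSERS[feed_format](raw) if q.price > 0]
```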
So DevOps can be a great
help to FinTech in general. But there is still far more to be had
-- and our next priorities depend on which kind of FinTech solution
we are building.
DevOps for Cryptocurrency
Cryptocurrency systems
are of course sensitive to attacks in which a person attempts to
steal the coins. Numerous real-world losses have been traced to a
failure to implement proper DevOps, leaving opportunities for
criminals.
If you’re implementing
or trading a cryptocurrency, here are some DevOps issues to focus
on right away:
- Automated testing. If your build system allows
you to release code that has not been through your test suite -- or
worse, allows you to be unsure whether the released code
was tested -- you are taking undue risk. Quality assurance
automation should be a core part of your build system. This is even
truer if you are using CI/CD, where code improvements may be
released quite frequently. “I write quality code in the first
place” is great, but it’s not a substitute for automated
testing.
- Component isolation. To minimize the chances
of malfeasance, sensitive systems should be modular in design,
and unrelated components should run in separate processes --
ideally in completely separate VMs separated by firewalls. A
defect, code injection, privilege escalation, or social-engineering
attack on one service or component should still be unable to tamper
with another.
- Storage redundancy. It’s amazing to think that
some people implement trading and coin-storage systems without
redundant storage. With many cryptocurrencies, your coins can be
permanently lost if a single private key, a secret number often only a
few dozen bytes long, is lost. Use automated deployments in the
cloud to ensure that all your trading and management
systems are always deployed to redundant cloud storage with
inherent fault tolerance and permanent backups.
- Separation of roles. The
amount of value accessed by some cryptocurrency components is so
high that you must consider the impact of a compromised person. If
your deployment architecture has a single “admin” role that gives
one person the ability to deploy code, access storage, turn off
monitors, and change audit logs, you are asking for that person to get
into trouble. Don’t tempt anyone to pressure your staff: make it
impossible for any one person to change where large sums of money
go, or at least make it impossible for them to do so
without setting off alarms. Create different admin roles for
different system components and layers -- roles that are not
available to the same person at the same time.
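One way to enforce the separation-of-roles advice in code is sketched below. It assumes a hypothetical set of role names and a simple two-person rule for large transfers. Approvals are stored as a set, so the same person approving twice still counts only once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical pairs of roles that must never be held by the same person.
INCOMPATIBLE: List[Set[str]] = [
    {"deployer", "auditor"},
    {"deployer", "key_custodian"},
    {"key_custodian", "monitoring_admin"},
]


def check_roles(assignments: Dict[str, Set[str]]) -> None:
    """Raise if any person would hold two roles that are meant to be separate."""
    for person, roles in assignments.items():
        for pair in INCOMPATIBLE:
            if pair <= roles:
                raise ValueError(f"{person} holds incompatible roles {sorted(pair)}")


@dataclass
class Transfer:
    amount: int                                    # in the smallest currency unit
    destination: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, person: str) -> None:
        self.approvals.add(person)                 # a set: repeat approvals count once


def may_execute(t: Transfer, large_threshold: int, required: int = 2) -> bool:
    """Large transfers need `required` distinct approvers; smaller ones need one."""
    needed = required if t.amount >= large_threshold else 1
    return len(t.approvals) >= needed
```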
DevOps for automated trading
If you are trusting your computer system to move and trade
assets autonomously, you need absolute correctness, inviolable
security, and very rapid response to trouble incidents. In addition
to the concerns I listed for cryptocurrency, pay attention to
these DevOps priorities:
- Automated load testing and regression
testing. Since you are likely to update your trading algorithms
many times a year, there are many opportunities to introduce
performance problems. If a slow trade can be worse than no trade,
no build should be allowed to go into production without automated
performance testing under heavy simulated load. It’s not
enough to say “my code runs fast”; you need to be able to say “my
whole deployed system runs fast, even when bombarded by fast
inputs.”
- Immutable servers. It is incredibly
tempting to patch production systems with improvements. But without
careful controls, this leads to having production servers in a
state that is completely unprecedented in your test
environment. Instead, use automated deployment to create new copies
of your servers with the new code already in place -- and when
these pass your test suite, swap them in, make sure they’re up,
then shut down and delete the old unpatched virtual servers. This
kind of roll-forward can be completely automated with tools like
Kubernetes, and can take advantage of cautious switch-over
techniques like blue-green deployments or canary deployments.
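The load-testing gate from the first item above might look roughly like this sketch. It rests on three assumptions. First, `handle` is a hypothetical stand-in for one request against a staging deployment of the whole system. Second, the budget applies to 99th-percentile latency, computed by the nearest-rank method. Third, Python threads are used only to generate concurrent load.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def p99(samples: List[float]) -> float:
    """99th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)   # integer ceil(0.99 * n), avoiding float rounding
    return ordered[rank - 1]


def load_test(handle: Callable[[int], None], requests: int, concurrency: int) -> List[float]:
    """Send `requests` simulated inputs to `handle` from `concurrency` worker threads.
    Returns per-request latencies in milliseconds."""
    def timed(i: int) -> float:
        start = time.perf_counter()
        handle(i)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, range(requests)))


def performance_gate(latencies_ms: List[float], p99_budget_ms: float) -> bool:
    """True if the build may ship: tail latency under load stays within budget."""
    return p99(latencies_ms) <= p99_budget_ms
```

The gate checks tail latency rather than the average. A handful of very slow trades can be costly even when the typical trade is fast.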
DevOps for human-assisted trading
This is a relatively forgiving application if your users are
in-house or otherwise very tolerant of imperfection. (If you’re
providing trading services to external parties, you can expect to
be held to a very high standard.) Ask yourself, “What is the cost
of a typical failure to our business?” Systems with a human in the
loop are sometimes more error-tolerant than systems without a human
in the loop.
However, you will need to think much more about usability
testing, because a confusing UI update can introduce human error;
and your automated test suite should include tests that drive the
system through the UI, to detect coding bugs in that layer.
Human users of FinTech systems are often very powerful people,
and a small number of unhappy users can make for a very bad day. So
in addition to extensive testing, your DevOps practices may need to
include gradual deployment of new versions to a test population of
a few users (canary deployments), and support for both halts and
rollbacks in case of significant trouble reports.
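A canary rollout with halt and rollback can be sketched as follows. The hashing scheme, the `release-42` salt, and the `Rollout` class are illustrative assumptions, not any specific product's API. The key idea is that each user is assigned deterministically, so a given user does not flip between versions from one request to the next.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-42") -> bool:
    """Deterministically place roughly `percent`% of users in the canary cohort.
    Hashing salt + user_id keeps each user on the same version across requests."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent


class Rollout:
    """Tracks the canary percentage. A halt freezes it; a rollback sends everyone back."""

    def __init__(self) -> None:
        self.percent = 0
        self.halted = False

    def advance(self, step: int = 10) -> None:
        if not self.halted:
            self.percent = min(100, self.percent + step)

    def halt(self) -> None:
        self.halted = True

    def rollback(self) -> None:
        self.halted = True
        self.percent = 0

    def version_for(self, user_id: str) -> str:
        return "new" if in_canary(user_id, self.percent) else "old"
```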
DevOps for asset valuation, market analysis, and research
These tasks often involve a moderate amount of math, performed
on a large number of input feeds and databases. Many firms
construct unique asset valuation formulas by insightfully combining
data that no one else was combining, or doing complex combinations
with uniquely clever functions and formulas. Competitive advantage
comes from generating unique insights, and these come from the
ability to scale up innovative formulas and innovative data
integrations quickly.
At FP Complete we regularly hear from FinTech firms that have
built important analyses running on just one or a few desktops, who
need these scaled up to a reliable server-based system. Beyond all
the usual software engineering and cloud deployment problems, a key
concern is maintainability. Analysts are used to updating their
formulas several times per month and then sharing them with
colleagues. And it’s important that releasing a new version doesn’t
automatically take the old one offline, since a colleague may still be using it.
A version control system attached to a CI/CD system works
wonders for safe maintainability. But it needs to be coupled to a
simple metadata system that keeps track of which versions are now
running at which addresses -- and which allows versions with zero
remaining clients to be shut down.
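The metadata system described above can be quite small. The sketch below is hypothetical throughout, including the class name, version labels, and addresses. It maps each formula version to the address where it runs and tracks which analysts are attached to it. It retires only versions that have no remaining clients and are not the latest.

```python
from collections import defaultdict
from typing import Dict, List, Set


class VersionRegistry:
    """Hypothetical metadata service: where each formula version runs, and who uses it."""

    def __init__(self) -> None:
        self.address: Dict[str, str] = {}                     # version -> service address
        self.clients: Dict[str, Set[str]] = defaultdict(set)  # version -> attached analysts

    def register(self, version: str, address: str) -> None:
        self.address[version] = address

    def attach(self, client: str, version: str) -> str:
        self.clients[version].add(client)
        return self.address[version]

    def detach(self, client: str, version: str) -> None:
        self.clients[version].discard(client)

    def retire_unused(self, latest: str) -> List[str]:
        """Remove versions with zero clients, never the latest one."""
        retired = [v for v in self.address if v != latest and not self.clients[v]]
        for v in retired:
            del self.address[v]   # a real system would also stop the running service here
        return retired
```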
DevOps for consumer banking and account management
These applications require an exceptional amount of integration
with legacy systems, some very old. They have to maintain an
extremely consistent user interface, for use by clients who can be
upset by unnecessary change. They have all the requirements of
an e-commerce application, such as resistance to sudden surges in
demand. And they are subject to extremely large-scale Web-based
security attacks, as the payoff for a successful criminal break-in
could be enormous.
DevOps for voting
Voting, such as for shareholder votes or Board of Directors
elections, is a particularly sensitive subject, with huge decisions
being made and significant legal exposure. Ordinary voting is
rather similar to consumer banking and e-commerce (using votes
as the currency), but where governance rules require
anonymity, standards increase enormously. You must earn voter
confidence, and your systems should be able to pass a really
rigorous audit, including against insider malfeasance, while
protecting the privacy of each voter.
For such systems we recommend a DataOps solution in which raw
inputs (from user interaction) are fully quarantined from the apps
that handle persistent data storage, using very assertive firewalls
and very low-permission accounts. An auditor must be able to verify
that systems holding private user data are completely inaccessible
from unauthorized locations.
Since the anonymization steps at the application layer may be
intentionally irreversible, anonymized data should be stored with
very high redundancy. Once the identity data has been discarded, it may be
impossible to reconstruct the anonymized data auditably from scratch.
Compliance, regulation, and auditability
For applications subject to extensive outside controls, it’s
important to demonstrate adherence to the spec (application
verification) and to be able to trace concerning behavior back from
the running system to the code and checkins that caused it
(traceability).
For verification, ensure that you have an automated test suite
with organized test case management and that it is automatically
fired up as a part of your CI/CD system. Be sure your full
test suite is run before real deployments, not just your quick
check test suite (sometimes called the smoke test),
which is automatically run every time a build is done.
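A deployment gate that distinguishes the smoke suite from the full suite might look like the following sketch. It assumes the CI system records, for each artifact digest, which suites passed. The suite labels and the in-memory `PASSED` store are hypothetical. Keying on the artifact digest also confirms that the exact bits being released are the ones that were tested.

```python
from typing import Dict, Set

# Hypothetical CI record: artifact digest -> names of test suites that passed on it.
PASSED: Dict[str, Set[str]] = {}


def record_pass(digest: str, suite: str) -> None:
    PASSED.setdefault(digest, set()).add(suite)


def may_deploy(digest: str, environment: str) -> bool:
    """Every build must pass the quick smoke suite; production additionally
    requires the full suite to have passed on this exact artifact."""
    required = {"smoke", "full"} if environment == "production" else {"smoke"}
    return required <= PASSED.get(digest, set())
```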
For traceability, ensure that your CI/CD system inserts serial
numbers into your distributable software containers and other built
artifacts, and records what artifact version numbers were used
(including source code, libraries, and tools). And require that
checkins to your version control system include links back to the
requirements they were meant to satisfy.
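The two traceability practices above can be sketched together. The first is a build manifest that stamps a serial number and an artifact hash next to the versions that went into the build. The second is a check-in hook that looks for requirement references in commit messages. The `REQ-123` ID convention and the manifest field names are assumptions for illustration.

```python
import hashlib
import json
import re
from datetime import datetime, timezone
from typing import Dict, List

REQUIREMENT_REF = re.compile(r"\bREQ-\d+\b")   # hypothetical requirement-ID convention


def build_manifest(serial: int, artifact: bytes, source_commit: str,
                   dependencies: Dict[str, str], tools: Dict[str, str]) -> str:
    """Record what went into one build so production behavior can be traced back."""
    manifest = {
        "serial": serial,
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
        "source_commit": source_commit,
        "dependencies": dict(sorted(dependencies.items())),
        "tools": dict(sorted(tools.items())),
        "built_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(manifest, indent=2)


def commit_links_requirement(message: str) -> List[str]:
    """Return the requirement IDs a commit message cites; empty means reject the check-in."""
    return REQUIREMENT_REF.findall(message)
```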
To ensure that what went into production is still
what’s in production, don’t grant permission to apply manual
changes to running servers. Create admin accounts with limited
permissions that can’t be used as “back doors,” and use
immutable servers so that a new deployment is required
when someone wants to change what’s on a server.
Security and endurance against direct attack
If your application is on the public Web, the server cloud
design, the software maintenance schedule, and the network/firewall
design all need to be designed to withstand malicious treatment.
The average IT organization currently spends 12% to 15% of
its budget on security.
DevOps can do much more to defend you than many people realize,
and can make the most of your security budget.
Many security breaches happen through social engineering. Reduce
these opportunities by automating control of your servers under
distinct robot admin accounts, ones that normal users never use.
Keep dangerous permissions away from regular IT staff going about
their days.
Other critical breaches have famously happened through old,
unpatched software with known vulnerabilities. Routinely audit the
versions of operating systems and runtime components that are
installed on all your servers, to ensure you don’t have obsolete
ones in production. This can be largely automated.
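A minimal version audit might compare an inventory of installed components against minimum allowed versions. Collecting the inventory is out of scope here; it could come from an agent, a configuration-management tool, or a cloud inventory service. The parser assumes purely numeric dotted versions such as `3.0.13`.

```python
from typing import Dict, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'3.0.13' -> (3, 0, 13). Assumes purely numeric dotted version strings."""
    return tuple(int(part) for part in v.split("."))


def audit(inventory: Dict[str, Dict[str, str]], minimums: Dict[str, str]) -> List[str]:
    """Flag every server/component pair that is older than its minimum allowed version.
    `inventory` maps server -> {component: installed version}."""
    findings = []
    for server, components in sorted(inventory.items()):
        for name, installed in sorted(components.items()):
            floor = minimums.get(name)
            if floor and parse_version(installed) < parse_version(floor):
                findings.append(f"{server}: {name} {installed} < {floor}")
    return findings
```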
Security breaches are worsened by having far too much access
available in a single place, allowing a small intrusion to escalate
into a big one. Take advantage of cloud network configurability,
separate VMs and containers, and firewalls, to ensure that critical
attack targets (like databases and production servers) are hard to
reach and extra hard to enter as administrator, and to
ensure that critical attack vectors (like front-end servers) are
quarantined, firewalled, and monitored for unauthorized
activity.
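Firewall policy can also be checked automatically. The sketch below treats a policy as a set of allowed (source tier, destination tier, port) flows, with everything else denied. It flags any rule that exposes a critical target to a tier not on that target's approved list. The tier names, ports, and the `CRITICAL` table are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Flow = Tuple[str, str, int]   # (source tier, destination tier, port)

# Hypothetical default-deny policy: only these flows are permitted.
ALLOWED: Set[Flow] = {
    ("internet", "frontend", 443),
    ("frontend", "app", 8443),
    ("app", "database", 5432),
    ("bastion", "app", 22),
}

# Critical targets and the only tiers allowed to connect to them directly.
CRITICAL: Dict[str, Set[str]] = {
    "database": {"app"},
    "app": {"frontend", "bastion"},
}


def violations(allowed: Set[Flow], critical: Dict[str, Set[str]]) -> List[str]:
    """Return every rule that opens a critical target to an unapproved source."""
    return sorted(
        f"{src} -> {dst}:{port}"
        for src, dst, port in allowed
        if dst in critical and src not in critical[dst]
    )
```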
How do we get there from here?
Unlike some older technologies, DevOps is not monolithic and can
be implemented in small steps over an arbitrary period of time.
Even the sequence of these steps is flexible. FP Complete
recommends an incremental approach.
If you already have traditional software engineering and
traditional system operations and monitoring in place, focus next
on (A) streamlining your software engineering environment, or (B)
containerizing and automating your deployment systems. Either makes
a great next step and will put you well on the road to complete
DevOps.
As always, remember that FP Complete is available to do a
readiness assessment project for your DevOps, cloud,
and other IT systems engineering. With experience on numerous
advanced IT projects, we’re happy to team up with you on
planning, design, implementation, knowledge transfer, audits, and
upgrades.