The Information Technology landscape today looks quite a bit
different than it did ten years ago. Over the last decade, we have witnessed almost
universal adoption of virtualization. Virtualization
in turn has led to cloud-based service deployments, and it has also driven a
move away from fixed-function appliances in favor of software features deployed
on general-purpose hardware (e.g., Network Functions Virtualization). More
recently, container technologies have enabled new microservices-based architectures.
The latest trend seems to be a move towards Functions as a Service, which aims
to further decouple operational considerations from application functionality.
These changes have come about largely as the result of two main business drivers:
- Cost Reduction: Virtualization allows for server consolidation, which offers the promise of significant reductions in capital expenditures (Capex). Cloud deployments further reduce Capex by replacing fixed infrastructure costs with monthly service charges.
- Increased Agility: Cloud-based deployments allow infrastructure to be spun up on demand. This enables “as-a-service” access to infrastructure and applications, and it also supports dynamic scaling of service components in response to changing load requirements.
To make this agility
possible, automation tools had to be developed that allow software control over
infrastructure components. These tools are generally categorized under the Infrastructure-as-Code umbrella because they apply software development
approaches to infrastructure deployments.
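To make the Infrastructure-as-Code idea concrete, here is a toy reconciliation step in Python: a declarative spec (the kind of artifact that would live under version control) is compared against live state to produce the actions a tool would execute. The spec format, component names, and `apply` function are all illustrative, not the format of any specific tool.

```python
# Toy "infrastructure as code" step: a version-controlled declarative
# spec is compared against live state to produce deployment actions.
desired = {"web": 3, "worker": 2}   # component -> desired instance count
live = {"web": 1}                   # what is currently running

def apply(desired, live):
    """Return the create/delete actions needed to reach the desired state."""
    actions = []
    for name, count in desired.items():
        delta = count - live.get(name, 0)
        if delta > 0:
            actions.append(("create", name, delta))
        elif delta < 0:
            actions.append(("delete", name, -delta))
    for name, count in live.items():
        if name not in desired:
            actions.append(("delete", name, count))
    return actions

print(apply(desired, live))  # [('create', 'web', 2), ('create', 'worker', 2)]
```

The appeal is clear: the spec can be reviewed, versioned, and re-applied like any other code, which is exactly the software development discipline these tools bring to infrastructure.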
One should ask, then, how well all of this has played out so
far. Have these new technologies lived up to their expectations?
Ironically, it appears that in many cases, adoption of these new technologies has had the opposite effect than what was originally intended:
- Increased Cost: Whereas the desired Capex reductions may have been achieved, an increase in operational expenses (Opex) associated with managing virtualized services has in many cases more than offset any reductions in Capex.
- Reduced Agility: While automation may have made initial service deployments easier and less error-prone, it has often proven challenging to make changes to running services without breaking things. Service providers have come to adopt the motto that “if it works, don’t touch it”, which all but eliminates any illusion of agility.
So, if things did not work out as expected, one could ask
where they went wrong. It turns out that in hindsight, these outcomes are actually
not that surprising. Virtualization, cloud, and container technologies all introduce
new management requirements, which increase operational expenses. In addition,
modern service architectures typically involve a larger number of smaller components,
which complicates lifecycle management of these services. The number of possible
component combinations grows exponentially, and the number of failure scenarios increases
dramatically as well, since many of the service components can fail
independently. As a result, managing modern services is significantly more
complicated than managing “traditional” services.
To compound the problem, administrators who are tasked with
managing these services often do not have the necessary visibility into all the
components that make up a service. It might be hard to find out where components
are deployed, and even harder how components interact. Administrators often
have no idea that a change to one service component may have unintended
consequences somewhere else. Not only does this make it hard to troubleshoot
problems, it also makes it nearly impossible to perform the moves, adds, changes,
and deletions (MACDs) that are required to manage running services.
Current automation tools such as Ansible and Terraform (and
other “infrastructure as code” tools) do not help the situation much. While they
do a good job of defining the necessary steps to get a service deployed, they do
not provide a clear record of the end result. How does one develop automation
to modify a running service without an accurate view of what is deployed? Even
worse, how does one create automated responses to failures without any context
within which to process failure events?
If automation is supposed to be the solution that reduces Opex
and enables agile MACDs, a new automation approach may be necessary. Such an approach
must start by addressing the main shortcoming of “infrastructure as code”
tools, which is lack of visibility into deployed service components and their associated states.
Without improved visibility, it is impossible to tackle the complexities of
current service architectures.
We propose model-driven automation as an alternative
paradigm that can address these shortcomings. At the core of a model-driven automation
tool is a full representation (in the automation system database) of all
deployed service instances and their components. We will refer to such a
representation as a service instance model. Service instance models provide
full visibility into all deployed services, and all service automation logic is
built around these service instance models. Not only does a service instance
model contain representations for all service components, it also keeps track
of the resources on which these components are deployed, and, most importantly,
it captures relationships that represent dependencies between service components.
These relationships are crucial for providing visibility into how changes to
one service component can affect other components in the same service (or even other
services that share the same resources).
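The idea of a service instance model can be sketched in a few lines of Python. The class and method names below are illustrative (no particular tool works exactly this way): the model records each component, the resource it runs on, and its dependency relationships, and from those relationships it can answer the impact question raised above.

```python
from collections import defaultdict

class ServiceInstanceModel:
    """Illustrative in-memory instance model: components, the resources
    they are deployed on, and dependency relationships between them."""

    def __init__(self):
        self.components = {}                # component name -> resource
        self.dependents = defaultdict(set)  # name -> components that depend on it

    def add_component(self, name, resource):
        self.components[name] = resource

    def add_dependency(self, component, depends_on):
        # 'component' requires 'depends_on'; a change to 'depends_on'
        # may therefore affect 'component'.
        self.dependents[depends_on].add(component)

    def impact_of_change(self, name):
        """Return every component transitively affected by changing 'name'."""
        affected, stack = set(), [name]
        while stack:
            for dep in self.dependents[stack.pop()]:
                if dep not in affected:
                    affected.add(dep)
                    stack.append(dep)
        return affected

model = ServiceInstanceModel()
model.add_component("db", resource="vm-1")
model.add_component("api", resource="vm-2")
model.add_component("frontend", resource="vm-3")
model.add_dependency("api", depends_on="db")
model.add_dependency("frontend", depends_on="api")

print(sorted(model.impact_of_change("db")))  # ['api', 'frontend']
```

Note that the impact query works across the whole dependency graph, so it would equally surface components of other services that happen to share a resource, provided those relationships are recorded in the model.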
By tracking service instance models, model-driven automation tools provide a path towards solving the Opex and agility challenges associated with current service architectures:
- Instance models provide the necessary visibility into runtime state that can serve as a starting point for automating moves, adds, changes, and deletions.
- Instance models also provide the necessary context within which to handle external service events. By using state represented in the instance model, the event handling logic itself can be stateless, which allows for modular and scalable failure handling automation.
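The second point, stateless event handling, can be illustrated with a small Python sketch. The instance-model fragment here is just nested dicts, and the handler and field names are hypothetical; the point is that every piece of context the handler needs comes from the event and the model, so the handler itself holds no state and any worker can process any event.

```python
# A minimal instance-model fragment: each component records the
# resource it runs on and the components that depend on it.
instance_model = {
    "db":  {"resource": "vm-1", "dependents": ["api"]},
    "api": {"resource": "vm-2", "dependents": []},
}

def handle_failure(event, model):
    """Stateless failure handler: all context comes from the event and
    the instance model, none from the handler itself."""
    failed = event["component"]
    entry = model[failed]
    # An illustrative remediation plan: restart the failed component on
    # its recorded resource, then health-check everything downstream.
    return {
        "action": "restart",
        "component": failed,
        "on_resource": entry["resource"],
        "recheck": list(entry["dependents"]),
    }

plan = handle_failure({"component": "db"}, instance_model)
print(plan["on_resource"], plan["recheck"])  # vm-1 ['api']
```

Because the handler is a pure function of (event, model), it can be replicated behind a queue and scaled horizontally, which is what makes failure-handling automation modular and scalable.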
Without instance models, I am afraid that automation tools may have reached the limit of how well they can keep pace with evolving service architectures. I believe it is time to adopt model-driven automation as a different paradigm that can take service automation to the next level.