Tuesday, July 14, 2020

Is it Time for a Different Automation Paradigm?

The Information Technology landscape today looks quite a bit different than it did ten years ago. Over the last decade, we have witnessed almost universal adoption of virtualization. Virtualization in turn has led to cloud-based service deployments, and it has also driven a move away from fixed-function appliances in favor of software features deployed on general-purpose hardware (e.g. Network Functions Virtualization). More recently, container technologies have enabled new microservices-based architectures. The latest trend seems to be a move towards Functions as a Service, which aims to further decouple operational considerations from application functionality.

These changes have come about largely as the result of two main business drivers:

  • Reduced Cost: replacing dedicated, fixed-function hardware with software deployed on shared, general-purpose infrastructure promises lower capital expenses (Capex).
  • Increased Agility: software-defined infrastructure promises faster service deployment and the flexibility to change services on demand.

To make this agility possible, automation tools had to be developed that allow software control over infrastructure components. These tools are generally categorized under the Infrastructure-as-Code umbrella because they apply software development approaches to infrastructure deployments.
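
As a toy illustration of the Infrastructure-as-Code idea, the Python sketch below declares infrastructure as plain data that a program can act on. The `Resource` class and `provision` function here are hypothetical stand-ins of my own, not the API of Ansible, Terraform, or any other real tool:

```python
# Infrastructure expressed as code/data: the declaration can be version-
# controlled, reviewed, and replayed like any other software artifact.
# 'Resource' and 'provision' are hypothetical, not a real tool's API.

from dataclasses import dataclass

@dataclass
class Resource:
    kind: str    # e.g. "network", "vm"
    name: str
    spec: dict   # tool-specific parameters

def provision(resources):
    """Walk the declared resources and create each one in order."""
    for r in resources:
        # a real tool would call a cloud or orchestrator API here
        print(f"creating {r.kind} '{r.name}' with {r.spec}")

service = [
    Resource("network", "frontend-net", {"cidr": "10.0.0.0/24"}),
    Resource("vm", "web-1", {"image": "nginx", "network": "frontend-net"}),
]
provision(service)
```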

One should ask, then, how well all of this has played out so far. Have these new technologies lived up to expectations?

Ironically, it appears that in many cases, adoption of these new technologies has had the opposite effect of what was originally intended:

  • Increased Cost: While the desired capital expense (Capex) reductions may have been achieved, the operational expenses (Opex) associated with managing virtualized services have in many cases more than offset those savings.
  • Reduced Agility: While automation may have made initial service deployments easier and less error-prone, it has often proven challenging to make changes to running services without breaking things. Service providers have come to adopt the motto “if it works, don’t touch it”, which all but eliminates any illusion of agility.

So, if things did not work out as expected, one could ask where they went wrong. In hindsight, these outcomes are actually not that surprising. Virtualization, cloud, and container technologies all introduce new management requirements, which drive up operational expenses. In addition, modern service architectures typically involve a larger number of smaller components, which complicates lifecycle management of these services. The number of possible component combinations grows exponentially, and the number of failure scenarios increases dramatically as well, since many of the service components can fail independently (a back-of-the-envelope calculation follows below). As a result, managing modern services is significantly more complicated than managing “traditional” services.
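
To put rough numbers on that claim, the short sketch below counts version combinations and partial-failure scenarios for a growing component count. The component and version counts are purely illustrative assumptions:

```python
# Illustrative complexity estimate (assumed numbers): with n independently
# deployed components and k supported versions each, there are k**n version
# combinations to validate; if each component can fail independently, there
# are 2**n - 1 distinct partial-failure scenarios to handle.

k = 3  # assume three supported versions per component
for n in (3, 10, 25):
    print(f"{n:>2} components: {k**n:>15,} version combinations, "
          f"{2**n - 1:>12,} partial-failure scenarios")
```

Even at 25 components, the version-combination count runs into the hundreds of billions, which is why exhaustive testing of modern service architectures is impractical.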

To compound the problem, administrators who are tasked with managing these services often do not have the necessary visibility into all the components that make up a service. It can be hard to find out where components are deployed, and even harder to determine how they interact. Administrators often have no idea that a change to one service component may have unintended consequences somewhere else. Not only does this make it hard to troubleshoot problems, it also makes it nearly impossible to perform the moves, adds, changes, and deletions (MACDs) that are required to manage running services.

Current “infrastructure as code” tools such as Ansible and Terraform do not help the situation much. While they do a good job of defining the steps necessary to get a service deployed, they do not maintain a clear record of the end result. How does one develop automation to modify a running service without an accurate view of what is deployed? Even worse, how does one create automated responses to failures without any context within which to process failure events?

If automation is to deliver on its promise of reduced Opex and agile MACDs, a new automation approach may be necessary. Such an approach must start by addressing the main shortcoming of “infrastructure as code” tools: the lack of visibility into deployed service components and their associated states. Without improved visibility, it is impossible to tackle the complexities of current service architectures.

We propose model-driven automation as an alternative automation paradigm that can address these shortcomings. At the core of a model-driven automation tool is a full representation, in the automation system’s database, of all deployed service instances and their components. We will refer to such a representation as a service instance model. Service instance models provide full visibility into all deployed services, and all service automation logic is built around them. Not only does a service instance model contain representations of all service components, it also keeps track of the resources on which these components are deployed, and, most importantly, it captures the relationships that represent dependencies between service components. These relationships are crucial for providing visibility into how changes to one service component can affect other components in the same service (or even other services that share the same resources).
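
As a minimal sketch of what such an instance model might capture, consider the Python fragment below. The class and field names are my own illustration, not any particular tool’s schema:

```python
# Minimal sketch of a service instance model: every deployed component,
# the resource it runs on, its current state, and its dependencies.
# All names here are illustrative, not a specific product's schema.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Component:
    name: str
    state: str                                  # e.g. "deployed", "failed"
    resource: str                               # e.g. "k8s-node-3", "vm-7"
    depends_on: List[str] = field(default_factory=list)

@dataclass
class ServiceInstance:
    name: str
    components: Dict[str, Component] = field(default_factory=dict)

    def add(self, c: Component) -> None:
        self.components[c.name] = c

    def impacted_by(self, name: str) -> List[str]:
        """Components that directly depend on 'name' -- the visibility
        needed before changing or removing that component."""
        return [c.name for c in self.components.values()
                if name in c.depends_on]

svc = ServiceInstance("storefront")
svc.add(Component("db", "deployed", "vm-7"))
svc.add(Component("api", "deployed", "k8s-node-3", depends_on=["db"]))
svc.add(Component("web", "deployed", "k8s-node-1", depends_on=["api"]))
print(svc.impacted_by("db"))   # -> ['api']: changing db affects api
```

In a real system, the model would live in the automation tool’s database and be kept in sync with observed runtime state rather than built by hand.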

By tracking service instance models, model-driven automation tools provide a path towards solving the Opex and agility challenges associated with current service architectures:

  • Instance models provide the necessary visibility into runtime state that can serve as a starting point for automating moves, adds, changes, and deletions.
  • Instance models also provide the necessary context within which to handle external service events. By using the state represented in the instance model, the event handling logic itself can be stateless, which allows for modular and scalable failure handling automation (see the sketch after this list).
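
Continuing the `ServiceInstance` sketch from above, the fragment below shows what stateless, model-driven failure handling could look like. The in-memory store and the event shape are assumptions made for illustration:

```python
# Stateless failure handling driven by the instance model: the handler
# holds no state of its own, so it can be restarted or scaled out freely.
# 'InstanceModelStore' and the event dict shape are illustrative assumptions.

class InstanceModelStore:
    """Toy in-memory stand-in for the automation system's database."""
    def __init__(self):
        self._models = {}
    def save(self, svc):
        self._models[svc.name] = svc
    def lookup(self, name):
        return self._models[name]

def handle_failure(store, event):
    svc = store.lookup(event["service"])      # all state comes from the model
    failed = svc.components[event["component"]]
    failed.state = "failed"
    impacted = svc.impacted_by(failed.name)   # dependency-aware context
    store.save(svc)                           # persist the updated model
    return impacted                           # e.g. components to redeploy

store = InstanceModelStore()
store.save(svc)  # 'svc' from the ServiceInstance sketch above
print(handle_failure(store, {"service": "storefront", "component": "db"}))
# -> ['api']: the handler knows what else is affected, with no local state
```

Because the handler reads and writes all state through the model, any number of identical handler instances can process events in parallel.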

Without instance models, I am afraid that automation tools may have reached the limits of their ability to keep pace with evolving service architectures. I believe it is time to adopt model-driven automation as a different paradigm that can take service automation to the next level.
