Monday, August 2, 2021

Converting YANG to TOSCA

YANG is a data modeling language that is the de facto standard in the networking industry for defining schemas for network configuration data and runtime state. Most vendors provide YANG modules for configuring their equipment, and many internet RFCs standardize configuration data schemas that can be adopted by multiple vendors.

However, YANG has also seen applications that go beyond configuration data modeling. In fact, some standards bodies have used YANG to model end-to-end services. Those standards use YANG to define the components of which a service is composed as well as the configuration data and runtime state for each of those components.

Unfortunately, because YANG is strictly a data modeling language, it cannot be used to express service-related semantics. For example, YANG does not allow you to specify which parts of the model define service components and which parts define configuration data for those components. That information can only be found in the standards documents that define the services in the first place. Similarly, the YANG language alone cannot be used to define dependencies between service components, or to express how some components can be decomposed using entire other service models. YANG simply has no support for expressing those semantics in a general-purpose fashion.

Of course, service semantics are a must-have for service lifecycle management. Automated service lifecycle management systems must know which components make up a service, how those components depend on one another, how instantiation and activation of those components must be sequenced, how configuration information may need to be propagated between components, and so on. To avoid having to build knowledge of each individual service's semantics into the lifecycle management system, it must be possible to express service semantics using the service modeling language.

This is where TOSCA shines. TOSCA is a service modeling language for automating service lifecycle management:

  • Whereas YANG documents are trees, TOSCA uses graphs to model services. TOSCA service topology graphs explicitly model service components (using TOSCA nodes) and the dependencies between these components (using TOSCA relationships).
  • TOSCA includes data types for defining the schemas of configuration data for nodes and relationships. As such, TOSCA data type definitions are similar to YANG schema definitions.
  • For each component, TOSCA allows component designers to define the ports through which their components interact. TOSCA introduces capabilities and requirements for this purpose. Capabilities and requirements allow for modular design of reusable components.
  • TOSCA allows individual components to be modeled using their own service sub-topologies. This is useful for modeling how network functions can be disaggregated into collections of finer-grained components.
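As an illustrative sketch of how these concepts fit together, the following fragment uses normative TOSCA Simple Profile types in a minimal service template (a hedged example, not a complete deployable service):

```yaml
tosca_definitions_version: tosca_simple_yaml_1_3

topology_template:
  node_templates:
    # Service components are modeled as nodes...
    web_app:
      type: tosca.nodes.WebApplication
      requirements:
        # ...and dependencies between components are modeled as
        # relationships, established by matching requirements
        # against capabilities.
        - host: web_server
    web_server:
      type: tosca.nodes.WebServer
      requirements:
        - host: server
    # An abstract node such as this one can be substituted at
    # orchestration time by an entire sub-topology defined in a
    # separate service template.
    server:
      type: tosca.nodes.Compute
      capabilities:
        host:
          properties:
            num_cpus: 2
            mem_size: 4 GB
```

Real service templates would of course use domain-specific node types rather than the generic Simple Profile types shown here.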

Using TOSCA, it is possible to build general-purpose service lifecycle management systems that rely strictly on the semantics built into the TOSCA language without having to introduce service-specific knowledge into the management system.

One possible approach for allowing YANG-based service models to be “consumed” by general-purpose service lifecycle management systems is to convert those YANG models to TOSCA. We have created a tool called yang2tosca that aims to do just that. The conversion involves two steps:

  1. First, YANG models are converted automatically by yang2tosca to (almost) equivalent TOSCA data type definitions.
  2. Next, service designers must (manually) convert some of the resulting TOSCA data types into node types, relationship types, requirements, and capabilities. This step requires knowledge of service-specific semantics and cannot be automated.
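To make the two steps concrete, here is a hypothetical sketch of what the conversion might look like for a trivial YANG container. This is purely illustrative (the `example.*` type names are invented); the actual output of yang2tosca may differ:

```yaml
# Hypothetical YANG input, shown here as a comment:
#
#   container interface {
#     leaf name { type string; }
#     leaf mtu  { type uint16; }
#   }

# Step 1 (automatic): the YANG container becomes a TOSCA data type.
data_types:
  example.datatypes.Interface:       # illustrative name
    derived_from: tosca.datatypes.Root
    properties:
      name:
        type: string
      mtu:
        type: integer
        constraints:
          - in_range: [ 0, 65535 ]   # preserves the uint16 range

# Step 2 (manual): a designer who knows that "interface" represents a
# service component promotes the data type to a node type.
node_types:
  example.nodes.Interface:           # illustrative name
    derived_from: tosca.nodes.Root
    properties:
      name:
        type: string
      mtu:
        type: integer
        constraints:
          - in_range: [ 0, 65535 ]
```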

The tool can be found at https://github.com/lauwers/yang2tosca

Please try it out and let us know what you think.

Monday, July 27, 2020

Modeling is Hard

If model-driven automation can reduce the operational expenses associated with service lifecycle management, why have we not yet seen universal adoption of the model-driven automation paradigm? I suspect that a large part of the answer lies in the fact that creating the right models is not as easy as it may seem. While modeling activities have been taking place in the Information Technology industry for decades, each Standards Development Organization (SDO) has created its own models based on the specific business requirements of its target industry. Harmonizing these models has proven challenging. In addition, as far as I know, no SDO has approached modeling with the goal of automating service lifecycle management. As a result, many existing models suffer from limitations that make them ill-suited as the foundation for model-driven automation systems.

Here is a list of common modeling “mistakes”:

  • Including BSS (Business Support System) concepts in the models. For example, the TMF SID (Service Information Data model) defines two classes of services: customer-facing services that can be purchased by a customer as part of an offering, and resource-facing services that are bound to resources. Clearly this distinction is useful for the BSS—it determines whether the service shows up in a product catalog—but it does not belong in the models for an automation system. For purposes of automating the lifecycle management of a service, it makes no difference whether a service is customer-facing or resource-facing, and the distinction only gets in the way.
  • Differentiating between service models and resource models. Many if not most modeling standards make an explicit distinction between service models and resource models. However, whether something is a service or a resource is not an intrinsic aspect of the entity being modeled. Instead, it is an aspect of how that entity is being used, i.e., the role the entity plays at a given point in time in the context in which it is modeled. For example, an entity provided as a service by one service provider may be used as a resource for a service provided by a different provider. Depending on which provider is modeling the component, the same component can be modeled either as a service or as a resource.
  • Marking entities as either composite or atomic. For purposes of management, some entities can be treated as black boxes: the management system does not need to know about the internals of those entities to configure and manage them. Other entities may need to be treated as complex systems that are built from a set of components, and managing the system requires management of each of these components individually. It is tempting to model black box components as atomic and model systems as composite entities that are in turn composed of atomic entities. However, whether an entity is atomic or composite is not an intrinsic aspect of that entity. In reality nothing is ever atomic, and even atoms aren’t atomic. The distinction between atomic and composite is merely a reflection of the level of detail required by the management systems at a given point of time. That level of detail may change over time, at which point it may become necessary to turn atomic entities into composite entities. This may be impossible to do without recreating entire model hierarchies.
  • Encoding domain-specific concepts in the meta model. As shown earlier, automated lifecycle management tools rely on semantics of the meta model to provide automation functionality. If the meta model includes domain-specific concepts, then it will be difficult to use the models for application domains other than the one for which the model was designed. For example, the Kubernetes meta model includes Pods and Containers, which makes it hard to use Kubernetes for network automation.
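TOSCA's substitution feature illustrates how the atomic-vs-composite pitfall can be avoided: whether a node is treated as a black box or decomposed into a sub-topology is decided at deployment time rather than baked into the type hierarchy. A hedged sketch, with illustrative type names:

```yaml
# Template A: the firewall is used as a black box.
topology_template:
  node_templates:
    firewall:
      type: example.nodes.Firewall       # illustrative type name

# Template B (a separate file): decomposes that same node type later,
# without any change to the type hierarchy or to templates that use it.
#
#   topology_template:
#     substitution_mappings:
#       node_type: example.nodes.Firewall
#     node_templates:
#       packet_filter:
#         type: example.nodes.PacketFilter
#       rule_engine:
#         type: example.nodes.RuleEngine
```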

Any proper modeling approach must avoid these pitfalls in the models. More importantly, it absolutely cannot afford to use a meta model that includes these mistakes. Without a proper meta model, and without the proper split between what is in the meta model and what is in the models themselves, it is impossible to create an automation platform that can process models in a general-purpose, domain-independent fashion.

Monday, July 20, 2020

The Need for Domain-Independent Automation Platforms

Controlling the operational expenses associated with managing the lifecycles of software services requires model-driven automation platforms. These platforms base their automation functionality on service instance models that provide centralized representations of all services under management.

You may wonder what is so novel about this. Don’t we already have a number of tools that do exactly this? For example, isn’t this what Kubernetes does? Kubernetes keeps track of all deployed pods, the containers in those pods, and the scaling of those containers. Is there a need for anything more?

The answer—as is often the case—is it depends. While Kubernetes is great for automating lifecycle management of container-based deployments, not all services fit nicely into the Kubernetes paradigm.

Consider, for example, a fairly common Edge Computing use case. Edge Computing typically involves rather complex application topologies where some application components are installed on edge devices, other components are hosted in the cloud, and networks need to be provisioned to interconnect these components. The cloud components might be packaged as virtual machine images that need to be deployed on OpenStack or AWS clouds, or they might be constructed as cloud-native applications that are deployed using container systems such as Docker. Network connectivity might be provided by establishing secure tunnels over the public internet (e.g. using SD-WAN technology) or by special-purpose networks provided by network operators. As a result, Edge Computing invariably deals with extremely heterogeneous infrastructure environments on top of which applications need to be deployed.

In addition, Edge Computing application topologies tend to be much more dynamic and unpredictable than pure cloud-based applications. Edge devices can vary widely in how much compute power, memory, or storage is provided, which means that application components may need to adapt to the devices on which they are deployed. Devices may be mobile and can move, in which case application workloads may need to adapt to varying network conditions, and workloads may need to be moved dynamically from the cloud to the edge to satisfy latency or interactivity requirements.

It should be clear that such scenarios cannot easily be handled by Kubernetes alone, since Pods and Containers offer no support for creating network tunnels or for deploying EC2 instances on AWS. Containers may also not be the best technology for performance-sensitive data plane applications running on low-end edge devices.

What is needed instead is an automation platform that can manage services across multiple application domains. Such an automation platform must not be tied to specific infrastructure technologies or to domain specific deployment paradigms.

What might such a domain-independent automation platform look like? To answer this question, let’s think about what makes an automation platform domain-specific. The answer, as might be clear from our previous discussion of model-based automation platforms, is the platform’s meta-model. Key to every model-driven automation platform is a meta-model that defines the abstractions used to create and manage instance models for the services managed by the platform. In the case of Kubernetes, the meta-model includes Pods and Containers as first-class abstractions. This makes the Kubernetes meta-model hard to use for automating services that do not use Containers and are not organized in Pods.

The key to building a domain-independent automation platform, then, is to define a meta-model that is not tied to specific infrastructure domains or to specific deployment paradigms. At the same time, this meta model must be sufficiently expressive to describe service lifecycle management functionality in a general-purpose fashion, which would allow it to cover a broad variety of application domains. With a proper meta-model, we can build domain-independent automation platforms that can be used for end-to-end orchestration of the Edge Computing use case described earlier. I will investigate later what a feature set might look like for such a meta model.


Saturday, July 18, 2020

What is Model-Driven Automation

In a previous post, I suggested that Model-Driven Automation is a superior automation paradigm for reducing the operational expenses associated with managing the lifecycles of software services. Model-driven automation also promises to finally deliver the type of service agility we have been expecting from cloud-based deployments. In this post, I will dig a little deeper into what model-driven automation really entails.

I believe there are three fundamental aspects to a model-driven automation system:

  1. The instance models
  2. The automation approach
  3. The meta-model

I’ll talk about each of these in a little bit more detail.

Instance Models

At the core of any model-driven automation tool is a full representation—in the automation system database—of all deployed service instances and their components. We refer to such a representation as a service instance model.

Note that service instance models are not the actual services themselves. They are representations—or models—of the actual services that are created using resources that are external to the automation system (and that are typically also represented using models in the automation system database). The instance models represent those aspects of a service that are relevant for the purpose of automating service lifecycle management. Service instance models typically contain all the components that make up a service as well as the resources on which these components are deployed. They track configuration values for each component, and they contain status values that represent runtime state for service components. Most importantly, instance models capture relationships that represent dependencies between service components.

Automation Approach

Fundamental to any model-based automation tool is the concept that all service management tasks must be performed by operating on the service instance model first, and then propagating any resulting changes in the model to the external resources or service entities under management.

Similarly, a model-based automation system is responsible for monitoring the state of the external resources or entities and reflecting any changes into the corresponding instance model. These changes may then necessitate actions to be taken by the automation system (e.g. to respond to a failure in an autonomous fashion), which in turn could result in changes to other components in the instance model, which then need to be propagated to the outside world.

Using this approach, the instance model becomes the single source of truth for all information about the services under management. When an automation system needs access to configuration or status values for service components, it can get those values from the instance model rather than having to query the actual external entities or resources themselves.

Meta-Model

This brings us to the most important aspect of a model-driven automation system, which is the meta-model, or the model that governs the instance models themselves. In general, meta models define the rules, the constraints, and the theories for how to create models for a specific application domain. For our domain (which is service automation), the meta-model must define rules and semantics for the creation and manipulation of service instance models, and all instance models must conform to this meta model. The meta model specifies what type of information must be in the instance models and where that information can be found. The meta model also defines the rules and semantics for how the various aspects of the instance models can be changed by the automation system.

Examples of the type of functionality that might be exposed by the meta model:

  • Definition of the exact set of configuration values that are required for each specific service component, where in the model these values are stored, any constraints with which these values have to comply, and what the run-time state variables are that must be tracked for a specific service component.
  • Definition of relationships that capture how changes in one component may have an effect on other components in the same service (or even on other services that share the same resources).
  • Mechanisms for expressing resource requirements for each service component.
  • Mechanisms for supporting decomposition of services (such as a decomposition of user-facing services into resource-facing services).
  • Hooks for plugging in component-specific mechanisms that propagate changes to a component’s model into configuration changes in the external world.
  • Hooks for plugging in monitoring functionality that can reflect changes to the external state of a service into corresponding values in the model components.
  • Event handling logic or policy logic that expresses how changes to component values need to be handled by the automation system.
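In TOSCA terms, much of this meta-model functionality surfaces in a single node type definition. The following is a hedged sketch with illustrative names (the `example.*` type and the artifact path are invented for this example):

```yaml
node_types:
  example.nodes.DatabaseService:          # illustrative type name
    derived_from: tosca.nodes.Root
    properties:
      # Required configuration values, including constraints.
      port:
        type: integer
        required: true
        constraints:
          - in_range: [ 1024, 65535 ]
    attributes:
      # Runtime state tracked in the instance model.
      operational_state:
        type: string
    requirements:
      # Resource requirements and dependencies on other components.
      - host:
          capability: tosca.capabilities.Compute
          relationship: tosca.relationships.HostedOn
    interfaces:
      # Hooks for propagating model changes to the external world.
      Standard:
        type: tosca.interfaces.node.lifecycle.Standard
        operations:
          configure: scripts/configure.sh  # illustrative artifact
```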

It is the presence of a meta-model that allows model-driven automation systems to provide automation functionality in a general-purpose fashion without having to rely on domain-specific assumptions or special-purpose automation logic. With a proper meta model we can create service automation systems that are domain-independent which then allows the same automation system to be used across a wide variety of application domains.

Tuesday, July 14, 2020

Is it Time for a Different Automation Paradigm?

The Information Technology landscape today looks quite a bit different than it did ten years ago. Over the last decade, we have witnessed almost universal adoption of virtualization. Virtualization in turn has led to cloud-based service deployments, and it has also driven a move away from fixed-function appliances in favor of software features deployed on general-purpose hardware (e.g. Network Functions Virtualization). More recently, container technologies have enabled new micro-services-based architectures. The latest trend seems to be a move towards Functions as a Service, which aims to further decouple operational considerations from application functionality.

These changes have come about largely as the result of two main business drivers:

  • Reduced Cost: replacing fixed-function hardware with software deployed on general-purpose infrastructure promised to reduce capital expenses (Capex).
  • Increased Agility: software-based services promised faster deployments and more rapid introduction of new features.

To make this agility possible, automation tools had to be developed that allow software control over infrastructure components. These tools are generally categorized under the Infrastructure-as-Code umbrella because they apply software development approaches to infrastructure deployments.

One should ask then, how well all of this has played out so far. Have these new technologies lived up to their expectations?

Ironically, it appears that in many cases, adoption of these new technologies has had the opposite effect than what was originally intended:

  • Increased Cost: Whereas the desired Capex reductions may have been achieved, an increase in operational expenses (Opex) associated with managing virtualized services has in many cases more than offset any reductions in Capex.
  • Reduced Agility: While automation may have made initial service deployments easier and less error-prone, it has often proven challenging to make changes to running services without breaking things. Service providers have come to adopt the motto that “if it works, don’t touch it”, which all but eliminates any illusion of agility.

So, if things did not work out as expected, one could ask where they went wrong. It turns out that in hindsight, these outcomes are actually not that surprising. Virtualization, cloud, and container technologies all introduce new management requirements, which increases operational expenses. In addition, modern service architectures typically involve a larger number of smaller components, which complicates lifecycle management of these services. The number of possible component combinations grows exponentially, and the number of failure scenarios increases dramatically as well, since many of the service components can fail independently. As a result, managing modern services is significantly more complicated than managing “traditional” services.

To compound the problem, administrators who are tasked with managing these services often do not have the necessary visibility into all the components that make up a service. It might be hard to find out where components are deployed, and even harder to determine how components interact. Administrators often have no idea that a change to one service component may have unintended consequences somewhere else. Not only does this make it hard to troubleshoot problems, it also makes it nearly impossible to perform the moves, adds, changes, and deletions (MACDs) that are required to manage running services.

Current automation tools such as Ansible and Terraform or other “infrastructure as code” tools do not help the situation much. While they do a good job of defining the necessary steps to get a service deployed, they do not provide a clear record of the end result. How does one develop automation to modify a running service without an accurate view of what is deployed? Even worse, how do you create automated responses to failures without any context within which to process failure events?

If automation is supposed to be the solution that reduces Opex and enables agile MACDs, a new automation approach may be necessary. Such an approach must start by addressing the main shortcoming of “infrastructure as code” tools, which is lack of visibility into deployed service components and their associated states. Without improved visibility, it is impossible to tackle the complexities of current service architectures.

We propose model-driven automation as an alternative automation paradigm that can address these shortcomings. At the core of a model-driven automation tool is a full representation (in the automation system database) of all deployed service instances and their components. We will refer to such a representation as a service instance model. Service instance models provide full visibility into all deployed services, and all service automation logic is built around these service instance models. Not only does a service instance model contain representations for all service components, it also keeps track of the resources on which these components are deployed, and most importantly it captures relationships that represent dependencies between service components. These relationships are crucial for providing visibility into how changes to one service component can affect other components in the same service (or even other services that share the same resources).

By tracking service instance models, model-driven automation tools provide a path towards solving the Opex and agility challenges associated with current service architectures:

  • Instance models provide the necessary visibility into runtime state that can serve as a starting point for automating moves, adds, changes, and deletions.
  • Instance models also provide the necessary context within which to handle external service events. By using state represented in the instance model, the event handling logic itself can be stateless, which allows for modular and scalable failure handling automation.

Without instance models, I am afraid that automation tools may have reached their limit in how well they can keep pace with evolving service architectures. I believe it is time to adopt model-driven automation as a different paradigm that can take service automation to the next level.

Tuesday, June 16, 2020

TOSCA Application Domains

As its name suggests, TOSCA—the Topology and Orchestration Specification for Cloud Applications—has its origins in Infrastructure-as-a-Service clouds. What set TOSCA apart from other early cloud technologies was its focus on applications. While other cloud technologies focused narrowly on orchestrating cloud resources—compute, networking, and storage—TOSCA distinguished itself by focusing on services and applications first, and orchestrating cloud resources only in support of those services and applications. TOSCA includes language constructs such as requirements, capabilities, and substitution that allow service designers to specify resource requirements in their templates without having to explicitly prescribe the exact type of resources to be used, or how and where these resources are expected to be allocated. This creates a loose coupling between application services and the required infrastructure resources, which then allows for the introduction of new resource categories as new deployment paradigms emerge.

This flexibility has allowed TOSCA to withstand the test of time and adapt to a changing IT and cloud infrastructure landscape. You might be surprised to learn about the broad spectrum of application domains for which TOSCA is being used today:
  • Cloud Services: This is the original use case that is still going strong today. TOSCA is used to deploy software applications on IaaS clouds such as OpenStack, Amazon Web Services, Azure, and others.
  • Network Functions Virtualization: ETSI has adopted TOSCA as the standard for defining and packaging Virtual Network Functions (VNFs) and for defining network services composed of these VNFs. A TOSCA orchestrator (such as Ubicity) can be used as the VNFM, the NFVO, or both in the ETSI NFV architecture.
  • Software Defined Networking: A number of large operators are using TOSCA for deploying and managing their software-defined networking services, and specifically their SD-WAN offerings.
  • Containers: TOSCA is gaining a lot of traction as a cloud-native orchestration technology. TOSCA can be used to deploy all components of a Kubernetes stack, from the Kubernetes clusters themselves to the container-based software applications that run on those clusters. In fact, TOSCA can be used as a superior alternative to other cloud-native packaging technologies such as Helm.
  • Serverless Computing and Functions-as-a-Service: Serverless computing allows software designers to define abstract software functions without deployment or operational considerations. This is a natural application domain for TOSCA: substitution mapping enables the creation of abstract services that do not define any deployment or operational considerations whatsoever. TOSCA orchestrators can handle the mapping of functions to available infrastructure automatically behind the scenes.
  • Edge Computing: TOSCA can be used to automate the deployment of software features to devices at the customer edge. Orchestrators can make deployment decisions based on latency or response time thresholds specified using TOSCA requirements.
  • IoT: This application domain is similar to Edge Computing but introduces additional complexities with respect to the special-purpose compute and networking technologies used by sensors and controllers. Technology specifics can be expressed using TOSCA requirements and capabilities.
  • Process automation: Use TOSCA to support open and interoperable process control architectures.
Most importantly, TOSCA is designed to handle more complex scenarios that combine two or more of these application domains. For example, Edge Services may combine components that are installed on edge devices with components that are hosted in the cloud. Networks may need to be provisioned to interconnect these components. Cloud components might be packaged as virtual machine images that need to be deployed on OpenStack or AWS clouds, or they might be constructed as cloud-native micro-services that are deployed using container management systems such as Kubernetes. Network connectivity might be provided by establishing secure tunnels over the public internet or by special-purpose networks provided by network operators. Because TOSCA is a domain-independent language, it can seamlessly handle such complex scenarios.
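As a sketch of how requirements can steer placement in such heterogeneous scenarios, a template can attach a node filter to an abstract hosting requirement. Names here are illustrative, and latency-style properties would come from domain-specific capability types rather than the normative profile:

```yaml
topology_template:
  node_templates:
    edge_function:
      type: example.nodes.AnalyticsFunction   # illustrative abstract type
      requirements:
        - host:
            # The orchestrator may place this component on any host
            # whose capabilities satisfy the filter, whether that host
            # is an edge device, a VM, or a container platform.
            node_filter:
              capabilities:
                - host:
                    properties:
                      - num_cpus: { greater_or_equal: 2 }
                      - mem_size: { greater_or_equal: 1 GB }
```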

Wednesday, January 22, 2020

Maturing TOSCA

TOSCA is growing up fast. As cloud adoption gains steam, TOSCA is maturing in lockstep to fully support the evolving cloud landscape. This write-up provides a glimpse into how OASIS is streamlining TOSCA Version 2.0 to allow modern cloud technologies to take full advantage of TOSCA's unique capabilities.

Let’s start with a bit of history. Early adopters of TOSCA will remember that TOSCA was conceived as a technology-independent cloud orchestration language with the goal of providing the following unique benefits:

  • End-to-end service descriptions: whereas other orchestration languages focus solely on deploying infrastructure or solely on deploying application software, TOSCA is unique in supporting service descriptions for the entire stack (application components as well as infrastructure), all in one place!
  • Service-focused meta-model: the TOSCA meta-model is fully aligned with its mission as a service orchestration language. The meta-model introduces service topologies as first-class abstractions, where service topologies are modeled as graphs that contain the components (“nodes”) that make up a service as well as the relationships between these components. This graph-based meta model is the foundation for many of TOSCA’s powerful and unique orchestration capabilities.
  • Reusable components: the TOSCA meta model also includes support for defining node and relationship types, which allows for the creation of reusable components from which complex services can be built.
  • Technology-independence: TOSCA was designed to be independent of any specific orchestration platforms or cloud technologies. Since most organizations deploy services across multiple clouds, TOSCA can be used as the common orchestration language for all clouds.
  • Portability: TOSCA supports the creation of portable services by using service descriptions that include abstract service components. These abstract components can be decomposed at orchestration time into technology-specific service topologies using the TOSCA substitution mappings feature.
  • Explicit resource requirements: TOSCA includes concepts such as requirements and capabilities which are used to allow service designers to explicitly encode qualitative and quantitative resource requirements into their service descriptions. This further establishes TOSCA as the language of choice for expressing all service-related information in one single place.

The first versions of TOSCA were naturally focused on the prevailing cloud paradigm at the time, which was Infrastructure as a Service (IaaS), and specifically OpenStack and AWS. To support this paradigm, the TOSCA standard defines normative types for orchestrating compute, storage, and networking infrastructure as well as for deploying software components on top of this infrastructure.

Over the last several years, the cloud landscape has expanded significantly. SDN and NFV have emerged as virtualization technologies in the telecommunications space that enable a variety of networking-on-demand scenarios outside of the data center. Cloud-native software development based on a micro-services paradigm has become mainstream, and container-based software deployment has become the norm. Serverless technologies intend to further decouple software functionality from specific deployment mechanisms.

What’s exciting here is that in many ways, these new cloud technologies are actually much better targets for TOSCA’s unique capabilities than the original IaaS clouds were, for the following reasons:

  • Whereas VMs are monolithic, micro-services (and their associated service meshes) are built from granular components with complex inter-dependencies. TOSCA service topology graphs are a natural choice for modeling micro-services.
  • Furthermore, the serverless paradigm aligns perfectly with TOSCA’s mission. Consider the following definition of serverless computing as used by IBM: “Serverless is event-driven programming using stand-alone functions with no deployment or operational considerations.” This paradigm fits perfectly with TOSCA’s abstract service descriptions, in which abstract components do not carry information about specific deployments. Implementation-specific deployment and operational considerations can be mapped to abstract service descriptions using TOSCA substitution mappings if necessary.

Because of these advantages, we’re starting to see increased use of TOSCA for cloud-native services. At the same time, this trend has made it clear that there are some areas in which the TOSCA standard might need to adapt:

  • If the TOSCA standard is to be used for a variety of cloud paradigms, it must not be tied to a set of types that implement only one specific cloud. We’re investigating decoupling the normative types from the language specification to allow the TOSCA language and the normative types to evolve and be extended independently.
  • In some areas, assumptions about the underlying cloud paradigm have bled over into the TOSCA language itself. We’ll identify and remove these dependencies with the goal of making TOSCA a general-purpose cloud service lifecycle management language.
  • Assumptions about the cloud paradigm may also have resulted in the language being under-specified. For example, some cloud orchestration functionality cannot currently be fully expressed using the TOSCA language based on the assumption in early versions of the specification that orchestrators must have built-in knowledge about how to interact with IaaS clouds. Additional language constructs are being added to the TOSCA language to eliminate these assumptions and to fill gaps in functionality where needed.

It's an exciting time to be part of the TOSCA ecosystem. If you’re interested in participating, please drop us a note.