What a Virtual Network Looks Like: Management
The challenge of virtual network management is you have real stuff that has to be managed and can’t be shown to the user, and stuff the user manages that isn’t real.
Virtualization, as we saw in the first article in this series, creates a kind of "service triangle." One apex is the services that are used/purchased, another is the features/functions that create these services, and the third is the resources that support those features/functions. In traditional networking, all three of these are linked into one by device-specific network services. In virtual networking, all three have totally dynamic relationships, and that's what makes management different.
All management starts at the top, so let's start there, with the service user. The user can really only have two possible views of the service: They can see it as a feature collection, which means they see the SLA, or they can see it as a "virtual device." A VPN "looks" like a single huge router, or like a contract, and so a virtual network also looks like one or the other.
It isn't of course, so whatever happens down inside the infrastructure can only be related to the user in terms of SLA parameters or the management features of a virtual device. The user doesn't know, or want to know, about hosting points and middleware and SDN controllers and OpenFlow; none of these are part of their services today. Likely, the service provider doesn't want users meddling down in the real infrastructure either.
That poses the challenge of virtual networking in a management sense; you have real stuff that has to be managed and can't be shown to the user, and stuff the user manages that isn't real. Translating one to the other is perhaps the most important requirement of all for virtual networks.

Management Models
There are two approaches in the market today, and they differ in terms of how the real infrastructure is managed.
The first is the resource management model. This model says that a network is hosted on a collection of shared ("multi-tenant" in cloud terms) resources. If you have an expected amount of services delivered, you can size your resources to support that service quantity at the terms you're selling under. Capacity planning is as old as networking, after all. What you deliver to the user is simply a report of SLA parameter performance, which will be in range if your capacity plan is good and your resources are working. There's no direct correlation needed.
With the resource management model, you ensure you don't oversell your capacity. And if you don't oversell your capacity, then you only need to address the state of the resources, because if they're all working your services have to be working within their SLAs. The Internet is managed this way, and so is most cellular service, so we're all used to the notion. The problem it presents is that if, somehow, you mess up and the capacity doesn't cover the service load, there's often no way to tell what specifically went wrong. A big bank once described the resource-management model as being "on-the-average" service selling because your service would meet your SLA on the average.
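The resource management model described above can be reduced to two operations: a capacity-plan check against total sold load, and an aggregate SLA report that every service sees. The sketch below is a hypothetical illustration; the names, thresholds, and service records are invented for this example, not drawn from any real management system.

```python
# Hypothetical sketch of the resource management model: size shared
# capacity against the total sold service load, then report only an
# aggregate SLA view. No per-service resource correlation is kept.

def capacity_ok(resource_capacity_gbps, services, oversubscription=1.0):
    """True if the total sold bandwidth fits within the capacity plan."""
    sold = sum(s["bandwidth_gbps"] for s in services)
    return sold <= resource_capacity_gbps * oversubscription

def sla_report(measured_latency_ms, sla_latency_ms):
    """Aggregate report: every user gets the same 'on-the-average' view."""
    return {
        "latency_ms": measured_latency_ms,
        "in_range": measured_latency_ms <= sla_latency_ms,
    }

services = [{"name": "vpn-a", "bandwidth_gbps": 4},
            {"name": "vpn-b", "bandwidth_gbps": 3}]
print(capacity_ok(10, services))   # capacity plan holds
print(sla_report(18.0, 20.0))      # SLA met, on the average
```

Note what the model cannot do: if `capacity_ok` returns False after the fact, nothing here can say which service or which resource caused the miss, which is exactly the "on-the-average" weakness described above.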
The alternative model, which addresses this desire for performance specificity, is binding management. A resource or set of resources is bound to a feature/function within a user's service. If that feature/function isn't operating properly, then one of its bound resources has a problem; and if a bound resource has a problem, it will impact every feature, function, service, and user bound to it.
The value of binding management is clear, but that doesn't mean it's a good idea. The problem is that associating resource conditions upward to features and services is difficult when the resources aren't what the user would expect, and it can also be very resource-intensive. Suppose you have a VPN that you think is made up of routers, except that in place of those routers you now have servers, virtual machines, virtual switches, data center switches, and software elements. When a server fails, or a vSwitch gets congested, what do you tell the user? You have to translate resource conditions into feature and service conditions, and you have to do that for every binding you manage. And you have to do it for every VPN.
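The translation step just described, turning a real resource fault into a user-visible feature or service condition, can be sketched as a lookup over a binding table. The sketch below is hypothetical; the resource names, service names, and the shape of the binding records are invented for illustration.

```python
# Hypothetical sketch of binding management: each real resource is
# bound to the (service, feature) pairs it supports, so a resource
# fault can be translated upward into user-facing conditions.

bindings = {
    # resource id -> list of (service, feature) pairs it hosts
    "server-17":  [("vpn-a", "virtual-router"), ("vpn-b", "firewall")],
    "vswitch-03": [("vpn-a", "virtual-router")],
}

def impacted(resource):
    """Translate a resource fault into feature/service conditions."""
    return [{"service": svc, "feature": feat,
             "condition": f"degraded: hosting resource {resource} failed"}
            for svc, feat in bindings.get(resource, [])]

# A single server failure fans out to every service bound to it.
for event in impacted("server-17"):
    print(event)
```

The fan-out is the cost driver: every managed binding, for every VPN, must be kept current and walked on every fault, which is why the article calls this model resource-intensive.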
If neither resource nor binding management is the right answer, it's not hard to guess that some combination is at least the best answer. The best capacity plans still occasionally go astray. Even if services based on virtual functions can be replaced automatically if a feature or server fails, and even if the number of software instances of a firewall feature can be expanded and contracted with load, there will still be conditions where automatic remediation won't work. This means some limited binding management is still needed to augment your management of the resources.
Analytics seems to be a useful approach to implementing a management view that can accommodate both these models and everything in between. If you assume that all the real management data is collected in some repository, then that data can be extracted to correlate with the bindings, and so create an explicit management connection for every resource you've assigned -- a tight SLA. You can also simply look for SLA data and extract it for reporting, or combine reports of SLA problems with some correlations to resource conditions.
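The analytics approach above can be sketched as one telemetry repository queried two ways: a per-binding query that yields the "tight SLA" correlation, and an aggregate query that yields the plain resource-model view. The records, metrics, and bindings below are hypothetical examples, not a real repository schema.

```python
# Hypothetical sketch of the analytics approach: all management data
# lands in one repository; a per-binding query gives a tight SLA view,
# while an aggregate query gives the resource-model view.

telemetry = [
    {"resource": "server-17",  "metric": "cpu_util",  "value": 0.97},
    {"resource": "vswitch-03", "metric": "drop_rate", "value": 0.001},
]
bindings = {"vpn-a": ["server-17", "vswitch-03"]}  # service -> resources

def tight_sla_view(service):
    """Correlate repository data with one service's bindings."""
    res = set(bindings.get(service, []))
    return [t for t in telemetry if t["resource"] in res]

def aggregate_view(metric):
    """Resource-model reporting: worst case across all resources."""
    vals = [t["value"] for t in telemetry if t["metric"] == metric]
    return max(vals) if vals else None

print(tight_sla_view("vpn-a"))     # per-binding correlation
print(aggregate_view("cpu_util"))  # aggregate resource view
```

Because both views are just queries against the same data, the same repository can serve pure resource management, pure binding management, or any blend in between, which is the flexibility the article argues for.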
This is a good model because it doesn't dictate market needs, and if agile services are a goal of the network of the future, those needs could change almost daily. Management, in a virtual network, has to be virtualized too!
Follow Tom Nolle on Google+!