Any subject worth learning is like an onion. No, not because it will make you cry (although some sure bring out the tears for me). Rather, it's because a worthwhile subject consists of multiple layers, and every time you think you completely understand the whole thing, you find something new to capture your time and attention.
In my article last week, Peeling Back the SIP Resiliency Layers, I discussed a number of techniques that an enterprise can employ to create a resilient SIP infrastructure. As with the onion, there are layers of subsystem resiliency that all together make an entire system durable and robust. Take one away and you risk a single point of failure that might be responsible for total system failure.
Today, I would like to peel the onion back a bit more and discuss one of the most critical aspects of SIP resiliency: high availability session border controllers (HA SBC). While the overall concept of an HA SBC is fairly obvious, it's important to understand how it's enabled and more importantly, under what conditions failover rules are invoked.
You may be surprised to know that there really is no such thing as a high availability SBC. In reality, a high availability SBC is made up of two standalone SBCs with a private connection between the two. While vendors may create product structures that position two separate SBCs as one logical package, you are still buying two boxes. It takes software and a pseudo network between them to turn them into an HA pair.
In all cases, one of the SBCs will be designated as active, and the second SBC as standby. That doesn't mean, however, that the standby SBC is just sitting around twiddling its thumbs. No, it's listening to all sorts of device health, configuration, and call state information that the active SBC is sending across the pseudo network. This allows the standby SBC to know exactly what the active SBC is doing in case it has to take control.
Since an HA SBC is really two separate SBCs, something more is needed to make it look like only one to the rest of the world. This is accomplished by creating virtual MAC and IP addresses that can freely float between the two standalone SBCs. These are the only SBC MAC and IP addresses that the network is aware of. Only one SBC at a time will assume control of these virtual addresses, so to the network, it really looks like one device. It works like this:
It's important to know under what conditions the standby determines it's time to step up to the plate and assume the role of active. These conditions may vary by vendor, but here are the main triggers:
Note that losing physical connectivity is not the same as losing logical connectivity. By this, I mean the difference between a broken cable and an unresponsive call processing server. In the case of logical connectivity failures, different routes can be taken rather than failing over to another SBC that will have the exact same connectivity problems. Failover from one SBC to another will not fix an unresponsive SIP carrier.
As important as it is to know when to failover from active to standby, it's just as important to know when not to failover. You don't want the standby SBC jumping the gun and unnecessarily taking control when the active SBC is just a tad slow to respond. For this, SBCs have timers to determine when a problem really is a problem.
In addition to handling unexpected runtime problems, being highly available allows SBCs to be upgraded without stopping SIP traffic. The steps for this are:
As this point, you can leave things running as they are, or failback to the way it was at the start of the upgrade process. The beauty of this method is that a significant upgrade can be completely unnoticed by the outside world.
While every SBC on the market pretty much supports high availability as I described above, there is plenty of room for vendors to differentiate their products from the competition. I spoke with my friends at AudioCodes and learned about some of the features it considers unique to its SBCs:
My friends at Sonus stressed the depth and flexibility of its HA solution (e.g. copper or fiber for the synchronization link) while touting how its disaster recovery licensing saves an enterprise money when deploying SIP trunks at separate data centers.
Remember the connection I spoke of between the active and standby SBC? Known by some vendors as the synchronization link, it's essentially one or two Ethernet cables that directly connect the active SBC with the standby SBC. Depending on the vendor, it may be a straight or a crossover cable.
It's absolutely essential to know that this is a Layer 2 connection. This means that the SBCs must be on the same subnet. While nearly every telecom director I speak with wishes that he or she could spread the active and standby SBCs across data centers, high availability is limited to two SBCs in very close physical proximity.
I hope this article helped make a somewhat complicated and slightly mysterious subject easier to understand. A little knowledge applied in the right way will save money while avoiding costly downtime.
Andrew Prokop writes about all things unified communications on his popular blog, SIP Adventures.