Attaining SIP Resiliency through HA SBCs

Any subject worth learning is like an onion. No, not because it will make you cry (although some sure bring out the tears for me). Rather, it's because a worthwhile subject consists of multiple layers, and every time you think you completely understand the whole thing, you find something new to capture your time and attention.

In my article last week, Peeling Back the SIP Resiliency Layers, I discussed a number of techniques that an enterprise can employ to create a resilient SIP infrastructure. As with the onion, there are layers of subsystem resiliency that all together make an entire system durable and robust. Take one away and you risk a single point of failure that might be responsible for total system failure.

Today, I would like to peel the onion back a bit more and discuss one of the most critical aspects of SIP resiliency: high availability session border controllers (HA SBCs). While the overall concept of an HA SBC is fairly obvious, it's important to understand how it's enabled and, more importantly, under what conditions a failover is triggered.

You may be surprised to know that there really is no such thing as a high availability SBC. In reality, a high availability SBC is made up of two standalone SBCs with a private connection between the two. While vendors may create product structures that position two separate SBCs as one logical package, you are still buying two boxes. It takes software and a pseudo network between them to turn them into an HA pair.

In all cases, one of the SBCs will be designated as active, and the second SBC as standby. That doesn't mean, however, that the standby SBC is just sitting around twiddling its thumbs. No, it's listening to all sorts of device health, configuration, and call state information that the active SBC is sending across the pseudo network. This allows the standby SBC to know exactly what the active SBC is doing in case it has to take control.
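To make the idea of a "listening" standby concrete, here is a minimal Python sketch of call state being mirrored from one unit to the other. Everything in it is an assumption for illustration: the class names, the update tuples, and the `sync` helper are hypothetical and do not reflect any vendor's actual replication protocol.

```python
class ActiveSbc:
    """Hypothetical active unit: tracks calls and queues updates for its peer."""

    def __init__(self):
        self.calls = {}    # call_id -> state, e.g. "ringing" or "connected"
        self.outbox = []   # updates waiting to cross the link to the standby

    def set_call_state(self, call_id, state):
        self.calls[call_id] = state
        self.outbox.append(("upsert", call_id, state))

    def end_call(self, call_id):
        self.calls.pop(call_id, None)
        self.outbox.append(("remove", call_id, None))


class StandbySbc:
    """Hypothetical standby unit: applies updates so its view mirrors the active."""

    def __init__(self):
        self.calls = {}

    def apply(self, update):
        kind, call_id, state = update
        if kind == "upsert":
            self.calls[call_id] = state
        elif kind == "remove":
            self.calls.pop(call_id, None)


def sync(active, standby):
    """Deliver every queued update in order, then clear the queue."""
    for update in active.outbox:
        standby.apply(update)
    active.outbox.clear()
```

Once `sync` has run, the standby's `calls` dictionary matches the active's, which is the property that lets it pick up where the active left off.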

Since an HA SBC is really two separate SBCs, something more is needed to make it look like only one to the rest of the world. This is accomplished by creating virtual MAC and IP addresses that can freely float between the two standalone SBCs. These are the only SBC MAC and IP addresses that the network is aware of. Only one SBC at a time will assume control of these virtual addresses, so to the network, it really looks like one device. It works like this:
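As a rough illustration, the floating addresses can be modeled as something only the current active unit answers for. This is a toy Python model, not real networking code: the `Sbc` and `HaPair` names are hypothetical, and the addresses are placeholders (a documentation-range IP and a locally administered MAC).

```python
from dataclasses import dataclass


@dataclass
class Sbc:
    name: str
    healthy: bool = True


class HaPair:
    """Toy model: two standalone SBCs sharing one set of virtual addresses."""

    def __init__(self, active, standby, virtual_ip, virtual_mac):
        self.active = active
        self.standby = standby
        self.virtual_ip = virtual_ip
        self.virtual_mac = virtual_mac

    def owner_of(self, ip):
        # The network only knows the virtual IP, and only the
        # active unit answers for it.
        return self.active.name if ip == self.virtual_ip else None

    def fail_over(self):
        """Move the virtual addresses to the standby, if it is healthy."""
        if not self.standby.healthy:
            return False
        self.active, self.standby = self.standby, self.active
        return True
```

A short usage example:

```python
pair = HaPair(Sbc("sbc-a"), Sbc("sbc-b"), "192.0.2.10", "02:00:00:00:00:01")
pair.owner_of("192.0.2.10")   # "sbc-a"
pair.fail_over()
pair.owner_of("192.0.2.10")   # "sbc-b"
```

The virtual addresses never change. Only the box that answers for them does.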

It's important to know under what conditions the standby determines it's time to step up to the plate and assume the role of active. These conditions may vary by vendor, but here are the main triggers:

Note that losing physical connectivity is not the same as losing logical connectivity. By this, I mean the difference between a broken cable and an unresponsive call processing server. In the case of logical connectivity failures, different routes can be taken rather than failing over to another SBC that will have the exact same connectivity problems. Failover from one SBC to another will not fix an unresponsive SIP carrier.
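The distinction can be written down as a small decision function. This Python sketch is deliberately simplified and assumes only two kinds of fault. The `Fault` and `Action` names are hypothetical and are not taken from any SBC's configuration model.

```python
from enum import Enum


class Fault(Enum):
    LINK_DOWN = "link_down"                  # physical: broken cable or dead port
    PEER_UNRESPONSIVE = "peer_unresponsive"  # logical: far end is not answering


class Action(Enum):
    FAIL_OVER = "fail_over"   # hand control to the standby SBC
    REROUTE = "reroute"       # keep the active SBC and try a different route


def choose_action(fault):
    """Fail over only for faults that the standby would not share."""
    if fault is Fault.LINK_DOWN:
        return Action.FAIL_OVER
    # The standby would see the same unresponsive carrier or server,
    # so swapping SBCs would not help.
    return Action.REROUTE
```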

As important as it is to know when to fail over from active to standby, it's just as important to know when not to fail over. You don't want the standby SBC jumping the gun and unnecessarily taking control when the active SBC is just a tad slow to respond. For this, SBCs have timers to determine when a problem really is a problem.
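One common way to build such a timer is to count missed heartbeats and act only after several in a row. The Python sketch below assumes that approach. The 100 ms interval and the threshold of three misses are made-up values, and `HeartbeatMonitor` is a hypothetical name, so check your vendor's documentation for the real mechanism and defaults.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Standby-side view of the active SBC's heartbeats (hypothetical)."""

    interval_ms: int = 100   # assumed heartbeat interval
    max_missed: int = 3      # assumed number of misses before taking over
    last_seen_ms: int = 0    # when the last heartbeat arrived

    def record_heartbeat(self, now_ms):
        self.last_seen_ms = now_ms

    def missed(self, now_ms):
        # Whole heartbeat intervals that have passed in silence.
        return (now_ms - self.last_seen_ms) // self.interval_ms

    def should_take_over(self, now_ms):
        return self.missed(now_ms) >= self.max_missed
```

With these values, an active SBC that goes quiet for 250 ms is merely slow, while one that stays quiet for 300 ms is treated as failed. Times are kept as integer milliseconds to avoid floating-point rounding at the threshold.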

In addition to handling unexpected runtime problems, being highly available allows SBCs to be upgraded without stopping SIP traffic. The steps for this are:

At this point, you can leave things running as they are, or fail back to the way things were at the start of the upgrade process. The beauty of this method is that a significant upgrade can go completely unnoticed by the outside world.
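The exact procedure varies by vendor. The Python sketch below assumes the common rolling pattern for active/standby pairs: upgrade the standby, switch over to it, then upgrade the former active. The dictionary layout, unit names, and version strings are hypothetical.

```python
def rolling_upgrade(pair, new_version):
    """Upgrade both units of an HA pair one at a time.

    `pair` is a dict with "active" and "standby" entries, each a dict
    holding a "name" and a "version". Returns a log of the steps taken.
    """
    log = []

    # 1. Upgrade the standby while the active keeps handling traffic.
    pair["standby"]["version"] = new_version
    log.append("upgraded standby " + pair["standby"]["name"])

    # 2. Switch over so the upgraded unit becomes active.
    pair["active"], pair["standby"] = pair["standby"], pair["active"]
    log.append("switched over to " + pair["active"]["name"])

    # 3. Upgrade the former active, which is now the standby.
    pair["standby"]["version"] = new_version
    log.append("upgraded standby " + pair["standby"]["name"])

    return log
```

Note that the roles end up swapped relative to where they started, which is why the choice between staying put and failing back comes up at the end.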

While every SBC on the market pretty much supports high availability as I described above, there is plenty of room for vendors to differentiate their products from the competition. I spoke with my friends at AudioCodes and learned about some of the features it considers unique to its SBCs:

My friends at Sonus stressed the depth and flexibility of its HA solution (e.g. copper or fiber for the synchronization link) while touting how its disaster recovery licensing saves an enterprise money when deploying SIP trunks at separate data centers.

Remember the connection I spoke of between the active and standby SBC? Known by some vendors as the synchronization link, it's essentially one or two Ethernet cables that directly connect the active SBC with the standby SBC. Depending on the vendor, it may be a straight or a crossover cable.

It's absolutely essential to know that this is a Layer 2 connection. This means that the SBCs must be on the same subnet. While nearly every telecom director I speak with wishes that he or she could spread the active and standby SBCs across data centers, high availability is limited to two SBCs in very close physical proximity.
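A quick way to sanity-check a planned deployment against the same-subnet requirement is to compare the networks of the two addresses. This sketch uses Python's standard `ipaddress` module. The `same_subnet` helper is hypothetical and the addresses are documentation-range placeholders.

```python
import ipaddress


def same_subnet(a, b):
    """Return True if two interface addresses fall in the same network.

    Each argument is an address with a prefix length, such as "192.0.2.11/24".
    """
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network
```

For example, `same_subnet("192.0.2.11/24", "192.0.2.12/24")` is true, while `same_subnet("192.0.2.11/24", "198.51.100.12/24")` is false. The second pair could not form an HA pair under the constraint described above.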

I hope this article helped make a somewhat complicated and slightly mysterious subject easier to understand. A little knowledge applied in the right way will save money while avoiding costly downtime.

Andrew Prokop writes about all things unified communications on his popular blog, SIP Adventures.
