Converged Networks: Pointing the Finger
If you find yourself pointing the finger at another party, point a probe in the right direction and take an assessment instead.
What do you do when you have a hosted provider managing resources at the data center, a carrier that does no wrong, a consultant who fails to deliver even a viable plan, and local vendors chasing rabbits, while the customer seemingly wants to grab hold of the carnival pop gun and take pot shots at anything that moves in hopes of getting relief?
Recently, I had a long conversation with a senior engineer, a router expert who lives and breathes keeping networks up and available. He suggested I use some additional probes, and as we worked them out I saw a pattern quickly develop that would ease tensions and help those trying to resolve issues by pointing them in at least the right direction. My other observation is that this approach is not only novel but also effective, and it costs little time and effort to implement.
Point your probes, not your fingers!
For the sake of simplicity and brevity I'm only going to address pings as the tool for these probes, but probes are not limited to pings, and numerous devices and platforms offer other kinds.
The probe from the router to the ISP gateway hardware validates the physical link and the hardware. I thought this was a novel way to determine whether a port on a cable modem, another router, or an IAD is defective. Instead of relying only on port statistics, the probe tests continuously and can be tied to an SNMP trap that forwards alerts when the interface does fail.
The probe from the router to a public IP address such as 18.104.22.168 or 22.214.171.124 validates connectivity to the Internet. If the business is a 9-to-5 operation, resetting the probes daily or scheduling them to run only during business hours lets you look at availability within the smaller window of time that the business actually requires.
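To make the business-hours idea concrete, here is a minimal sketch in Python of how availability could be computed over only the hours the business cares about. It assumes probe results have already been collected as (timestamp, success) pairs; the function name `business_hours_availability`, the Monday-to-Friday rule, and the 9:00 to 17:00 window are all illustrative assumptions, not features of any particular router.

```python
from datetime import datetime, time

def business_hours_availability(samples, start=time(9, 0), end=time(17, 0)):
    """Return percent of successful probes inside the business window.

    samples: iterable of (datetime, bool) pairs, one per probe result.
    Assumes a Monday-Friday week (hypothetical; adjust for the business).
    Returns None when no sample falls inside the window.
    """
    in_window = [
        ok for ts, ok in samples
        if ts.weekday() < 5 and start <= ts.time() < end
    ]
    if not in_window:
        return None
    return 100.0 * sum(in_window) / len(in_window)
```

A failure at 8 p.m. or on a Saturday is simply left out of the figure, which is the same effect as scheduling the probe for business hours on the router itself.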
The probes from the router to the data center public IP and then also to the data center firewall will isolate whether there's an issue in getting to the data center or getting out of the data center. The firewall probe can also cast light on any VPN or tunnel failures between the customer router and the data center.
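The layering of these probes can be sketched as a short Python script run from any host that sits behind the router. This is only an illustration under stated assumptions: the addresses are placeholders from the documentation ranges, the names `ping`, `PROBES`, and `first_failure` are made up for this example, and the `ping` flags shown are the Linux ones. A router's built-in probe feature would do the same job natively.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

def ping(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo with the system ping (Linux-style flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0

# Ordered from nearest to farthest. All addresses are placeholders.
PROBES = [
    ("ISP gateway", "192.0.2.1"),
    ("public Internet host", "198.51.100.10"),
    ("data center public IP", "203.0.113.5"),
    ("data center firewall", "203.0.113.6"),
]

def first_failure(
    probes: Sequence[Tuple[str, str]],
    probe_fn: Callable[[str], bool] = ping,
) -> Optional[str]:
    """Return the name of the first probe that fails, or None if all pass."""
    for name, host in probes:
        if not probe_fn(host):
            return name
    return None
```

Calling `first_failure(PROBES)` walks outward from the router. If the gateway and the public host answer but the data center public IP does not, the conversation can start with the hosting provider rather than the carrier.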
A really useful idea is to point a probe from the router at the same destination a user application uses: a shortcut on user desktops that leads to a destination in the data center and resolves through conditional forwarders set up on Microsoft servers. An example of the user's desktop shortcut URL is remote.IloveProbes.com. The DNS server has conditional forwarders that point remote.IloveProbes.com to an IP that resides in the data center.
This probe will reveal whether or not the resources supporting the hosted applications in the data center are highly available. Of course you still need to exercise some discernment before making a judgment, and that is true for probes in general. In my illustration, all probes are treated as equal, for simplicity and to make it easier to correlate where issues reside.
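Because this probe depends on name resolution as well as reachability, it can help to check separately that the name still resolves into the data center. The Python sketch below does only that. The function name `resolves_into` and the subnet `203.0.113.0/24` are hypothetical stand-ins, since the real data center range is not given here.

```python
import ipaddress
import socket

def resolves_into(hostname, network, resolver=socket.gethostbyname):
    """Return True if hostname resolves to an address inside network.

    network is a CIDR string such as "203.0.113.0/24" (placeholder range).
    A resolution failure counts as False rather than raising.
    """
    try:
        addr = ipaddress.ip_address(resolver(hostname))
    except (OSError, ValueError):
        return False
    return addr in ipaddress.ip_network(network)
```

For example, `resolves_into("remote.IloveProbes.com", "203.0.113.0/24")` returning False while a ping to the data center IP still succeeds would point toward DNS or the conditional forwarders rather than the hosted resources.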
Then, probes from the router to remote sites via GRE (Generic Routing Encapsulation) tunnels, VPNs or other interfaces will indicate availability and serve as an aid in resolving issues.
Probes from the router to the PBX and any related gear with an IP address also serve to keep an eye on availability and on whether the path between the gear and the router is open.
Probes coupled together with your network diagram can be used to test availability of other network resources. In essence you can use these simple tools to gauge whether or not you need to call the plumber. Throw in your port and interface statistics and you have more ammunition that can help you keep those packets flowing.
So if you find yourself pointing the finger at another party, point a probe in the right direction and take an assessment instead. Probes aren't a guarantee, but at a minimum the results give everyone something to collaborate around, and hopefully everyone involved then looks in the right direction to resolve the root cause of the network problem.