IT Risk Mitigation Examined

Prevention, response, and investigation are three key approaches that, when used together, can assist IT in deciding whether to accept or mitigate the level of risk.

This same risk management principle relates to engineering economics, in assessing whether the cost of a preventative solution outweighs its benefits. Risk management seems black and white, but in practice cost becomes a factor.
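One common way to frame that cost-versus-benefit question is annualized loss expectancy: the cost of a single incident multiplied by how often it is expected per year. The sketch below is only an illustration of that framing, not a method taken from this article. The function names and all of the dollar figures and frequencies are hypothetical.

```python
def annualized_loss_expectancy(single_loss_cost, occurrences_per_year):
    """Expected yearly loss: cost of one incident times its yearly frequency."""
    return single_loss_cost * occurrences_per_year


def mitigation_net_benefit(ale_before, ale_after, annual_mitigation_cost):
    """Yearly loss avoided by the mitigation, minus what the mitigation costs per year."""
    return (ale_before - ale_after) - annual_mitigation_cost


# Hypothetical inputs: a $20,000 outage expected twice a year without
# mitigation, once every two years with it, and a $12,000 yearly cost
# to own and operate the mitigation.
ale_before = annualized_loss_expectancy(20_000, 2)
ale_after = annualized_loss_expectancy(20_000, 0.5)
net = mitigation_net_benefit(ale_before, ale_after, 12_000)

# A positive net benefit argues for mitigating; otherwise accepting the
# risk may be the more economical choice.
decision = "mitigate" if net > 0 else "accept"
print(ale_before, ale_after, net, decision)
```

The estimate is only as good as the downtime cost fed into it, which is why the cost of disruption has to be known by whoever is designing the solution.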

Oftentimes, senior management does not fully share the cost of the risk -- as related to disruption or downtime -- with IT. This opens the potential for IT and other operational areas to over-engineer solutions, which in turn leads either to a high total cost of ownership that outweighs the benefit or to a solution that fails to work as anticipated due to complexities or change.

Another key metric lies within carrier and provider SLAs. Do these levels of commitment complement or oppose each other? By that I mean, if the cloud provider states availability at 99% and the carrier is at 99.5%, then you have an imbalance -- and the potential to over- or under-engineer your solution. The point is, you need to know your exact SLAs with both carrier and cloud providers, at least when it comes to access/availability. In a 30-day month (720 hours), for example, the cloud provider with 99% availability can be down for 7.2 hours, while the carrier committing to 99.5% availability can be down for only 3.6 hours.
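The downtime figures of 7.2 and 3.6 hours correspond to availability commitments of 99% and 99.5% over a 720-hour month. A minimal sketch of that arithmetic follows. The function names are illustrative, and the combined figure assumes the carrier and cloud provider fail independently and that the service needs both to be up, which is an assumption and not something an SLA guarantees.

```python
HOURS_PER_30_DAY_MONTH = 30 * 24  # 720 hours


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_30_DAY_MONTH):
    """Hours of downtime an availability commitment permits over the period."""
    return (1 - availability_pct / 100) * period_hours


def combined_availability_pct(*availability_pcts):
    """Availability of services in series, assuming independent failures."""
    combined = 1.0
    for pct in availability_pcts:
        combined *= pct / 100
    return combined * 100


cloud_pct = 99.0
carrier_pct = 99.5

print(round(allowed_downtime_hours(cloud_pct), 2))    # 7.2
print(round(allowed_downtime_hours(carrier_pct), 2))  # 3.6

# End-to-end availability when the service depends on both links.
end_to_end_pct = combined_availability_pct(cloud_pct, carrier_pct)
print(round(end_to_end_pct, 3))                           # 98.505
print(round(allowed_downtime_hours(end_to_end_pct), 2))   # 10.76
```

Under those assumptions the end-to-end commitment is weaker than either SLA on its own, which is the number to engineer against.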

In practice, we see over- and under-engineering daily, across disciplines and in all industries. Workstations have more than enough hard-drive capacity but lack RAM, or firewalls don't account for simultaneous connections, BYOD, or the Internet of Things. These types of risks are preventable and, when overlooked, can be effectively remediated once an investigation into the root cause of a failure or disruption uncovers them.

Case in Point

The response to an IT failure is accepting the risk or mitigating it to eliminate or minimize impact. I'll use power disruption as an example, since IT needs continuous and undisturbed power to provide availability to services.

While investigating why IP phones were failing intermittently at a county agency, I found that all IDF and MDF LAN switches and routers had redundant power supplies. Power supply #1 connected to the UPS source, and the UPS connected to an orange receptacle served by a standby generator. Power supply #2 connected directly to the house receptacle, bypassing the UPS.

The thinking was that if the UPS failed, then the secondary power supply would already be active and so there'd be no downtime during the repair or while awaiting replacement. However, power supply #2 had no surge protection, as I uncovered during the investigation. Power disruptions and disturbances on the unprotected house power impacted the LAN switches. The agency implemented its power supply configuration for the right reason, but it did not account for the power disturbances and transients that occur daily in every installation on unprotected power.

Hopefully the IT response to this situation isn't to add a secondary UPS to every IDF and MDF. Instead, adding whole-panel surge protection devices (SPDs) is a more cost-effective solution with a lower TCO.

While IT staff often takes for granted doing the right things to prevent downtime, not every situation is the same and thinking through the solution is important. In the example I used, the solution added another layer of risk -- a risk that came to fruition, causing disruption to the LAN switches. Instead of providing less risk, it increased vulnerability.

Thinking Big Picture

As a sidebar, consider the potential impacts -- good and bad -- of using a carrier's data centers for both transport and cloud services. What does the SLA look like for enterprises using one source, what are the risks, and what are the planned responses during an outage? What kind of preventive measures have you employed and do they work? How do you know?

Follow Matt Brunk on Twitter!
@telecomworx

