I occasionally hear from different sources that QoS is not needed in the LAN. A variety of rationales are used, but I think the main factor is the complexity of configuring QoS. I thought that it would be interesting to take a close look at a number of excuses for not using QoS.
We Use High-Speed Links
Some people think that bandwidth is a good replacement for QoS. Their statement is often something like this:
QoS configuration is pretty complex, so we just made sure that we have at least X-bandwidth links everywhere. [Replace X with the speed du jour.] With high-speed links, we don't need QoS.
The presumption is that with sufficient bandwidth, congestion will never occur. In practice, that is not the case. Applications consume ever more network bandwidth, and the addition of video content, data-intensive applications, and rich graphics keeps increasing the volume of network traffic.
Possibly the worst example of this excuse that I've seen was a network staff that decided all edge ports should be configured for 10 Mbps half duplex! Their intent was to limit the data volume at the edge, somewhat like admission control. It definitely limited the data volume; however, everyone had poor application performance, which didn't help the organization compete in the business world.
Instantaneous Buffer Congestion
The high-speed link argument doesn't work because of something called instantaneous buffer congestion (also called instantaneous interface congestion). Interface buffers fill quickly due to the bursty nature of IP network traffic. Modern workstations and servers can easily fill a 1-Gbps link, and multiple flows on a common network infrastructure can quickly congest 10-Gbps links. TCP reduces its data rate only when it detects packet loss, so a link that is transporting multiple flows can quickly become congested, causing packet loss. QoS is needed to prioritize packets at the congested interface.
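To see how fast this happens, here's a quick back-of-envelope sketch in Python. The 1500-byte packet size and the two-sender burst are illustrative assumptions; the 64-packet buffer depth matches the typical allocation discussed in the next section:

```python
# Illustrative arithmetic: two hosts bursting at 1 Gbps toward a single
# 1-Gbps egress port. The queue grows at the difference between the
# arrival rate and the drain rate.

LINK_BPS = 1e9            # egress line rate: 1 Gbps
ARRIVAL_BPS = 2e9         # two simultaneous 1-Gbps senders
PKT_BITS = 1500 * 8       # assumed packet size: 1500 bytes
BUFFER_PKTS = 64          # assumed buffer depth (packets)

growth_bps = ARRIVAL_BPS - LINK_BPS           # net queue growth: 1 Gbps
buffer_bits = BUFFER_PKTS * PKT_BITS          # 768,000 bits of buffering
fill_time_ms = buffer_bits / growth_bps * 1000

print(f"Buffer fills in {fill_time_ms:.2f} ms")   # ~0.77 ms
```

The buffer fills in well under a millisecond, far too fast to ever appear in a utilization graph.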
The NMS Doesn't Show Congestion
The next excuse is about measuring congestion:
Our NMS shows low link utilization, so there can't be any congestion.
The problem is that NMS polling happens at such a slow rate that the instantaneous peaks aren't visible in the performance data. Reducing the polling period from 10 minutes to 1 minute will show more detail, but still not enough to show instantaneous buffer congestion. We've seen serious congestion on a 1-Gbps link that the NMS reported as being 30%-40% utilized (at a 10-minute polling interval).
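Here's a simple illustration of why averaged counters hide congestion. The traffic pattern below is hypothetical, but it produces the kind of 30%-40% utilization figure described above while the link is actually saturated for minutes at a time:

```python
# Illustrative arithmetic: a 10-minute SNMP polling interval averages
# away periods of 100% utilization.

POLL_SECONDS = 600            # 10-minute polling interval
LINE_RATE_BPS = 1e9           # 1-Gbps link

# Assume bursts totaling 3.5 minutes of line-rate traffic within the
# interval, with the link nearly idle the rest of the time.
saturated_seconds = 210
idle_rate_bps = 0.05 * LINE_RATE_BPS

bits_sent = (saturated_seconds * LINE_RATE_BPS +
             (POLL_SECONDS - saturated_seconds) * idle_rate_bps)
avg_util = bits_sent / (POLL_SECONDS * LINE_RATE_BPS)

print(f"NMS reports {avg_util:.0%} utilization")   # ~38%
```

The averaged figure looks healthy even though users experienced congestion for a third of the interval.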
To detect congestion, use the network monitoring system to report interface drops. Drops occur when the interface buffers are full.
Cisco products typically allocate a small number of buffers per interface, often 64. When 64 packets are queued on the interface, the buffer queue is full and any subsequent packets will be dropped until at least one packet is transmitted out of the buffer.
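A minimal Python sketch of this tail-drop behavior, assuming the 64-packet queue described above:

```python
from collections import deque

# Minimal sketch of tail-drop on an interface output queue.
QUEUE_DEPTH = 64    # assumed buffer allocation, per the text

queue = deque()
drops = 0

def enqueue(packet):
    """Queue a packet for transmission; tail-drop if the buffer is full."""
    global drops
    if len(queue) >= QUEUE_DEPTH:
        drops += 1          # buffer full: the packet is simply discarded
    else:
        queue.append(packet)

def transmit_one():
    """The interface drains one packet, freeing one buffer slot."""
    if queue:
        queue.popleft()

# A 100-packet burst arrives faster than the interface can drain it.
for i in range(100):
    enqueue(i)

print(f"Queued: {len(queue)}, dropped: {drops}")   # Queued: 64, dropped: 36
```

Everything beyond the 64th packet in the burst is lost, even though the link's average utilization may be low.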
Looking for excessive interface drops will let you identify interface congestion even when the utilization figures are low. Be careful, though: TCP uses drops as part of its feedback loop to reduce its sending rate (see the next section for more on this).
Look for interfaces that have thousands of drops per hour. A list of interfaces, sorted in descending order of drop count, is the best way to find congested interfaces. These are the interfaces that should be configured with QoS.
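A simple sketch of that report in Python. How you collect the counters will vary (SNMP ifOutDiscards deltas, CLI scraping, or your NMS's API); the interface names and counts below are hypothetical:

```python
# Sketch: rank interfaces by output drops per hour to find the
# congestion points that should get QoS first. Values are hypothetical.

drops_per_hour = {
    "core-sw1 Gi1/0/24": 48210,
    "core-sw1 Gi1/0/1":  112,
    "dist-sw2 Te1/1/1":  9875,
    "edge-sw7 Gi2/0/12": 0,
}

THRESHOLD = 1000   # "thousands of drops per hour" from the text

congested = sorted(
    ((name, n) for name, n in drops_per_hour.items() if n >= THRESHOLD),
    key=lambda item: item[1],
    reverse=True,
)

for name, n in congested:
    print(f"{name}: {n} drops/hour")   # QoS candidates, worst first
```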
Increase Interface Buffers
At another customer site, we heard this story:
We had drops on a key link, so we added interface buffers to reduce the number of drops.
The number of drops was indeed bad. However, increasing the number of buffers on the interface was not the right answer. The problem with this solution is that they added thousands of buffers. The latency due to buffering during data bursts became a major problem: real-time applications were dropping late packets, and TCP was retransmitting packets that it thought had been lost but that were in fact simply waiting in the buffers for transmission.
This was on a 1-Gbps link. The duplicate packets consumed network bandwidth with no benefit. We knew that TCP was retransmitting packets because we found large numbers of duplicate TCP ACKs in our packet traces.
It is actually better to drop excess traffic than to buffer large numbers of packets. The right long-term solution is to increase the link bandwidth; the short-term fix was to reduce the number of buffers and use QoS to prioritize traffic.
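The arithmetic behind this is straightforward. Assuming 1500-byte packets on a 1-Gbps link, each buffered packet adds 12 microseconds of queueing delay, so deep buffers translate directly into latency:

```python
# Illustrative arithmetic: queueing delay added by deep buffers on a
# 1-Gbps link. Packet size and buffer depths are assumptions.

LINK_BPS = 1e9
PKT_BITS = 1500 * 8

for buffered_pkts in (64, 1000, 10000):
    delay_ms = buffered_pkts * PKT_BITS / LINK_BPS * 1000
    print(f"{buffered_pkts:>6} packets queued -> {delay_ms:6.1f} ms of delay")

#     64 packets queued ->    0.8 ms  (negligible)
#   1000 packets queued ->   12.0 ms  (noticeable for real-time traffic)
#  10000 packets queued ->  120.0 ms  (late for voice, and the sudden RTT
#                                      inflation can exceed TCP's timers,
#                                      triggering spurious retransmissions)
```

At thousands of buffered packets, the added delay is late enough to break real-time traffic and to confuse TCP's loss detection, which is exactly what we observed.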
The Data Center Doesn't Need QoS
Some people subscribe to this notion:
The Data Center doesn't need QoS.
We have performed a number of network analysis cases where high-volume, low-priority traffic caused performance problems for more important applications. In several instances, the low-priority traffic was a network backup that congested a key link.
Data centers can also benefit from prioritization of internal traffic. Server-to-server traffic (so-called East-West traffic) has greatly expanded as multi-tiered applications proliferate. A modern application may be constructed from multiple tiers of servers. Some applications will be more important than others, and QoS can be used to make sure that the primary applications get the bandwidth that's needed to generate revenue for the company.
Figure 1. Typical Multi-Tiered Application
Some people also suggest that data center traffic doesn't affect voice and video, and that they therefore don't need QoS there. When we ask where the Multipoint Control Unit (MCU) is located, it is often in the data center, typically near the call controller. Two traffic classes are then needed to support the voice/video system: one for voice/video traffic and another for signaling traffic. The signaling queue guarantees that an IP phone gets dial tone even when, for example, a backup transmission is congesting a key link. Our recommendation is to use the same priority levels that are used elsewhere, so that there is no confusion about the QoS policy at any point in the network.
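As a sketch of what consistent marking might look like, the snippet below maps the two classes to the DSCP values commonly recommended for them (EF for voice/video media, CS3 for call signaling); your organization's policy may use different values:

```python
# Sketch of consistent, network-wide marking for the two voice/video
# classes described above. EF and CS3 are common recommendations for
# voice media and call signaling; adjust to your own QoS policy.

DSCP = {
    "voice-video": 46,   # EF: real-time media, priority-queued
    "signaling":   24,   # CS3: call setup, so dial tone survives congestion
    "default":      0,   # best effort for everything else
}

def mark(traffic_class: str) -> int:
    """Return the DSCP value to apply at the network edge."""
    return DSCP.get(traffic_class, DSCP["default"])

print(mark("signaling"))   # 24: queued separately from bulk backup traffic
```

Marking with the same values at every access edge means every device along the path can queue signaling separately from bulk traffic without ambiguity.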
But be careful with data center QoS. Packet classification and marking should be done at the edge, not in the core. The core network devices should simply use the existing markings to forward packets as efficiently as possible.
Summary
QoS is needed wherever differentiation of traffic is necessary or desirable. I'll agree that QoS configuration and policy definition need to be simplified. A good configuration management system can help minimize the effort of initial configuration and verify compliance over the long run.