Solving Internet Backbone Problems to Deliver on End-User Expectations

The Internet makes it possible for anybody to access Web applications, at the price of fair to mediocre performance for everybody. It's no surprise that productivity is directly linked to fast application response time, yet very few Web applications are able to deliver desired sub-second response times.

The culprit behind poor SaaS performance is often the Internet backbone, but the issues it creates are not widely known. It's helpful to understand what happens to data sent over the Internet between the time it leaves an origin server and the time it reaches an end user.

Early Web applications such as e-commerce sites were effectively static product catalogs. Today, most SaaS applications involve dynamic, personalized interactions and bi-directional content flows that are far from static, including an expanding number of voice and video communications offerings, which are fundamentally susceptible to packet loss and delay. The Internet has become the new LAN -- the primary network between a user and his or her applications -- and this puts tremendous pressure on legacy Internet protocols that were simply never designed to address today's application needs.

How the Internet Really Works

Nearly all Web browsing, unified communication and collaboration, photo and file sharing, and video and music streaming traffic flows through the Internet backbone, which is made up of many large, interconnected network service providers. These large networks charge Internet service providers (ISPs) to transport data packets over long distances. Content providers rely on this connectivity; they include any business that serves up data over the Internet, such as VoIP, video, pictures, online gaming, messaging, SaaS applications, or music. Collaborative Web applications that enable end users to provide dynamic content to each other also depend on Internet reliability.

In 1986, the U.S. National Science Foundation (NSF) established the first backbone network for the Internet, which provided up to a staggering 56 Kbps of throughput. The Internet today is a large collection of independent network providers that tap into the biggest backbones owned by companies such as AT&T, Verizon, Sprint, and CenturyLink. It consists of routers and switches, connected mainly by fiber optic cables, with each fiber link on the backbone normally providing 100 Gbps of bandwidth.


Economics Create the Problem

The Internet is a collaborative business venture between many different network operators. The economics of the Internet backbone and the technology it depends upon create problems for SaaS providers looking to deliver a consistently fast customer experience. These Internet issues aren't mistakes, but design trade-offs that were made 30 years ago. SaaS providers are beholden to a global network for which no one is responsible, and across which routing decisions are made based on cost, not performance. The Internet is completely indifferent to the fact that a voice or video packet has latency and loss limits that other traffic does not. There is simply no such thing as Internet Class of Service.

Internet least-cost routing follows the rule set used by Border Gateway Protocol (BGP), the routing protocol of the Internet backbone. BGP policies and routing rules are typically based on metrics that ensure connectivity and resilience, but also target cost control. Naturally, network providers will always send data along the lowest-cost paths. But this means SaaS providers' customers suffer due to the network providers' cost-cutting routing measures.

Network providers universally use BGP to route traffic between themselves. When you visit a website, that website's data may traverse networks all over the world through machines belonging to disparate companies and organizations. In order to ensure that data transmissions eventually get to their intended locations, routers keep a table of known trusted routes. Each router is part of an Autonomous System (AS) with its own Autonomous System Number (ASN). The relationship between IP addresses and ASNs is similar to the relationship between street addresses and zip codes, and the Internet uses ASNs to route traffic. Once a packet is delivered to the proper AS, the network pays attention to the IP address.
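The street-address and zip-code analogy can be sketched in a few lines of Python. This is only an illustration: the prefix table below is hypothetical, built from address blocks and AS numbers reserved for documentation, and the function name lookup_asn is an assumption rather than part of any real routing software. A real router learns these mappings from BGP announcements.

```python
import ipaddress

# Hypothetical table mapping address blocks to the AS that announces them.
# Prefixes and ASNs come from ranges reserved for documentation.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64496,
    ipaddress.ip_network("198.51.100.0/24"): 64497,
    ipaddress.ip_network("198.51.100.128/25"): 64498,
}

def lookup_asn(address):
    """Return the ASN of the most specific prefix containing the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in PREFIX_TO_ASN if ip in net]
    if not matches:
        return None  # no known route to this address
    # When several prefixes match, the longest (most specific) one wins.
    most_specific = max(matches, key=lambda net: net.prefixlen)
    return PREFIX_TO_ASN[most_specific]

print(lookup_asn("198.51.100.5"))
print(lookup_asn("198.51.100.200"))
```

The lookup answers only the "zip code" question of which AS should receive the packet. Delivery to the individual address happens inside that AS.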

Within BGP, only the ASN associated with an IP address is used for routing. All else being equal, BGP rules dictate that each packet will be sent along the route with the shortest AS_PATH, that is, the fewest ASN "hops." The BGP rules for moving traffic between networks (called EBGP) don't adapt to congestion. There is no feedback mechanism that lets an ISP change a route based on actual traffic conditions, so routing traffic into congested networks is a common occurrence.
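The consequence of choosing routes by AS_PATH length can be shown with a minimal sketch. The Route class, the peer names, and the latency figures are all hypothetical. Real BGP best-path selection also weighs other attributes, such as operator-set local preference, before AS_PATH length; this example isolates the AS_PATH step to show that measured performance plays no part in the decision.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    next_hop: str
    as_path: tuple      # sequence of ASNs the route traverses
    latency_ms: float   # measured latency; never consulted below

def pick_by_as_path(routes):
    """Choose the route with the fewest AS hops, ignoring performance."""
    return min(routes, key=lambda route: len(route.as_path))

# Two hypothetical routes to the same destination.
routes = [
    Route("peer-a", (64500, 64510), 180.0),        # 2 hops, slow
    Route("peer-b", (64501, 64502, 64503), 40.0),  # 3 hops, fast
]

best = pick_by_as_path(routes)
print(best.next_hop, best.latency_ms)
```

In this made-up case the two-hop route through peer-a is selected even though the three-hop route is more than four times faster, because latency is simply not an input.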

SaaS performance also suffers from some functional aspects of TCP, which dominates the Internet. TCP is optimized for reliable, ordered, and error-checked delivery of data between two systems. It is a very conservative protocol that favors accurate delivery at the expense of high throughput. When it encounters any congestion in the Internet, it reacts strongly, shrinking its congestion window (the amount of data it allows in flight) and limiting throughput to ensure packets can make it to their destination. This results in application performance slowing to a crawl, creating frustration for users and reduced productivity.
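How strongly TCP reacts to congestion can be illustrated with a toy model of additive increase and multiplicative decrease, the general scheme behind classic TCP congestion control. This is a simplified sketch, not an implementation of any particular TCP variant: the function name simulate_aimd, the starting window, and the loss pattern are all assumptions chosen for illustration.

```python
def simulate_aimd(loss_rounds, rounds, start_window=10.0):
    """Track the amount of data in flight (in segments) per round trip.

    The window grows by one segment per loss-free round trip and is
    halved whenever a loss is detected in that round.
    """
    window = start_window
    history = []
    for round_number in range(rounds):
        if round_number in loss_rounds:
            window = max(1.0, window / 2)  # back off sharply on loss
        else:
            window += 1.0                  # probe cautiously for capacity
        history.append(window)
    return history

# Hypothetical run: a single loss in round 3 of 5.
print(simulate_aimd({3}, 5))
```

A single loss wipes out half the window in one step, while recovery proceeds only one segment per round trip. On a long, congested path that asymmetry is what users experience as an application slowing to a crawl.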

TCP data transfer algorithms were designed for reliability, not efficiency. A sender may have only a limited amount of unacknowledged data in flight, so it must wait for the receiver to acknowledge earlier chunks before it sends the next batch. Since these data chunks are typically small (around a thousand bytes), transferring even 1 MB of data can require hundreds of separate round trips through the Internet backbone.
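The arithmetic is easy to check with a back-of-the-envelope sketch. The function name round_trips, the 80 ms round-trip time, and the window of four chunks are assumptions picked for illustration. The model is deliberately crude: it treats the number of chunks allowed in flight as fixed, whereas real TCP adjusts it continuously.

```python
import math

def round_trips(total_bytes, chunk_bytes, chunks_in_flight=1):
    """Round trips needed if each trip carries a fixed number of chunks."""
    chunks = math.ceil(total_bytes / chunk_bytes)
    return math.ceil(chunks / chunks_in_flight)

ONE_MB = 1_000_000
RTT_SECONDS = 0.080  # hypothetical 80 ms round-trip time

for in_flight in (1, 4):
    trips = round_trips(ONE_MB, 1_000, in_flight)
    print(in_flight, trips, round(trips * RTT_SECONDS, 1))
```

With one 1,000-byte chunk acknowledged at a time, 1 MB needs 1,000 round trips; with four chunks in flight it still needs 250. At the assumed 80 ms per round trip, that is 80 and 20 seconds respectively spent waiting on the network, regardless of how much bandwidth the link offers.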