Cisco Malware Detection: What Communications Folks Need to Know

One of the most intriguing capabilities Cisco announced at its June Cisco Live conference is Encrypted Traffic Analytics (ETA), a solution that examines encrypted data traffic and identifies threats such as viruses and other malware. The company claims ETA is more than 99% accurate in detecting these threats without decrypting the traffic.

This post describes how ETA works, why it requires a new generation of switches, and why we in the communications industry should care.

Encrypted Traffic Is Growing

According to an April 2017 study sponsored by Thales e-Security, enterprise use of encrypted data flows, already on the rise, is expected to increase rapidly as companies roll out Internet of Things (IoT) programs and make encryption devices and policies the norm. Encrypting data brings a far greater sense of content security than allowing open data flows, but it turns out that hackers and malware makers are also incorporating the use of encrypted flows to make their threats much more difficult to detect.

Many organizations have responded to this increase in encrypted traffic by deploying some sort of trusted "device in the middle" that decrypts traffic, performs deep packet inspection to look for threats, and then re-encrypts the data. While this method works, it scales poorly in terms of both investment and required compute power.

Cisco's ETA Approach for Malware Detection

With ETA, Cisco takes an alternative approach: it looks for patterns in malware-infected, but still encrypted, data flows. Many malware schemes create unique fingerprints or identifiable patterns while they are setting up the flows and as the flows progress. By training a machine learning algorithm on known patterns of infected encrypted data, ETA can detect malware even while the data flow remains encrypted.

Two key elements establish a malware fingerprint in encrypted data: the initial data packet, and the sequence of packet lengths and times during a flow.

Many encrypted data flows use Transport Layer Security (TLS) as the cryptographic protocol for providing security between two applications communicating over a network. The majority of TLS handshake messages are unencrypted, and Cisco switches in the flow path can gather this TLS handshake information and use it as metadata. The initial packet sent by the device initiating the flow is especially important because it provides a gold mine of TLS information while remaining unencrypted.

TLS handshaking for establishing a secure connection involves the following steps:

  1. Agree on the version of the TLS protocol to use
  2. Agree on the cryptographic algorithms to use
  3. Exchange and validate digital certificates
  4. Generate a shared secret key
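
To make the "gold mine" concrete, here is a minimal sketch of reading the offered TLS version and cipher suites out of an unencrypted ClientHello, the first handshake message the initiating device sends. It assumes a single, well-formed TLS record and ignores extensions. The function names and the sample bytes are hypothetical, built only for this illustration; this is not a description of how Cisco's switches extract the data.

```python
import struct


def parse_client_hello(record: bytes) -> dict:
    """Extract cleartext fields from one TLS record holding a ClientHello."""
    # Record header: content type (1 byte), record version (2), length (2).
    content_type, _record_version, record_len = struct.unpack("!BHH", record[:5])
    if content_type != 0x16:
        raise ValueError("not a TLS handshake record")
    body = record[5:5 + record_len]
    if body[0] != 0x01:
        raise ValueError("not a ClientHello")
    pos = 4  # skip handshake type (1 byte) and handshake length (3 bytes)
    (client_version,) = struct.unpack("!H", body[pos:pos + 2])
    pos += 2 + 32  # offered version plus the 32-byte client random
    session_id_len = body[pos]
    pos += 1 + session_id_len
    (suites_len,) = struct.unpack("!H", body[pos:pos + 2])
    pos += 2
    count = suites_len // 2  # each cipher suite is a 2-byte code point
    suites = list(struct.unpack(f"!{count}H", body[pos:pos + suites_len]))
    return {"client_version": client_version, "cipher_suites": suites}


def build_sample_client_hello() -> bytes:
    """Build a tiny, made-up ClientHello so the parser can be exercised."""
    hello = (
        struct.pack("!H", 0x0303)                  # offered version: TLS 1.2
        + bytes(32)                                # client random (zeros here)
        + b"\x00"                                  # empty session id
        + struct.pack("!HHH", 4, 0xC02F, 0x009C)   # 4 bytes = two cipher suites
        + b"\x01\x00"                              # one compression method: null
    )
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return b"\x16" + struct.pack("!HH", 0x0301, len(handshake)) + handshake


info = parse_client_hello(build_sample_client_hello())
print(hex(info["client_version"]), [hex(s) for s in info["cipher_suites"]])
```

The list of offered cipher suites, and the order in which they appear, differs between client software, which is part of what makes this first packet useful as a fingerprint.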

Cisco also collects the sequence of packet lengths and times because they can serve as indicators of what's happening in an encrypted flow.
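
One way to turn a variable-length sequence of packet lengths into a fixed-size input for a machine learning model is to bin the values and count how often one bin follows another. The sketch below does this with a row-normalized transition matrix. The encoding, the function name, the bin size of 150 bytes, and the 10 bins are all assumptions chosen for illustration; the article does not say how Cisco encodes the sequence.

```python
def transition_matrix(values, bin_size=150, n_bins=10):
    """Row-normalized transition matrix over binned values.

    Entry [i][j] is the fraction of times a value in bin i was
    immediately followed by a value in bin j.
    """
    # Values past the last bin edge are clamped into the final bin.
    bins = [min(int(v // bin_size), n_bins - 1) for v in values]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for src, dst in zip(bins, bins[1:]):
        counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Made-up packet lengths in bytes: small requests alternating with large replies.
lengths = [120, 1400, 1400, 90, 1400]
features = transition_matrix(lengths)
print(features[0][9], features[9][0], features[9][9])
```

The same function could be applied to inter-arrival times by passing time gaps and a bin size in milliseconds, giving a second fixed-size block of features for the same flow.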

The figure above shows packet lengths (vertical lines) and arrival times (horizontal lines) for two different TLS sessions. On the left is the pattern for a typical Google search, while the image at the right is a session for the BestaFera trojan, which hackers used to collect a user's online banking data and send it to a control server. The red lines at the start represent unencrypted TLS packets, while the gray lines represent encrypted data.

The Google search at the left proceeds as expected. The user begins typing, and the browser sends an outbound packet to Google. Google immediately responds with a lot of packets containing possible auto-complete results based on its predictive algorithms using the typed letters or words. The small gray packets on top represent the user still typing as he/she completes entering the search terms. Google then sends updated results.

In the malware image on the right, the TLS handshake occurs, but the BestaFera server sends back a self-signed certificate (note it is still in red, unencrypted, so ETA can detect it). The virus then commands the user's device to begin sending a lot of data (Data Exfiltration), as shown in the upper gray lines. Finally, the virus server sends a command and control message (the C2 Message).

The point is that mapping arrival times and packet sizes along with TLS handshake information provides a pattern for detecting both good and bad data in encrypted traffic flows.

Tuning the Machine Learning Algorithm

In the example above, Cisco used scikit-learn, a free machine learning library. Written in the Python programming language, scikit-learn has a number of sponsors including INRIA, a French technology institute; New York University; Paris-Saclay Center for Data Science; Columbia University; and Google.

In simple terms, engineers can use the scikit-learn library to classify data or information. They can also use it to estimate values (regression) or to identify clusters. Cisco is using it to classify data flows as either malicious or benign.

Without going into too much detail, readers should understand that engineers can tweak and tune machine learning models, and that it takes judgment and skill to determine which tuning parameters will give the best results. The data below shows the results of Cisco's training of a scikit-learn model.

The data shown above illustrates the accuracy of the model given different data combinations. Legacy, on the left, means typical NetFlow information, such as the duration of the flow and the number of packets and bytes exchanged by each side. Legacy/SPL adds the sequence of packet lengths while TLS adds data for the TLS handshaking.

The most important data to examine are the two bottom rows, where all available data is used to train the model. They make the tradeoff between detecting malware and avoiding false positives clear. For example, at a tuning parameter value of 0.5, the model correctly identifies malware flows 99.35% of the time and benign flows 98.38% of the time. The 98.38% figure means that in 162 of every 10,000 benign flows (10,000 - 9,838), the model will incorrectly report malware (a false positive). When the tuning parameter is set to 0.99, the model gets benign flows right 100% of the time but detects only 68.83% of malware flows. The point is that these machine learning models are rarely 100% accurate on benign and malware-laden flows at the same time. Thus, human judgment and understanding are still required, even when artificial intelligence and machine learning are in use.
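
The tradeoff can be reproduced with a few lines of arithmetic. The sketch below assumes the tuning parameter acts as a decision threshold on the model's estimated probability that a flow is malicious, which the article does not state explicitly. The scores and labels are made up for illustration and are not Cisco's data; the function names are hypothetical.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (malware detection rate, benign accuracy) at a threshold.

    scores: estimated probability that each flow is malicious.
    labels: 1 for a malware flow, 0 for a benign flow.
    A flow is flagged as malware when its score is >= threshold.
    """
    pairs = list(zip(scores, labels))
    true_pos = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    true_neg = sum(1 for s, y in pairs if y == 0 and s < threshold)
    n_malware = sum(labels)
    n_benign = len(labels) - n_malware
    return true_pos / n_malware, true_neg / n_benign


def expected_false_positives(benign_accuracy, n_benign_flows):
    """Benign flows expected to be wrongly flagged as malware."""
    return round((1 - benign_accuracy) * n_benign_flows)


# Made-up scores: four malware flows followed by four benign flows.
scores = [0.97, 0.995, 0.80, 0.62, 0.55, 0.30, 0.10, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(rates_at_threshold(scores, labels, 0.5))    # catches all malware, one false alarm
print(rates_at_threshold(scores, labels, 0.99))   # no false alarms, most malware missed
print(expected_false_positives(0.9838, 10_000))   # the article's 162 false positives
```

Raising the threshold removes the false alarm on the benign flow scored 0.55, but it also lets three of the four malware flows through, which mirrors the pattern in the figures quoted above.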

When ETA predicts malware, it does not automatically quarantine a machine from the network. Rather, it raises an alarm indicating that manual intervention is needed to place the device under quarantine.
