Securing through Machine Learning -- Part 2
In the second part of a Q&A with Masergy's chief scientist, we take a deeper dive into machine learning's role in cyber security.
As I discussed in last week's post, machine learning (ML) can greatly assist cyber security specialists in discovering attack patterns and mitigating attacks. "ML improves cyber security by providing detection based on hidden variables that are difficult for humans to analyze," Mike Stute, Chief Scientist at Masergy, told me in an email interview.
I had reached out to Mike, who is a data scientist and the prime architect for the Unified Enterprise Security (UES) platform at Masergy, after reading a blog post in which he discussed applying ML to cyber security. Last week I shared Part 1 of that email interview, which covered Mike's insight on the shortcomings of manual cyber security practices, the benefits of applying ML to cyber security, and the makeup of today's cyber security attacks. Today I share the remainder of that interview, for a deeper dive on ML's role in cyber security.
How do you collect data and determine normal behavior?
Data is collected through standard techniques, such as mirroring network traffic to a collection system or sending logs to a log collection point. Agents can interrogate hosts on the network and send snapshots of those systems to collectors that build historical change lists. That's the easy part.
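To make "historical change lists" concrete, here is a minimal sketch, assuming each agent snapshot is simply a mapping from file path to a content hash. The names `fingerprint`, `diff_snapshots` and `ChangeHistory` are hypothetical and do not describe Masergy's collectors; real agents would also capture processes, services, users and registry keys.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


def fingerprint(content: bytes) -> str:
    """Hash file content so two snapshots can be compared cheaply."""
    return hashlib.sha256(content).hexdigest()


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Compare two {path: hash} snapshots and return sorted (change, path) pairs."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append(("added", path))
        elif path not in new:
            changes.append(("removed", path))
        elif old[path] != new[path]:
            changes.append(("modified", path))
    return changes


@dataclass
class ChangeHistory:
    """Hypothetical collector that keeps a per-host, time-ordered change list."""
    last: dict[str, dict[str, str]] = field(default_factory=dict)
    history: dict[str, list[tuple[int, str, str]]] = field(default_factory=dict)

    def ingest(self, host: str, timestamp: int, snapshot: dict[str, str]) -> None:
        # The first snapshot for a host is diffed against an empty baseline,
        # so every path it contains is recorded as "added".
        previous = self.last.get(host, {})
        for change, path in diff_snapshots(previous, snapshot):
            self.history.setdefault(host, []).append((timestamp, change, path))
        self.last[host] = snapshot


if __name__ == "__main__":
    collector = ChangeHistory()
    collector.ingest("web01", 1, {"/etc/passwd": fingerprint(b"root:x:0:0")})
    collector.ingest("web01", 2, {
        "/etc/passwd": fingerprint(b"root:x:0:0\nevil:x:0:0"),
        "/tmp/dropper": fingerprint(b"\x7fELF"),
    })
    print(collector.history["web01"])
```

Storing only hashes keeps snapshots small. The trade-off is that the change list records *that* a file changed, not *how*.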
The difficult part is arranging that data so a given ML technique can use it. It would be great if we could just feed raw data into an ML data construct, but that is difficult. If the features needed for good detection are lost when the data is formatted, the ML system will be of limited use. Picking the correct ML data structure (convolutional neural network, deep belief network, stacked autoencoders, etc.) is the first decision. We have developed a new ML data structure that maps any data into a data set that many sophisticated, well-understood ML data structures can digest. This is patent-pending technology, so I am unable to provide more detail.
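Masergy's own mapping is undisclosed, so the sketch below is deliberately *not* that technique. It illustrates the general problem with a well-known public alternative, the "hashing trick," which turns records with arbitrary fields into fixed-length numeric vectors that a standard ML model can accept. The function name `hash_features` and the bucket count are assumptions made for illustration.

```python
import hashlib


def hash_features(record: dict, n_buckets: int = 16) -> list:
    """Map an arbitrary key/value record into a fixed-length vector (the hashing trick)."""
    vector = [0.0] * n_buckets
    for key, value in record.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # Numeric field: hash the field name and keep the value as the weight.
            token, weight = key, float(value)
        else:
            # Categorical field: hash "key=value" and use it as a 0/1 indicator.
            token, weight = f"{key}={value}", 1.0
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % n_buckets
        # A hash-derived sign makes collisions tend to cancel rather than pile up.
        sign = 1.0 if digest[4] & 1 else -1.0
        vector[index] += sign * weight
    return vector


if __name__ == "__main__":
    # Hypothetical firewall log record.
    record = {"src_port": 443, "proto": "tcp", "action": "deny", "bytes": 1500}
    print(hash_features(record, n_buckets=8))
```

This illustrates the interviewee's warning. Hashing never fails on unfamiliar fields, but colliding features share a bucket, and anything not encoded in the record (timing, sequence) is simply lost to the model.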
Supervised learning requires a sample data set for training in which the outcome is known, such as "viral code" or "not viral code." This type of learning can be used to solve a regression or classification problem.
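As a toy illustration of supervised classification, here is a classic perceptron trained on labelled samples. The two features (a section-entropy score and the share of suspicious API imports, both scaled to 0..1) and all of the data are hypothetical. The point is only that the labels ("viral" or "not viral") drive the learning.

```python
def score(weights, bias, x):
    """Weighted sum of the features plus the bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    """Return 1 ("viral code") on the positive side of the boundary, else 0."""
    return 1 if score(weights, bias, x) > 0 else 0


def train_perceptron(samples, labels, max_epochs=1000):
    """Classic perceptron rule: nudge the boundary toward each misclassified sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)  # -1, 0 or +1
            if error:
                mistakes += 1
                weights = [w + error * xi for w, xi in zip(weights, x)]
                bias += error
        if mistakes == 0:  # every training sample is now classified correctly
            break
    return weights, bias


# Hypothetical features: (section entropy, share of suspicious API imports).
SAMPLES = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.3, 0.2)]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = "viral code", 0 = "not viral code"

if __name__ == "__main__":
    w, b = train_perceptron(SAMPLES, LABELS)
    print(predict(w, b, (0.85, 0.75)))  # prints 1
```

A perceptron only converges when the classes are linearly separable. Production systems use richer models, but the training loop has the same shape: predict, compare with the known label, adjust.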
Unsupervised learning is very different: it performs what is known as "clustering." It learns the features of the data and assigns each item to a "cluster" of similar data. This is how behaviors are learned. Normal behavior falls into one cluster, abnormal into another, with a gradient between them. Our new data structure creates a data set that lets us present any data to an unsupervised learning system.
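The clustering idea, including the "gradient between them," can be sketched with plain k-means (k = 2) on unlabelled behavior vectors. This is a minimal sketch under stated assumptions: the features (logins per hour, MB uploaded per hour) and data are hypothetical, the larger cluster is *assumed* to be normal behavior, and `anomaly_score` is an illustrative helper rather than any product's scoring method.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    # Deterministic farthest-point initialisation instead of random seeding.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(farthest))
    assignment = None
    for _ in range(iterations):
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment


def anomaly_score(point, centroids, assignment):
    """0.0 at the 'normal' centroid, 1.0 at the 'abnormal' one (k = 2 assumed)."""
    sizes = [assignment.count(c) for c in range(2)]
    normal = sizes.index(max(sizes))  # assumption: most observed behavior is normal
    d_norm = math.dist(point, centroids[normal])
    d_abn = math.dist(point, centroids[1 - normal])
    total = d_norm + d_abn
    return 0.0 if total == 0 else d_norm / total


# Hypothetical (logins per hour, MB uploaded per hour) observations, unlabelled.
POINTS = [(2, 10), (3, 12), (2, 11), (4, 9), (3, 10), (2, 12), (40, 900), (35, 850)]

if __name__ == "__main__":
    centroids, assignment = kmeans(POINTS, 2)
    for probe in [(3, 10), (20, 450), (38, 880)]:
        print(probe, round(anomaly_score(probe, centroids, assignment), 3))
```

No labels are used anywhere. The clusters emerge from the data alone, and the ratio of distances gives a continuous score rather than a hard normal/abnormal verdict.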
Can ML adapt to automated attacks?
ML can certainly adapt, and you can choose where to apply that adaptation. First, ML detection techniques based on regression and classification are inherently adaptable. Neural networks, for example, can adapt through a process called evolutionary programming: a pool of randomly generated networks is first trained, and each network is then scored on its ability to solve a problem. Networks with high scores are combined in hopes of preserving their good qualities, and the resulting networks replace those with low scores. Over many iterations of this process, the networks can adapt to new problems.
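The score, combine and replace loop described above can be sketched as a small genetic algorithm. To keep it short, each "network" here is a single neuron (two weights and a bias), the per-network training step is omitted, and the toy data and all function names are hypothetical. It shows the evolutionary mechanics, not a production neuroevolution system.

```python
import random


def predict(net, x):
    """Single-neuron 'network': step activation over two weighted inputs plus a bias."""
    w1, w2, b = net
    return 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0


def fitness(net, data):
    """Score = fraction of samples the network classifies correctly."""
    return sum(predict(net, x) == y for x, y in data) / len(data)


def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from one of the two parents at random."""
    return tuple(ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b))


def mutate(net, rng, scale=0.3):
    """Small Gaussian perturbation so offspring can explore beyond their parents."""
    return tuple(g + rng.gauss(0, scale) for g in net)


def evolve(data, pool_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pool = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(pool_size)]
    history = []
    for _ in range(generations):
        pool.sort(key=lambda n: fitness(n, data), reverse=True)
        history.append(fitness(pool[0], data))
        survivors = pool[: pool_size // 2]  # high scorers are kept unchanged
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pool_size - len(survivors))
        ]
        pool = survivors + children  # offspring replace the low scorers
    pool.sort(key=lambda n: fitness(n, data), reverse=True)
    return pool[0], history


# Hypothetical labelled samples: ((feature 1, feature 2), label).
DATA = [((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.7), 1),
        ((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.3, 0.2), 0)]

if __name__ == "__main__":
    best, history = evolve(DATA)
    print("best score per generation:", history)
```

Because the top half of the pool survives each generation unchanged, the best score can never decrease. Rerunning `evolve` on new data is how, in this toy setting, the pool would "adapt to new problems."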
Adapting for detection is not as important as adapting for automated response. Detection doesn't change the state of the environment, so the same network can be reused. But a network that changes the environment must also adapt to the changes its own actions make to that environment, so it must be adaptable. Right now few products change the environment directly, so adaptability usually isn't built into the product.
What is the state of ML tools?
Many tools require a decent amount of knowledge of linear algebra, connectivity, and data science. These are the building blocks of any ML system, but they are generally already tuned for a specific purpose, such as image recognition or medical classification. Companies are now trying to market many generic ML tools, such as the feature learning of unsupervised ML or the classification and regression of supervised ML. Using these more generic tools requires quite a bit of ML knowledge.
I do feel this will change fairly rapidly as these tools grow and more sample data is used to tune them. Neural networks trained on one data set are now becoming the input to others, which extends the knowledge of higher-level networks (hence "deep" networks).
Where does Masergy fit into this equation?
Since 2001, Masergy's approach has been to combine advanced ML with big-data analytics and continuous human expert monitoring. We are on our fourth generation (not version) of detection engines. As ML progresses, so does Masergy's UES detection engine. We started with heuristics, moved through probabilistic detection and the algorithmic learning of soft artificial intelligence (AI), and are now deploying hard AI, which is the simulation of biological thinking in computers.
True ML in a well-developed and mature product is not something you build in a few years. It requires understanding and experience. We have been doing behavioral detection using a large array of techniques on very large data sets for 15 years. Our detection engine is sophisticated and highly efficient, and along with it we have a team of knowledgeable people who monitor our detection system 24 hours a day.