Cisco's Secret Spark
In a whitepaper published today, Cisco reveals its unique approach to giving Spark a way to search content it can't see.
Since introducing Spark, Cisco has been hinting that it is working on a new way of providing security and ensuring privacy for its cloud-based workstream messaging service. Until now, it's been hesitant to provide details while it puts the pieces in place. Today, Cisco published a whitepaper that reveals a unique approach to encrypted cloud-delivered services.
What Cisco describes is a whole new way to do cloud-based security management with the potential to end the long-established trade-off between search and privacy. Spark will be capable of searching content that it can't see.
Workstream messaging services such as Spark, Slack, and many others utilize encrypted network connections. Many people assume that this equates to encrypted data in the cloud. Unfortunately, link security only protects against network transit interception. The mother lode of data sits at rest in the cloud -- unencrypted.
Encrypting data is not difficult, but creates inconvenience and a loss of features -- the biggest being search. Content repositories are usually too large for client-side searching, so providers index the content on their servers. This has meant providers have access to data -- even confidential data -- and that data has no protection against various provider risks, including snooping, extortion, hacks, bribery, or even the handing over of corporate data in response to a subpoena.
Privacy breaches are the dark underbelly of digitization, and their potential remains one of the barriers to enterprise cloud adoption. Workstream messaging facilitates collaboration, but sensitive conversations and content become digital, searchable, and stored off-site.
For example, the Slack security policy states "every person and team using our service expects their data to be secure and confidential." But that data is off-site, unencrypted, and accessible to Slack. Slack's operational procedures clarify this: "The operation of the Slack services requires that some employees have access to the systems which store and process Customer Data....These employees are prohibited from using these permissions to view Customer Data unless it is necessary to do so."
These policies clarify intent, but that isn't sufficient for all enterprises. I've spoken to several companies that prohibit the use of cloud-based messaging services per their security or compliance policies. Cisco believes it has cracked the code, and can host and index content without requiring access to it.
Encrypting Stored Data
Cisco uses a modular architecture for Spark that enables it to create a portable key management system (KMS) to control and manage encryption keys. The core Spark service (and authorized Cisco employees) only have access to encrypted data. The KMS leverages modern encryption technologies with public and private keys that both protect and validate the data.
Cisco has designed the core Spark service for encrypted data. The KMS architecture is already in place within the Cisco-controlled Spark cloud. Now Cisco has separated the KMS from the rest of Spark so that it could potentially live in an enterprise's data center. As with other encrypted services, the enterprise will require a third-party certificate authority (CA). Spark can utilize any CA that the enterprise customer trusts -- including private CAs.
Spark clients encrypt content using the public key of the KMS, and thus don't need their own certificates. Once a client authenticates with the Spark service, an encrypted channel is established and the client can request a separate conversation key. Each Spark room has its own key, shared among the room's participants. A new conversation key is generated as needed when participant rights change.
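To make the per-room key idea concrete, here is a minimal sketch in Python. It is not Cisco's KMS: the `ToyKMS` class, its method names, and the choice to rotate the key on every membership change are all assumptions for illustration (the whitepaper only says a new key is generated "as needed").

```python
import secrets


class ToyKMS:
    """Hypothetical stand-in for a key management system; not Cisco's API."""

    def __init__(self):
        # room_id -> (participants, conversation key)
        self._rooms = {}

    def set_participants(self, room_id, participants):
        # Assumption: every membership change issues a fresh 256-bit key.
        self._rooms[room_id] = (frozenset(participants), secrets.token_bytes(32))

    def get_key(self, room_id, user):
        participants, key = self._rooms[room_id]
        if user not in participants:
            raise PermissionError(f"{user} is not a participant in {room_id}")
        return key


kms = ToyKMS()
kms.set_participants("room-1", {"alice", "bob"})
old_key = kms.get_key("room-1", "alice")

# Bob leaves the room, so the room gets a new conversation key.
kms.set_participants("room-1", {"alice"})
new_key = kms.get_key("room-1", "alice")
```

In this toy model, a departed participant still holding the old key cannot read anything encrypted under the new one, which is the point of rotating on rights changes.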
Cisco solved the search problem by creating the search index with encrypted data. This is accomplished in a separate module known as the Indexer. The Indexer is a security layer bot that is effectively a participant in every room.
The Indexer builds an encrypted index for each room by applying a cryptographic one-way hash function to each word in the conversation thread. The output is an index of hash codes, not user data. Converting the hashed data back to the original text is not possible. An index is maintained for each room and stored in the Spark cloud.
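A rough sketch of what such a hashed index could look like follows. The article does not name the hash function, so HMAC-SHA256 keyed with a per-room secret is an assumption here, as are the function names, the sample messages, and the simple word-splitting rule.

```python
import hashlib
import hmac
import re
from collections import defaultdict


def hash_word(room_secret: bytes, word: str) -> str:
    """Keyed one-way hash of one lowercased word (HMAC-SHA256 is an assumption)."""
    return hmac.new(room_secret, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_room_index(room_secret: bytes, messages: dict) -> dict:
    """Map each hashed word to the IDs of the messages that contain it."""
    index = defaultdict(set)
    for message_id, text in messages.items():
        for word in re.findall(r"\w+", text):
            index[hash_word(room_secret, word)].add(message_id)
    return dict(index)


room_secret = b"hypothetical-room-secret"
messages = {"m1": "Quarterly budget draft", "m2": "Budget approved"}
index = build_room_index(room_secret, messages)

# The stored index holds only hex digests and message IDs, never the words.
budget_hits = index[hash_word(room_secret, "budget")]
```

Whoever stores `index` sees digests and message IDs only. Without the room secret, it cannot even test a guessed word against the index.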
A user submits encrypted search requests to the Spark core service. Since the Spark service cannot read encrypted data, the service forwards the requests to the Indexer module, where they are decrypted and hashed. Spark then searches for the hashed terms inside a hashed database. Each room has its own hash function, and search requests are generated on a room-by-room basis.
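The search path can be sketched the same way. This example skips the decryption step and models "each room has its own hash function" as HMAC-SHA256 with a per-room secret. Both choices are assumptions, and `hash_term` and `search_room` are hypothetical names. How Spark combines the hits for multiple words is not described, so the sketch simply returns them per term.

```python
import hashlib
import hmac


def hash_term(room_secret: bytes, term: str) -> str:
    """Keyed one-way hash of a lowercased search term (assumed HMAC-SHA256)."""
    return hmac.new(room_secret, term.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def search_room(room_secret: bytes, hashed_index: dict, query: str) -> dict:
    """Return {term: message IDs} for a single room."""
    results = {}
    for term in query.split():
        # Indexer side: hash the (already decrypted) term with the room's secret.
        token = hash_term(room_secret, term)
        # Service side: the lookup sees only the hash, never the term.
        results[term] = hashed_index.get(token, set())
    return results


secret_a = b"hypothetical-secret-room-a"
hashed_index_a = {
    hash_term(secret_a, "budget"): {"m1", "m2"},
    hash_term(secret_a, "draft"): {"m1"},
}

hits = search_room(secret_a, hashed_index_a, "Budget draft")

# The same word hashed for a different room matches nothing in this index.
misses = search_room(b"hypothetical-secret-room-b", hashed_index_a, "budget")
```

Because the hash differs per room, a query has to be re-hashed for every room it targets, which is why requests are generated room by room.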
Like the KMS, the Indexer modules live outside the core Spark service, and Cisco intends to enable either or both to optionally live in enterprise data centers. It appears that Cisco does not want actual or perceived access to customer data or encryption keys.
The explanation above is highly simplified. The whitepaper outlines many other steps and processes. For example, the Indexer utilizes a random noise generator to protect against frequency analysis breaches.
To accommodate custom development efforts, Cisco provides three levels of data access for extensibility. It offers enterprise-wide bots that can have access to each room, user applications with access to a user's rooms, and URLs for simplified access such as posting content to a room.
The architecture appears secure, but adds inefficiencies. If the data were not encrypted, only a single index would need to be maintained. Because each room is encrypted with separate keys, separate indexes are required for each room, and separate searches must be submitted for each index. If a user has access to 10 rooms, then a single search could potentially turn into 10 searches. If that single search includes two words, it becomes 20 searches.
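The fan-out arithmetic is easy to check with a few lines of Python. The room names and search terms below are made up; the only thing taken from the text is that each (room, word) pair needs its own lookup.

```python
from itertools import product


def lookups_needed(rooms, terms):
    """One index lookup per (room, term) pair."""
    return list(product(rooms, terms))


rooms = [f"room-{n}" for n in range(1, 11)]  # a user with access to 10 rooms

one_word = lookups_needed(rooms, ["budget"])
two_words = lookups_needed(rooms, ["budget", "draft"])

print(len(one_word))   # 10
print(len(two_words))  # 20
```

The count grows as rooms times words, so a heavy user in hundreds of rooms multiplies every query accordingly.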
But as a cloud service, Spark's inefficiencies are Cisco's problem -- as long as they don't result in noticeable delays. The trade-off is a robust set of services that otherwise may not be available to organizations with strict data storage requirements. Another limitation is that the indexes are made up of single words, so there's no way to restrict searches to specific strings (words in a specific order) at this time. However, I learned in speaking with Cisco that the company intends to expand search capability, including support for Boolean operators.
Who Said What?
Another architectural aspect relating to Spark security is Obfuscated Identity. User identity lives in Spark's Common Identity Engine, but message routing and most of Spark's other services carry a unique user code rather than user names. Unlike email, Spark services use these identifiers instead of real names to limit exposure and increase system anonymity.
While Cisco indicated that these architectural enhancements are not attributable to the March acquisition of Synata, Synata does offer conceptually similar capabilities regarding encrypted search. At the time of the acquisition, Rowan Trollope, SVP and GM, Collaboration Technology Group at Cisco, even indicated Synata was working on the problem of "searching for data you can't see." Perhaps the acquisition was an acqui-hire or related to patents.
These are significant enhancements to the emerging sector of workstream messaging. This approach could open up a new era of cloud acceptance and capability. The space is growing so quickly that it's hard to tell if security limitations are a factor, but it seems reasonable that this could further accelerate adoption.
Dave Michels is a contributing editor and analyst at TalkingPointz.