Introducing the Avaya Real-Time Speech Snap-in

I am just as deaf as I am blind. The problems of deafness are deeper and more complex, if not more important, than those of blindness. Deafness is a much worse misfortune. For it means the loss of the most vital stimulus -- the sound of the voice that brings language, sets thoughts astir, and keeps us in the intellectual company of man. -- Helen Keller

Studies have shown that most people would rather be blind than deaf. As Helen Keller so aptly noted, deafness cuts a person off from the world. We use the sound of the human voice to form bonds, express ourselves, and feel connected to friends, family, and community. While sight is certainly important, the loss of hearing can cause us to feel isolated and alone.

Here in the world of IP communications, we attempt to strike a balance between what we can see and what we can hear. While we continue to drive interactions toward textual interfaces, we should never ignore the importance of human speech. How many of you have been in the middle of a frustrating email exchange that could only be resolved by picking up the telephone and calling? Spoken words can still be the best way to express something clearly and quickly.

Avaya certainly feels that way and is adding a little oomph to the human voice with its Real-Time Speech Snap-in, featured as part of the Avaya Engagement Solutions the company unveiled today at a briefing in Santa Clara, Calif. This clever technology performs speech analytics on active telephone calls, providing contact centers with a dynamic tool that listens, parses, and performs predefined actions based on what is heard.

For example, imagine that you bought a new e-reader and are having trouble syncing the reader with the books you've purchased. You call the company's contact center and as you're explaining the problem, the Real-Time Speech Snap-in pops up a technical specification on the agent's PC and navigates to the section describing e-reader synchronization. Imagine the time that can be saved on these troubleshooting calls.

Here is another example: You run a contact center and have been tasked with increasing customer satisfaction. You can use the Snap-in to listen to what your agents say and provide feedback in real-time. For instance, it can trigger visual indicators (e.g. green checkmarks) when an agent properly greets the customer, mentions any current promotions, and thanks the customer for calling. Real-time speech analytics makes for more productive agents and happier customers.

At this point, you might be scratching your head and wondering just what this Snap-in thing is to which I keep referring.

Simply put, a Snap-in is an application that runs within the Avaya Engagement Environment (formerly called the Avaya Collaboration Environment). These Snap-ins can insert themselves into the middle of a call and do everything from playing prompts to reaching out to external applications. In the case of the Speech Snap-in, it can also analyze and process audio streams. Avaya provides a number of prewritten Snap-ins, but the platform also allows organizations to write their own. (For a look at another Snap-in, read Avaya Makes WebRTC as Simple as Point and Click.)

The Avaya Real-Time Speech Snap-in
The Avaya Real-Time Speech Snap-in consists of three different services:

    Query Management. Query Management is used to define the words and phrases that are applied to calls. These queries can be built from a single phrase or constructed using logical operators such as "and" and "or." For example, "Good morning, thank you for calling No Jitter" is a perfectly valid query. More complex queries can be built for more granular processing of a call. For instance, time-based conditions can make a query valid only for the first 20 seconds of a call, and you can declare that certain phrases must be heard within a set amount of time of each other.

    Speech Search. This service applies previously defined queries to an active call. You can apply queries to the calling party, the called party, or both parties, and you can tell the speech search to stop processing queries for an active call at any time.

    Call Event Notification. This aspect of the Snap-in allows an application to be informed when an event has occurred on a call. The events are: call answered, call ended, speech search started, speech search stopped, and speech search match. An application subscribes to only those events it wants to be notified of, and the occurrence of a subscribed event triggers a callback function within the application. A Unique Call Identifier (UCID) tells the application which call the event applies to. (A sketch of how an application might drive all three services follows this list.)
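
To make the three services concrete, below is a minimal sketch of how an application might drive them over REST, written in Python with the requests library. The host, endpoint paths, and payload field names here are hypothetical placeholders for illustration only, not the documented Avaya API.

    import requests

    BASE = "https://engagement.example.com/speech"  # hypothetical service host

    # 1. Query Management: define a query combining two phrases with a
    #    logical "and" that is valid only for the first 20 seconds of a call.
    query = {
        "name": "proper-greeting",
        "all": ["good morning", "thank you for calling no jitter"],
        "validForSeconds": 20,
    }
    resp = requests.post(f"{BASE}/queries", json=query, timeout=10)
    query_id = resp.json()["queryId"]

    # 2. Speech Search: apply the query to the called party (the agent)
    #    on an active call, identified by its UCID.
    search = {
        "ucid": "00001002981418236825",  # hypothetical Unique Call Identifier
        "queries": [query_id],
        "party": "called",               # "calling", "called", or "both"
    }
    requests.post(f"{BASE}/searches", json=search, timeout=10)

    # 3. Call Event Notification: subscribe to only the events we care
    #    about; the service invokes our callback URL when one occurs.
    subscription = {
        "events": ["speech-search-match", "call-ended"],
        "callbackUrl": "https://app.example.com/events",
    }
    requests.post(f"{BASE}/subscriptions", json=subscription, timeout=10)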

Web Services
The previously mentioned components are all RESTful Web services. RESTful Web services provide HTTPS (Hypertext Transfer Protocol Secure) interfaces that create loose connections between a service and an application. An application that wishes to communicate with a RESTful Web service sends HTTP GET, PUT, POST, and DELETE requests.

If that sounds like a whole lot of mumbo jumbo to you, think of it like this: A RESTful Web service is sort of like a page on a Web server. An application acts like a Web browser that communicates with the service using HTTP commands. The Web service performs the designated action and communicates any results back to the application over HTTP.

This style of programming allows applications and Web services to exist completely independently of each other, securely sharing information through standard Web technologies.
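
To round out the picture, here is the application's side of that loop: a bare-bones event listener, again a sketch in standard-library Python. The payload fields ("type" and "ucid") follow the hypothetical schema from the earlier sketch; the actual event format is defined by the Avaya platform.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class EventHandler(BaseHTTPRequestHandler):
        """Receives event notifications POSTed by the Call Event
        Notification service (hypothetical payload shape)."""

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            event = json.loads(self.rfile.read(length))
            # The UCID tells us which call this event applies to.
            print(f"{event['type']} on call {event['ucid']}")
            self.send_response(204)  # acknowledge with an empty response
            self.end_headers()

    HTTPServer(("", 8080), EventHandler).serve_forever()

When a subscribed event such as a speech search match fires, the service POSTs a notification to this listener, and the application can react in real time, for example by popping the relevant document onto the agent's screen.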

Putting It All Together
I mentioned a few examples of how the Real-Time Speech Snap-in might be used, but the sky is the limit as to how an enterprise might apply it to operations. Beyond contact center agent and customer interactions, the following use cases are possible:

    Supervisor Alerting: The Real-Time Speech Snap-in can inform a contact center supervisor if an agent has deviated from a call script or is beginning to suffer from burnout or frustration.

    Context Creation: As a call is in progress, the audio can be identified, tagged, and stored as part of the context for this customer or transaction. This data can be used later in call classification and tracking, and can also be shared across teams to create consistent, high-quality customer experiences.

    Worker Compliance: Real-time speech analytics allows a company to identify problems and quickly intervene when policy or regulatory breaches are detected.

Wrapping Things Up
In a nutshell, the Avaya Real-Time Speech Snap-in has these three goals:

    1. Increase customer satisfaction
    2. Help companies adhere to regulations and policies
    3. Help companies keep their workers productive and happy

The way in which we communicate is in flux. Where once we only called, now we text, email, and chat. However, despite the rise of these text-based interactions, the human voice remains one of the most effective ways to get a message across. Real-time speech analytics takes that effectiveness to even higher levels.

Andrew Prokop writes about all things unified communications on his popular blog, SIP Adventures.

Follow Andrew Prokop on Twitter (@ajprokop) and LinkedIn!